AI and the ‘Scientific Sleuth’ – transcript

Ideas to Innovation - Season Two

Intro: Ideas to Innovation from Clarivate

Neville Hobson: Maintaining belief in the truthfulness of what you see, read and hear has never been under such challenge as it is today. While technology plays a huge role in this state of affairs, most egregious is the human faker, someone who sets out to deceive and to profit from that deception with methods that long pre-date today’s sophisticated technology, such as artificial intelligence, or AI. In scientific research, this manifests itself in areas like data and image falsification, and uncovering such dishonesty isn’t easy. When you do, challenging it can be tricky as well.

Welcome to Ideas to Innovation, a podcast from Clarivate with information and insight from conversations that explore how innovation spurs incredible outcomes by passionate people in many areas of science, business, academia, technology, sport, and more. I’m Neville Hobson. Joining me is our guest, David Sanders, Associate Professor in the Department of Biological Sciences at Purdue University in West Lafayette, Indiana, in the United States. He has the ability to spot fakes in scientific research publications with uncanny accuracy, which has earned him the moniker “the scientific sleuth”.

Welcome, David, thanks for joining us!

David Sanders: Thank you very much for having me here today.

Neville Hobson: You’re also on the University Senate, and you’re chair of the Student Affairs Committee. And wearing your scientific sleuth hat, you argue that plagiarism is a serious academic issue that must be confronted. To start our conversation, please share with our listeners a little about your current role at Purdue and what you’re doing today.

David Sanders: Sure. So I have worked for a long time on structural biology and virology, how viruses enter cells, and I’ve worked on something called phosphoryl transfer, how phosphate is transferred. In the course of that work, I’ve become interested in issues of scientific research integrity, in the veracity of the scientific literature, because I really believe that in science we need to communicate both effectively and truthfully. During the course of my career I have seen a number of times when there are violations of those principles, and I think it’s important not only for me but for all scientists to point out when there are violations of scientific norms. But this is really just a part of the scientific endeavor: discriminating between what is truthful and what is false.

Neville Hobson: Okay, that’s an interesting point you’ve made about differentiating between what’s true and what’s not. We’ll talk about that in a second. I’m keen to also discover a little bit about what brought you to this point. As our listeners will find when we get into our conversation, this area of discovering what’s true and what isn’t is really quite huge, and you have some very, very interesting experiences that we’re going to talk about a bit. What led you to this point, David?

David Sanders: So I think my recent incarnation as a scientific sleuth started with a publication in Science about the idea that you could substitute arsenic for phosphorus in nucleic acids. Because I had extensive experience in the field of phosphoryl transfer and was very familiar with the chemistry of phosphorus in biology, I knew that was impossible. I went in, took a look at the article, examined the supplementary material, and I saw that in fact the data present in the article directly contradicted the article, the meaning of the article itself. This article attracted a tremendous amount of attention. It was featured on the front page of newspapers all over the world; the people involved were celebrated for finding out new things about life as we don’t know it. And it was all based upon bad chemistry and a bad interpretation of their own data, data that they knew about. And so I started writing and speaking about this. But as I looked into it, it wasn’t just an issue about the scientists. It was an issue about the media, an issue about publications, and an issue about peer review. And when I went around and gave seminars about this around the world, people would share their own experiences of trying to confront these issues and how difficult it had been for them. One of the most common refrains I would get was: you think that’s bad, let me tell you about this. And that’s how I learned about the work at Ohio State, that there were some questionable practices in publications that were coming from that institution.

Neville Hobson: That’s the notable case, I think, in 2017, right? That made your mark as the scientific sleuth. This was concerning Carlo Maria Croce, professor of medicine at Ohio State University who specialized in oncology and the molecular mechanisms underlying cancer. I’ve read up about this too, David. It truly is quite a story. His research attracted wide public attention because of multiple allegations of scientific misconduct, if we can describe it that way, that were investigated by his own institution, Ohio State. He also sued the New York Times, Ohio State University and you, all unsuccessfully. So I think it’s pretty accurate to say you were instrumental in bringing the scientific misconduct in the Croce laboratory, including image and data manipulation and plagiarism, into the public spotlight. And significantly, I think your work in this regard resulted in a number of scientific journals retracting Croce’s research publications. It really is quite a story, but it came at a bit of a cost to you, if I recall correctly. So walk us through it, if you would, David, particularly the discovery phase, if I can call it that, leading up to calling him out.

David Sanders: Absolutely. So it began around 2014, when this was brought to my attention. I want to point out that there was an anonymous or pseudonymous investigator, originally unknown to me, though later on I discovered this person’s work, who was also independently interested in the work of this particular laboratory. And I started looking at images. The way I proceed is, once I have image or textual issues in one article from a group, I start looking at other articles from that group to see whether this is an honest mistake that happens just this one time, or whether there’s a pattern of behavior. In this case, I found relatively quickly a couple dozen papers with issues of image duplication, image manipulation and/or plagiarism. And so I started writing to the journals about these issues. I never accused any individual; all I spoke about was the papers themselves. And most of the time there was little or no willingness to deal with this, not even to issue a correction, let alone a retraction of these articles.

Sometimes it was duplication within an article, sometimes it was duplication between articles, which in my opinion should be automatic grounds for retraction. If you’re reusing data and claiming that it’s a different set of data, right? It’s not that you’re reusing the data and citing the previous paper they came from; you’re claiming that they represent a different experiment. That should be immediate grounds for retraction. But I got very little response. The one interesting case was a journal that issued a correction of the article within a month. But when I looked at the correction, it consisted of the flipping of one image for no particular reason and the substitution of an image from an article that had been published three years before in a different journal. So the correction was itself an act of image duplication. I wrote back to that journal; the journal wouldn’t take any action.

So I happened to have some contacts at the New York Times, and I reached out to them. After a long investigation, there was a front-page article in the New York Times, with a couple more pages on the inside, about this laboratory and the problems with the images. One of the images featured in the article was one of the ones I had initially looked into and called out, asking for a correction or retraction. And I think this 2017 article seems to have had a major impact on the way that journals think about these issues of image manipulation, image duplication and plagiarism. That’s what’s been communicated to me, that there was a major impact. There was also an investigation by Ohio State of this laboratory. The principal investigator was not found guilty of scientific misconduct; a couple of people in the laboratory were found guilty of misconduct, and subsequently numerous articles have been retracted because of the results of that investigation. I’ll point out that Ohio State did fault him for not providing sufficient guidance to members of his laboratory. What concerns me is that there seemed to be a reluctance on the part of the members of the laboratory to correct problems when they were identified. And I would argue that that is a new form of falsification of the literature. You know, it’s okay to make mistakes. It’s okay to, you know, sometimes images get duplicated. I get that; that happens. But once it’s been identified to you, you need to take action, and it’s wrong to deny that there is a duplication when there is obviously a duplication. So those are the sorts of things that transpired.

In terms of my own career, my university supported me in the defense against the lawsuit, for which I’m very grateful. And now, I don’t think I should be doing this by myself. I think everybody should be doing this whenever they are confronted with these sorts of issues in their own field. But heretofore, only a small number of us have been willing to address these issues and to do it in a public way.
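To make the screening step concrete: below is a minimal sketch of how figure images extracted from a set of papers could be compared for near-duplicates using a perceptual hash. This is an illustrative approach, not the method Sanders describes; the directory path and distance threshold are hypothetical, and any flagged pair would still need expert visual inspection.

```python
# Minimal sketch: flag candidate near-duplicate figure images with a
# perceptual hash. Requires: pip install Pillow ImageHash
# The directory path and distance threshold are hypothetical examples.
from itertools import combinations
from pathlib import Path

import imagehash
from PIL import Image


def hash_figures(figure_dir: str) -> dict:
    """Compute a perceptual hash for every PNG image in a directory."""
    hashes = {}
    for path in Path(figure_dir).glob("*.png"):
        hashes[path.name] = imagehash.phash(Image.open(path))
    return hashes


def find_candidate_duplicates(hashes: dict, max_distance: int = 5):
    """Yield image pairs whose hashes differ by at most max_distance bits.

    A small Hamming distance suggests possible reuse even after resizing
    or recompression; a human must confirm every match.
    """
    for (name_a, h_a), (name_b, h_b) in combinations(hashes.items(), 2):
        distance = h_a - h_b  # ImageHash subtraction = Hamming distance
        if distance <= max_distance:
            yield name_a, name_b, distance


if __name__ == "__main__":
    figures = hash_figures("extracted_figures/")  # hypothetical directory
    for a, b, dist in find_candidate_duplicates(figures):
        print(f"Possible duplicate: {a} vs {b} (distance {dist})")
```

Note that a plain perceptual hash will generally miss flipped, rotated or cropped reuse, such as the flipped image in the “correction” Sanders mentions; screening tools built for this purpose also compare transformed variants of each image.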

Neville Hobson: Hmm. It’s a dilemma, I think, which you’ve mentioned there, particularly in this age of artificial intelligence, which we’ve got to touch on in a bit too, where digital content is so easy to manufacture that no one can tell you whether it’s real or not, and if you pass it off as real, who’s to say it isn’t? That is part of the climate we’re currently in. But in this particular case, and indeed with the more traditional falsification of images and other digital content, a great deal of work is still to be done to expose the people doing this, right? And you’ve added to that the question of what we could do to help the institutions and other bodies who are made aware of it yet don’t take action, either at all or quickly enough. Surely there’s got to be an avenue to explore that looks at this whole picture of fakery. Whether it’s right to frame it from a preventative point of view, I’m not so sure, but what do you think is one avenue to go down in addressing all of this?

David Sanders: I first want to mention that I believe the journals are taking a greatly increased interest in this issue, and they have more dedicated people trying to address the sorts of issues that I discuss, for example scientific paper mills, operations that just generate articles from pre-existing data that are completely fake. But one of the things I’m proposing is that a large proportion of what’s published in the literature need not be published in the literature at all. It could simply be deposited in databases. People would understand that these are raw data; the idea that all of these data have to be presented in an article, which really doesn’t say anything beyond the data itself, should go. I think that would be a tremendous advantage. One of the reasons is that the peer review system is completely overtaxed with the amount of review that’s required. Furthermore, there is a proliferation of publications, and it’s hard for people, whether we’re talking about members of the media or even members of the scientific community, to evaluate the quality of the peer review that has gone into those journals. If these data were deposited just as raw data, people could evaluate the data for themselves. The data wouldn’t have the veneer of having been peer reviewed, because in many cases the peer review is substandard, and I think the scientific community would be much better off. Another thing that’s very important is to realize that post-publication review is just as valid as pre-publication review. Sometimes it’s better in quality, and that needs to be valued and appreciated by journals, and we need to have better venues for it to take place as well.

Neville Hobson: Yeah, that’s interesting. If I summarize what you said in one sense, we need to publish less. So it’s about quality, not quantity. There’s such a lot of stuff out there, and you made a good point about peer reviewers being overwhelmed. There’s also the fact, is there not, that some of it is of dubious quality. And I think that must surely speak to this bigger-picture label, if you like, of research integrity. This directly challenges the integrity of research itself and of those researching. Which is interesting, because in an episode of this podcast last year we spoke with our guest at that time, Nandita Quaderi, Editor-in-Chief and Vice President of the Web of Science at Clarivate, about research integrity. She talked about paper mills, which you’ve touched on as well. And there’s also what I guess you’d call the predatory publisher, which solicits manuscripts and charges a fee to publish, without any kind of oversight as to whether the work is any good or worth your time reading. The point, though, is that this adds to a mountain of dubious content that I think presents a challenge to the integrity of research as a whole, and indeed of particular researchers. Do you think this idea of publishing less, though, concentrating on quality, not quantity, will address issues like these and gain mass support in the academic research community? What do you think?

David Sanders: I do think it will address those issues of research integrity. I’ve argued a number of points over time, one of which is that, for example, in the United States, there should be lists of journals in which you can use federal funds to publish, and lists of journals in which you should not. And we know about these; we can identify predatory journals. They have a number of characteristics. We also need to understand that quality comes first, and we need a better measure of evaluating quality. You don’t have to publish everything. The other problem is that in promotion, and in evaluation for hiring, it is articles that seem to be the cash, the monetary value, in the process. We need to be thinking more holistically about contribution to the scientific endeavor. For example, I believe that proper peer review is a very important contribution to the scientific endeavor and should be rewarded as such. I believe it should be included on one’s CV. I believe it should be public and available. I’ve also argued that peer reviewers should be paid and better trained; the peer review process should be more professional than it currently is. I think all of those things, fewer publications and a better peer review regime, will contribute to a better scientific literature, a more authentic and reliable scientific literature.

Neville Hobson: I think I hear a call to action there, David. So, we touched on AI just now, and I think it’s quite a good moment to consider that in light of everything you’ve said. The thought occurs to me, listening to what you’re saying: this situation of too much content of dubious quality, and too much for people to peer review properly, is it just going to get worse? You wrote a piece about AI recently in Times Higher Education in London. What are your thoughts? Is it going to get worse, or can this be addressed?

David Sanders: I would argue that it is going to get worse, at least in the short term. Two things about that. Number one: when I’m working on identifying plagiarism or image manipulation, I have to be able to find it. If people had put in even a little effort, not necessarily with AI, but just using the other electronic tools that we have, many of them could have evaded my ability to detect them. And yet, because they were lazy, or because they had gotten away with this repeatedly in the past, they didn’t bother to do that. I think that AI provides new tools that are going to make fraud increasingly difficult to detect, and it’s so easy and so cheap to generate that there’s going to be a proliferation of it. In my recent article I discussed the fact that Jonathan Swift, through his pseudo-persona Gulliver, had visited a member of the royal scientific society of his time, the Royal Academy of Science, I guess it’s called, who had created something which sounds remarkably like ChatGPT. This was back in the early 18th century, and of course Jonathan Swift was making fun of this as an impossibility. But in fact, that’s what’s happened now. One of the things we need to be thinking about at every step of the process, and I include our whole educational system, is more rigorous training in the ability to distinguish between truth and falsehood, between real data and fake data. This needs to be given to all students, all members of the scientific community; it shouldn’t just be a sideline. It needs to be front and center in our training. And it’s not going to be easy; it’s going to be increasingly difficult. But if we can implement a number of the proposals that I’ve made, more responsibility and more reward for the peer review process, a contraction of the publishing ecosystem, and more people seeing these issues as their own responsibility rather than just the responsibility of a coterie of people who are investigating these as sleuths, I think we can actually make some progress on these issues.

Neville Hobson: That sounds great. So I think we’ve got time for a brief look ahead into the future, one of my favorite questions in these kinds of conversations. What is the landscape going to look like, particularly in, let me call it, the scientific sleuthing landscape? What will it look like 10 years from now, in 2033? How do you see it? What should we be expecting in the coming decade? Will you be out of a job, or hiring more detectives?

David Sanders: I would love to hire more detectives. In my own laboratory I do train people, undergraduates and occasionally graduate students, in these techniques, and they have been very helpful in identifying these different issues and contacting journals. So I’m trying to create a larger group of people who do this. I don’t know how much of this is going to be able to be automated. I do feel that much of the reliance upon automation, for example in plagiarism detection, has not been effective. We can use some of the techniques to identify easy things, but then you have to go on and do things at a higher level. One of my colleagues at the University of Toulouse identifies articles that have tortured phrases, and I’ll just give you an example. There’s a condition called lactose intolerance. What happens is that people take articles, translate them into some other language and then back into English to avoid plagiarism detection, and lactose intolerance gets turned into lactose bigotry. So we have references to lactose bigotry in articles. Some of this can be automated, but there’s always going to be an arms race between those who are using technology to evade detection of fraud and those who are trying to use technology to identify fraud. So I think, at least for the next decade, there’s going to be plenty of work for scientific sleuths. I hope there will be less, because if we institute some of these policies, I think we’re going to be facing these matters less often. But, you know, it’s up to institutions to take these matters more seriously.
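As a rough illustration of the “easy things” that automation can catch: a tortured-phrase screen can be as simple as matching article text against a curated list of known fingerprints. The sketch below is a minimal, hypothetical version; the phrase list is a tiny sample (real efforts maintain large curated dictionaries), and every hit still requires human judgment.

```python
# Minimal sketch of a tortured-phrase screen: scan text for known
# "fingerprints" left by translate-and-back-again paraphrasing.
# The dictionary below is a tiny hypothetical sample, not a real corpus.
import re

# Maps a tortured phrase to the established term it likely replaced.
TORTURED_PHRASES = {
    "lactose bigotry": "lactose intolerance",  # example from this episode
    "counterfeit consciousness": "artificial intelligence",  # assumed sample
    "bosom peril": "breast cancer",  # assumed sample
}


def screen_text(text: str):
    """Return (tortured phrase, likely original term) pairs found in text."""
    hits = []
    lowered = text.lower()
    for phrase, original in TORTURED_PHRASES.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((phrase, original))
    return hits


if __name__ == "__main__":
    sample = "Patients with lactose bigotry were excluded from the study."
    for phrase, original in screen_text(sample):
        print(f"Flag: '{phrase}' may be a tortured form of '{original}'")
```

The hard part in practice is building and maintaining the dictionary and weighing each hit in context, which is exactly the higher-level work Sanders says cannot simply be automated away.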

Neville Hobson: Well, that’s terrific, David. Thank you so much. This has been a really good conversation; it’s quite a topic we’ve discussed today. I think we’ve just scratched the surface, but we’ve surfaced some interesting elements of this big conversation, it seems to me. So thank you very much for sharing your knowledge, your insights and your experiences.

David Sanders: I greatly appreciate the opportunity to speak with you today.

Neville Hobson: You’ve been listening to a conversation about scientific research integrity, scientific misconduct and scientific sleuthing with our guest David Sanders, Associate Professor in the Department of Biological Sciences at Purdue University. You can find more information about David Sanders and his work on his Wikipedia page: at wikipedia.org, search for David Sanders, biologist. For information about research integrity at Clarivate, visit clarivate.com and search for research integrity. We’ll be releasing our next episode in a few weeks’ time. Visit clarivate.com/podcasts for information about Ideas to Innovation. And if you enjoyed this episode, please consider sharing it with your friends and colleagues, rating us on your favorite podcast app, or leaving a review. Until next time, thanks for listening.

Outro: Ideas to Innovation from Clarivate.