High-tech approaches to high-tech fraud

Journal editors and institutions are using technology to spot plagiarism and image manipulation

Elizabeth Wager is a publications consultant through her company Sideview, which offers training, editing, writing and consultancy on medical publications. She was chair of the Committee on Publication Ethics (COPE) from 2009 to 2012. She is based in Princes Risborough, UK.

Although plagiarism existed long before computers were invented, the copy-paste function and access to masses of text on the internet certainly make it a whole lot easier. In the same way, photographic hoaxes have been around for a while, but tools such as Photoshop allow tricks that used to take hours in a darkroom, messing around with smelly chemicals, to be done in a few minutes from the comfort of your desk.

But participants at the 3rd World Conference on Research Integrity (#3wcri), held in Montreal from May 5 to 8, learned how journals and universities are getting smarter at spotting fraud.

Starting at college level, students’ essays and term papers are now routinely scanned for plagiarism, and the same text-matching software tools are increasingly being used by journals. Reviewing editorials about plagiarism published from 2008 to 2012, Dr. Miguel Roig, Professor of Psychology at St John’s University in New York, found that 67 percent of journals use such software, and the number is likely to grow.

For example, Elsevier plans to increase its use of the anti-plagiarism system CrossCheck in 2014 by introducing automatic screening of articles submitted to the Elsevier Editorial System (EES), Senior VP and General Counsel Mark Seeley said.
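To give a feel for what "text matching" means in practice, here is a minimal sketch in Python. It is an illustration only and is not the method used by CrossCheck or any other tool named in this article: it assumes a simple approach in which two texts are split into overlapping three-word sequences and scored by the share of sequences they have in common (Jaccard similarity). The function names and sample sentences are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two texts' n-word sequences (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # a text shorter than n words cannot be compared
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts, used only to exercise the functions.
submitted = "Tools such as Photoshop allow tricks that used to take hours in a darkroom."
published = "Tools such as Photoshop allow tricks that once took hours in a darkroom."
unrelated = "Meteorology articles have a long citation half-life."

if __name__ == "__main__":
    print(similarity(submitted, published))  # partial overlap
    print(similarity(submitted, unrelated))  # no shared three-word sequences
```

A score like this only flags passages for a human to look at. As the conference discussions made clear, deciding whether an overlap is plagiarism or acceptable re-use still needs editorial judgement.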

Identity theft and ‘avatar fraud’

While publishers have been alert to classic types of fraud, such as plagiarism, for many years, two presentations highlighted the new phenomenon of identity theft or “avatar fraud.” Dr. Ivan Oransky, founder of the popular Retraction Watch blog, related the case of Korean biologist Hyung-In Moon, who set up fake email addresses for suggested peer reviewers which allowed him to review 24 of his own manuscripts. Editors were fooled by plausible-looking reviews apparently coming from experts in the area, but their suspicions were aroused when reviews were supplied within 24 hours of the journal’s request – which may be an editor’s dream, but sadly isn’t the usual case.

Dr. David Wright, Director of the US Office of Research Integrity, gave more extreme examples of how accused researchers may use avatars (i.e., fake identities, or even fictitious people) in an attempt to deflect misconduct investigations.

He described how a US psychologist created a fictitious research assistant and faked an apology from her, taking responsibility for making up data, in an attempt to avoid an accusation of fabrication. Another researcher claimed that an “old friend” at another institution was responsible for certain analyses that had been questioned. The researcher produced emails to support this claim; however, the email address turned out to be false, and when the “friend” was contacted, she denied any knowledge of the research. To avoid attending misconduct hearings, the same researcher also forged doctors’ letters indicating that she was infected with the bird flu (H5N1) virus and, later, that she had ovarian cancer.

Text-matching software

Dr. Howard Garner, Director of the Medical Informatics and Systems Division at the Virginia Bioinformatics Institute of Virginia Tech, created the eTBLAST text-matching search engine, which he has used to identify text matches within the Medline database. He reported that since 2006, the number of duplicated items added to Medline each year has been decreasing. He attributes this decline to the growing use of anti-plagiarism software at journals.

This is a good example of screening acting as a deterrent to authors or at least as a safeguard for journals (preventing plagiarized or redundant articles from being published). However, in other areas, less progress has been made.


Dr. Garner noted that articles that have been formally retracted by journals (because they were misleading or fraudulent) continue to be cited long after the retraction notice is published. This was probably inevitable when academics could only store paper copies of articles, and it remained a problem when electronic copies of articles were stored on personal computers, since readers relying on stored documents could not tell whether the articles had been retracted or corrected after they were downloaded.

The CrossMark initiative could solve this problem. CrossMark allows readers to check if they have the most up-to-date version of an article by simply clicking on the CrossMark logo, even from a PDF stored on their own computer.

Screening images for manipulation

The importance of digital images in some research disciplines was emphasized by Dr. Bernd Pulverer, Chief Editor of The EMBO Journal of the European Molecular Biology Organization, who noted that “a figure is a scientific result converted into pixels.” But what happens if those pixels have been tampered with to create a misleading image? Many journals now use software, such as Photoshop, which can be used not only to manipulate photographs but also to detect unacceptable manipulation.

The EMBO editors now screen all figures with a simple visual check for problems and apply forensic software to about 60 percent of them. They find that around 20 percent of submitted figures show signs of manipulation. In most cases this is simply “beautification,” or tidying an image with no intention to deceive, he said, but the journal finds “serious manipulation” in about 4 percent of digital images. Because graphs are also important, The EMBO Journal is going a stage further and encourages authors to supply the source data used to create figures.

In contrast, Kenneth Heideman, Director of Publications for the American Meteorological Society, reported that although the society’s journals publish over 10,000 figures each year and routinely screen them, deceptive manipulation is almost never found. He explains this by the fact that meteorology is a small field compared with cell biology, and articles have a longer shelf life (with an average citation half-life of 10 years rather than six months).

Journals and institutions are getting smarter

Several speakers emphasized that we really don’t know whether research and publication misconduct are increasing (as it’s so difficult to measure), but there is certainly encouraging evidence that journals and institutions are getting smarter at spotting it.

A few years ago at the 1st World Conference on Research Integrity, it felt as if we still had to convince some people that misconduct was even worth talking about. Many senior scientists and institutions seemed to be in denial, and believed there was no need to take action against a few “bad apples” since the overwhelming majority of researchers were honest.

With smarter detection tools, we realise that this simply isn’t the case. When the Journal of Cell Biology began to screen images, it found 1 percent had been manipulated in a deceptive way. Other surveys have suggested that 1 to 2 percent of researchers commit serious misconduct, such as data falsification, at some point in their careers, so most major institutions, some of which employ thousands of researchers, could expect to have at least one case every few years.
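The arithmetic behind that last estimate can be sketched as follows. Only the 1 to 2 percent lifetime rate comes from the surveys mentioned above. Everything else is an assumption made for illustration: a hypothetical institution of 1,000 researchers, a 40-year career, one detected case per offending researcher, and cases spread evenly over time.

```python
def expected_cases_per_year(researchers, lifetime_rate, career_years):
    """Rough expected number of serious misconduct cases per year.

    Assumes (for illustration only) that each offending researcher
    accounts for one case and that cases are spread evenly over a career.
    """
    return researchers * lifetime_rate / career_years


if __name__ == "__main__":
    # Hypothetical institution: 1,000 researchers, 40-year careers.
    for rate in (0.01, 0.02):
        per_year = expected_cases_per_year(1000, rate, 40)
        print(f"lifetime rate {rate:.0%}: about one case every {1 / per_year:.0f} years")
```

Under these assumptions the institution would see roughly one case every two to four years. Larger institutions, or shorter assumed careers, shorten that interval.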


I am encouraged that we can use technology to detect some forms of fraud and that this is having noticeable effects. If these tools are used more widely by journals and universities, I’m hopeful there will also be a deterrent effect.

There will always be people who try to cheat, but it’s good to know that editors and academics have effective tools. The next battle is convincing everybody to invest the time and money to use them.

5 Archived Comments

Manoshi Goswami May 15, 2013 at 7:36 pm

Very effective steps are being planned. Good information.

Catriona Fennell, Elsevier May 15, 2013 at 3:49 pm

Good points from Mr Gunn.

Taking this into account, the CrossCheck plagiarism detection tool will shortly offer editors the option to exclude sections such as “Materials & methods” from the check for text similarities. Re the prevention of selective reporting of data or post-hoc fitting of hypotheses to data, the Registered Reports initiative from "Cortex" is a fascinating innovation in this area:


Liz Wager May 15, 2013 at 4:58 pm

I agree entirely. Defining plagiarism (ie re-use of other people's material) is hard enough, but defining what amount of 'text recycling' is acceptable is even harder. Human judgement is definitely needed and sometimes repetition is a good thing. I wrote a COPE discussion paper on the difficulties of defining plagiarism, which is available at http://publicationethics.org/files/Discussion%20document.pdf

I also agree that selective reporting (and non-publication) are serious problems that occur more frequently than the more dramatic types of fraud and are also harder to detect and prevent, although registration of clinical trials is a helpful step.

Mr. Gunn May 15, 2013 at 1:59 pm

Looking for textual matches can identify egregious cases of fraud, but I feel like every discussion of this should mention the fact that in the science literature, some duplication is actually helpful, for example in an experimental methods section.

Also, it seems like even if you got rid of all plagiarism, you'd still leave the bulk of the problem untouched, because you'll not have done anything about selective reporting of data or post-hoc fitting of hypotheses to data.

Maybe that's not your issue to tackle, though.

Liz Wager May 17, 2013 at 8:29 am

Many thanks -- I'm glad you found it interesting