Poor scientific reproducibility is a product of poor reagents, sloppiness, selective publishing of data, and, on occasion, outright fraud. Drug development will continue to fail if the results of published studies cannot be relied upon. How are scientists working to right these wrongs?

July 2, 2014. Scientific fraud makes headlines, particularly when the findings were thought to be earth-shattering. Well-known examples include the infamous stem cell fraud of Korean scientist Dr. Woo Suk Hwang, whose papers were retracted in 2006, and, just recently, the retraction of stem cell studies from Japan's RIKEN institute after RIKEN investigators ruled that fellow scientist Dr. Haruko Obokata had intentionally manipulated data in a misleading fashion. But problems with the reproducibility of findings are rarely due to outright fraud. Research reagents may be of poor quality, mixed up, or contaminated. Experiments may be performed or analyzed sloppily or inexpertly. And hard-pressed researchers may publish only the selected pieces of their data that support their conclusions. These causes contribute far more than fraud to the prevalence of poor-quality, irreproducible data in the scientific literature.

This problem is critical because drug development has an estimated success rate of only 5%, and a significant share of these failures is linked to drug targets selected from poor-quality publications. Scientists' time and resources are wasted when hypotheses rest on incorrect or inadequate prior studies. At the 2014 AACR Annual Meeting, held in April in San Diego, CA, a session devoted to scientific reproducibility attracted enormous attention from attendees, a sign of how seriously the research community takes this issue. Everyone hopes to make the next headline, but for the quality and impact of their work, not for the lack of it.

Identifying and combating the reasons scientists publish unreliable data

Dr. Lee Ellis discussed problems with the reproducibility of scientific studies, i.e., whether another researcher performing the same experiments can obtain the same results. “Not all non-reproducible events are due to evil people,” said Ellis. He described a spectrum of contributing problems, ranging from honest mistakes, to sloppiness and selective reporting, to the most serious: falsification and outright fabrication.

To identify what leads researchers to publish poor-quality or even false data, Ellis conducted a survey on data reproducibility at his institution, the University of Texas MD Anderson Cancer Center, gathering anonymous responses from 171 research trainees and 263 faculty members. He hypothesized that trainees and faculty would feel pressure to “massage data” for different reasons. More than half of respondents had at some point been unable to reproduce published data. Among trainees, 22% reported feeling pressure to publish findings they had doubts about, and 31% had felt pressure to prove a mentor’s hypothesis correct even when the data said otherwise. Particularly worrisome is the role of visa status: foreign trainees who fail to make progress on their projects may fear losing their mentor’s support for their visa. For trainees, then, a culture of trying to please a mentor who is set on seeing a specific result, such as confirmation of the mentor’s hypothesis, may drive some to find a way to produce it. Other pressures include the need for high-impact publications to complete a PhD or to obtain a job.

For faculty, “publish or perish” is a key driver of data massaging: without publications, a scientist cannot maintain or advance his or her career. “Impact factor mania” refers to the scramble to publish in prestigious, “high-impact” scientific journals. Such journals expect a “perfect story,” which may tempt researchers to manipulate data by removing experimental results that do not fit the hypothesis. Publishing in higher-impact journals, or publishing more papers, is necessary to obtain grants, promotions, and tenure, and is also sought for stature or financial gain, such as through patents.

Fascinatingly, the higher a journal’s impact factor (a numeric measure of its influence), the more often its papers are retracted. (A retraction is a public announcement that a study is scientifically unsound, followed by its withdrawal from the literature.) Retractionwatch.com is a website that tracks retracted scientific publications and writes often cynical commentary about them. A quick glance at the site reveals reasons for retraction ranging from honest reagent mix-ups, to outright fraud, to journals astoundingly caving to corporate demands because a publication on a controversial topic harmed a company’s profits (e.g., Monsanto and GMOs).

Dr. Ellis lamented the weak consequences meted out to researchers found guilty of fraud: retraction of publications, “supervision” of their research for 3 years, and exclusion from service committees for 2-3 years, while most remain eligible for NIH funding. These “consequences” are clearly inadequate; such people should simply be expelled from science. Underscoring the problem, David Wright, who until recently directed the US Office of Research Integrity (which monitors and investigates alleged research misconduct), resigned in February, saying he had been unable to accomplish anything because of the oppressive political environment in which he had to operate.

Efforts are under way to curb these problems. The NIH aims to raise awareness of these occurrences and their causes and to educate researchers about them. The Reproducibility Initiative, recently launched by Science Exchange, provides a platform for independent replication of studies and rewards independently validated publications with special recognition. Ellis emphasized the need for further changes to curb the culture of “impact factor mania.” He suggests overhauling grant application and publication practices and encouraging publication of negative data, which would have the added benefit of reducing the time and resources researchers spend testing hypotheses that others have already found to be wrong.

Learning from failure: lessons from retracted articles

Dr. Ferric Fang, of the University of Washington School of Medicine, discussed what can be learned from retracted papers. If the integrity of a scientific publication is called into question, an investigation may be held, usually by the institution of the scientist who led the study. If the publication is judged sufficiently flawed, for any reason, honest or dishonest, it is retracted from the literature.

Despite the prevalence of non-reproducible studies in the scientific literature, only about 1 in 10,000 publications has been retracted, and most retractions are due to scientific misconduct: fraud and plagiarism (including duplication, or self-plagiarism, in which data from previously published studies are recycled, a practice forbidden by all reputable scientific journals). Studies that are non-reproducible because of honest mistakes, sloppiness, or selective reporting are only very rarely retracted, despite their prevalence. Of the 17 scientists with the most retracted papers, only one owed the retractions to a contaminated reagent; all the others had committed fraud. Why is retraction reserved almost exclusively for misconduct, when the vast majority of irreproducible publications stem from lesser evils? Cost: an investigation is estimated at over $500,000 per allegation, so institutions launch investigations almost exclusively when blatant misconduct is suspected.

Because retraction carries a strongly negative connotation of deliberate misconduct, scientists who discover errors in their publications instead submit “errata,” corrections to the previously published data. This becomes a problem when a study was performed so poorly that it should be retracted, but so many errata are added to avoid retraction that the study’s conclusions are called into question. An alternative mechanism for red-flagging publications with incorrect or non-reproducible data may be appropriate, one that does not mar a scientist’s reputation with the implication of misconduct that “retraction” carries. For instance, the hundreds of papers already published using a “breast cancer” cell line later discovered to be melanoma should be red-flagged.

PubMed, the NIH database of published scientific articles, lists about 3,000 retracted publications, 17% of them in cancer research. That is an overrepresentation, since only 12% of all publications are in the cancer field. Biomedical research as a whole also has higher retraction rates than fields such as math and physics. Fang presented an analysis of the scientific journals with the most retracted publications. High-impact journals such as the New England Journal of Medicine, Science, Cell, and Nature ranked near the top for retractions due to fraud or suspected fraud, while mostly obscure, lower-impact journals ranked highest for retractions due to plagiarism and duplication. Publications retracted for accidental errors also tended to come from higher-impact journals. Among countries, the US, followed by Germany, led in retractions due to fraud or suspected fraud, while China and India ranked highest, along with the US, for retractions due to plagiarism and duplicate publication. This suggests that the reasons for scientific misconduct, retraction, and/or investigation into misconduct differ between developed and developing countries.

Beyond the factors Ellis described, the roots of misconduct in the US include intensifying competition for shrinking research funding, which leads some researchers to take sloppy shortcuts or resort to desperate and dishonest practices. Frighteningly, the rate of fraud and misconduct appears to be increasing; reassuringly, so is the attention paid to it. Still, peer reviewers either assume honesty and integrity in experimentation and reporting or are too busy to screen the manuscripts they review for problems such as fraud or plagiarism. New policies, including education and awareness efforts, should be instituted to address this twin problem: decreasing federal funding alongside a growing number of flawed publications.

Raising the standards for biomedical research

“Cancer is evolution. Cardiology is just plumbing. Of course the advantage that they [cardiologists] have is that most Americans believe at least in plumbing,” joked Dr. Glenn Begley, of TetraLogic Pharmaceuticals, PA. His AACR talk focused on how to raise the bar for the quality of published data.

Pharmaceutical companies rely heavily on studies published by academic scientists, and their low success rates in drug development stem in part from the poor quality of those studies. Begley was part of a team of scientists at Amgen that tried to reproduce 53 high-impact, “landmark” academic publications, with devastating results: only 6 (about 11%) could be validated.

Dr. Begley described six red flags that identify scientific studies of poor integrity: (1) experiments were not blinded; (2) not all results were shown (i.e., data from selected experiments were presented as “representative data”); (3) experiments were not properly repeated; (4) positive and negative controls, the comparisons needed to confirm that an experiment and its reagents are working, were not performed or not shown; (5) reagents were not validated or not used properly; and (6) inappropriate statistical tests were used to determine whether results were significant. To ensure that published studies are of the highest integrity, Begley suggests that all studies be blinded, and that researchers, peer reviewers, and journal editors address all of these red flags before publication.

Raising the quality of biomedical research reagents

Dr. William Sellers, of the Novartis Institutes for BioMedical Research, MA, discussed problems and solutions surrounding poor-quality research reagents. Cell line contamination has proven to be a pervasive problem, including mix-ups or cross-contamination with other cell lines and contamination with mycoplasma, a slow-growing microorganism that frequently infects laboratory cell cultures. The Cancer Cell Line Encyclopedia (CCLE), a collaboration between Novartis and the Broad Institute, purchased ~1,000 cell lines from various vendors. Shockingly, 70-90 of these turned out not to be the cell line they were purported to be, and 5% were entirely “rogue.” In fact, a commonly used “breast cancer” cell line was found to be melanoma. In response to reports such as these, the Prostate Cancer Foundation (PCF) now requires PCF-funded researchers to include, in their annual progress reports, results from genetic tests that validate the identity of every cell line used and check for potential contamination.

Another large-scale problem is the quality, and particularly the relative activity, of drugs and reagents used in research. Drugs and reagents with the same putative function can have vastly different activities depending on their molecular structure (different drugs can target the same molecule) and how they are prepared. The cross-reactivity of drugs and antibodies is also underreported: many have multiple molecular targets, which should be rigorously evaluated and the results made freely available. Dr. Sellers discussed possible solutions, including a reference ID for every research agent and/or a “wiki” where researchers can review and comment on agent quality. Scientific journals could require full disclosure of the methods and reagents used, and studies that use poor reagents could be flagged in PubMed.

Finally, Dr. Sellers discussed problems arising in this era of technologies that run genome-scale experiments, such as whole-genome sequencing. The large volume of data generated comes with a significant amount of “noise,” spurious results that arise by chance and must be filtered out. Rigorous statistical analysis and multiple experimental steps are needed to validate results from these important but noisy, data-rich experiments. Allocating additional funding for validation is critical to extracting the clearest and most relevant insights from these large data sets.
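To make the point about chance findings concrete, here is a minimal Python sketch; it is an illustration added here, not an analysis Dr. Sellers presented. It assumes a hypothetical experiment testing 20,000 genes, of which only 10 carry a real effect, and uses the Benjamini-Hochberg false discovery rate procedure as one standard example of the statistical filtering that large data sets require. Under the null hypothesis, p-values are uniformly distributed, so a naive p < 0.05 cutoff is expected to flag about 5% of the null genes (roughly 1,000) purely by chance.

```python
import random


def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg
    procedure, which controls the false discovery rate at level q."""
    m = len(p_values)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


random.seed(0)
m = 20_000  # hypothetical number of genes with no real effect
# Under the null hypothesis, p-values are uniform on [0, 1].
null_p = [random.random() for _ in range(m)]
# Hypothetical: 10 genuine effects with very small p-values (indices 0-9).
true_p = [1e-8] * 10
p_values = true_p + null_p

naive_hits = sum(p < 0.05 for p in p_values)
bh_hits = benjamini_hochberg(p_values, q=0.05)
print(f"naive p < 0.05: {naive_hits} hits")
print(f"Benjamini-Hochberg (q = 0.05): {len(bh_hits)} hits")
```

The naive cutoff returns on the order of a thousand “hits,” nearly all of them noise, while the Benjamini-Hochberg step keeps the 10 planted signals and discards essentially all of the chance findings. The gene count, effect sizes, and q = 0.05 are arbitrary choices for illustration; real genomic pipelines involve many more steps, including the experimental validation called for above.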

The value and impact of the scientific enterprise rest on the integrity of the work done and on the social impact it leads to. Flawed publications carry substantial economic and time costs, both in the research leading to them and in the research that follows from them, but the most devastating effect comes when untrue “science” shapes public health policies or ideologies, or leads bad drugs into clinical trials. No prize should ever go to a scientist who brushes aside the central dogma of science: that systematic observation and sound experimentation, performed in a repeatable fashion, form the basis of all knowledge.

Dr. Andrea Miyahira has a PhD in cancer immunology, and is Manager of Scientific Programs at the Prostate Cancer Foundation.