It is a complaint heard again and again: an experiment that works in one researcher’s hands flops when others attempt to repeat it. In a commentary published online in Nature on January 27, the National Institutes of Health in Bethesda, Maryland, acknowledges disquiet over the reproducibility of published data and proposes several avenues to address it. “The NIH is deeply concerned about this problem,” write NIH Director Francis Collins and Principal Deputy Director Lawrence Tabak.

The issue has spread to the highest levels of science policy. This Friday, President Obama’s Council of Advisors on Science and Technology will hold a meeting on scientific reproducibility (see PCAST meeting agenda). 

In their Nature essay, Collins and Tabak note that preclinical treatment studies, for example those using mouse models, are the most likely to fail the reproducibility test. They blame this on a variety of factors, including poor training in experimental design, overemphasis on publication in a few high-status journals, and a system that relegates negative data to file drawers instead of public venues. “Some scientists reputedly use a ‘secret sauce’ to make their experiments work—and withhold details from publication or describe them only vaguely to retain a competitive edge. What hope is there that other scientists will be able to build on such work to further biomedical progress?” Collins and Tabak write.

The NIH is developing a training program on experimental design. By the end of 2014, it will become part of mandatory training for postdocs in intramural NIH labs, and will be made available for other institutions to use.

NIH grant reviewers should keep an eye out for less-than-sound experimental design in grant proposals, Collins and Tabak write. Some of the Institutes are trying out a checklist that reminds reviewers to look for important design features, such as blinding, in submitted proposals. Randomization and blinding of preclinical studies have become standard practice in industry, but they are not uniformly done in academic labs. The same holds true for sample size calculations and for drawing up a statistical analysis plan before an experiment begins.
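
For readers who want a concrete sense of what such a sample size calculation involves, the sketch below shows one common approach, a power analysis for a simple two-group comparison. It is a minimal illustration, not part of the NIH plan: the effect size, significance level, and target power are assumed values, and the statsmodels power module is just one of several tools that perform this calculation.

  # Minimal sketch of a prospective sample size (power) calculation for a
  # hypothetical two-group preclinical study. All numbers are illustrative
  # assumptions, not values taken from the NIH commentary.
  import math
  from statsmodels.stats.power import TTestIndPower

  effect_size = 0.8   # assumed standardized group difference (Cohen's d)
  alpha = 0.05        # two-sided significance level
  power = 0.80        # desired chance of detecting the effect if it is real

  # Solve for the number of subjects per group in an independent-samples t-test.
  n_per_group = TTestIndPower().solve_power(
      effect_size=effect_size, alpha=alpha, power=power, alternative='two-sided')
  print("Animals needed per group:", math.ceil(n_per_group))  # roughly 26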

The agency is also piloting a program that assigns one or more reviewers to assess the published data used to support the research proposal. For example, these reviewers might determine that preclinical studies require validation before a clinical trial moves forward. By the end of 2014, the NIH will determine which of these approaches to apply in future grant evaluations.

In addition, the NIH wants to make scientific information, including negative data and critical comments, accessible outside of standard publications. In December, it launched the online discussion forum PubMed Commons, which has already accumulated more than 700 comments on published work. The agency has also invited applications from researchers interested in developing a “Data Discovery Index,” which would provide unpublished primary data in a citable format. Applications are due March 6, 2014, and grants will be awarded next September.

Tabak and Collins emphasize that “reproducibility is not a problem that the NIH can tackle alone.” They deplore the way academia encourages researchers to focus on publishing in only a few top-notch journals. Other leaders have also blamed those journals for favoring manuscripts that maximize excitement and impact factor over papers with the most rigorous methodology. The 2013 Nobel laureate Randy Schekman of the University of California, Berkeley, recently pledged to eschew what he called “luxury journals,” such as Science, Nature, and Cell (see Times Higher Education). To mitigate the effect of publication records on grant decisions, the NIH will consider asking applicants to specify their contributions to cited papers.

Journals are taking heed of this criticism. Nature recently lifted space limits on its methods sections, developed a checklist to ensure that authors provide the experimental details others would need to replicate the work, and plans to consult statisticians more regularly (see May 2013 news story). Earlier this month, Science announced that it would invite more statisticians onto its Board of Reviewing Editors and expect authors to provide experimental details on randomization and blinding (McNutt, 2014). In 2012, the open-access publisher Public Library of Science and collaborators launched an initiative to repeat experiments and publish the results.

The Alzheimer’s field has seen its share of irreproducible results. Most of the time, an initially high-profile paper fades away quietly when others fail to reproduce the findings but never publish those data. Occasionally, negative replication attempts do make it into the published record. Eight laboratories reported being unable to reproduce a 2010 Nature study that suggested the cancer medication imatinib slowed Aβ production (see Jan 2014 news story and commentary). Several labs have struggled to replicate a 2003 finding that Aβ accumulates in the eye’s lens (see May 2013 news story). In both cases, the authors of the original findings attributed the diverging results to differences in experimental technique. Some see standardization of research methods as a solution for translational research, and this has become a key issue for the Alzheimer’s Disease Neuroimaging Initiative and for efforts to develop fluid biomarkers for use in the clinic (see Oct 2008 news story and Nov 2009 conference story).

Ultimately, no one player alone can fix the irreproducibility problem. “Success will come only with the full engagement of the entire biomedical-research enterprise,” write Collins and Tabak.—Amber Dance and Gabrielle Strobel

References

News Citations

  1. Guidelines at Nature Aim to Stem Tide of Irreproducibility
  2. GSAP Revisited: Does It Really Play a Role in Processing Aβ?
  3. Not Seeing Eye to Eye: Do Lenses Accumulate Aβ?
  4. ADNI Results: A Story of Standardization and Science
  5. Worldwide Quality Control Set to Tame Biomarker Variation

Paper Citations

  1. McNutt M. Reproducibility. Science. 2014 Jan 17;343(6168):229. PubMed.

External Citations

  1. PCAST meeting agenda
  2. PubMed Commons
  3. invited applications
  4. Times Higher Education
  5. initiative

Further Reading

Papers

  1. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011 Sep;10(9):712. PubMed.
  2. Accelerating drug discovery for Alzheimer's disease: best practices for preclinical animal studies. Alzheimers Res Ther. 2011;3(5):28. PubMed.
  3. On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ. 2013;1:e148. Epub 2013 Sep 5. PubMed.
  4. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013 May;14(5):365-76. Epub 2013 Apr 10. PubMed.
  5. If a job is worth doing, it is worth doing twice. Nature. 2013 Apr 4;496(7443):7. PubMed.
  6. A call for transparent reporting to optimize the predictive value of preclinical research. Nature. 2012 Oct 11;490(7419):187-91. PubMed.
  7. Why animal research needs to improve. Nature. 2011 Sep 28;477(7366):511. PubMed.
  8. Reliability of 'new drug target' claims called into question. Nat Rev Drug Discov. 2011 Aug 31;10(9):643-4. PubMed.
  9. Methods: Face up to false positives. Nature. 2012 Jul 25;487(7408):427-8. PubMed.
  10. Drug development: Raise standards for preclinical cancer research. Nature. 2012 Mar 28;483(7391):531-3. PubMed.
  11. Science policy. Changing incentives to publish. Science. 2011 Aug 5;333(6043):702-3. PubMed.

Primary Papers

  1. Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014 Jan 30;505(7485):612-3. PubMed.