03 Jun 2011

Unlike the classic children’s tale where slow and steady wins the race, in biomedical research it’s the hare that often gets the glory. This seems especially true in the biomarker field, where potential new indicators of disease risk and prognosis leap onto the scene with record speed, yet vanishingly few finish the journey to routine clinical use. A study in this week’s Journal of the American Medical Association reports the magnitude of the problem. In this analysis of high-profile biomarker papers, “about 85 percent of the time, the highly cited study reported a much stronger effect compared to what was suggested by the meta-analysis and largest study of the same association,” lead author John Ioannidis of Stanford University School of Medicine, Palo Alto, California, told ARF. The caution against overinterpreting individual studies applies to biomarker research in Alzheimer’s and other neurodegenerative diseases, scientists in the field noted, and large studies to address this concern are underway. Inflated effect sizes may reflect not faulty research but rather ill-suited publication practices, along with funding mechanisms that are only now beginning to recognize the need for systems-based approaches in biomarker research.

To gauge the validity of initially reported biomarkers, Ioannidis and coauthor Orestis Panagiotou of the University of Ioannina School of Medicine, Greece, looked at biomarker articles cited more than 400 times in ISI Web of Science through 2010. The 35 eligible studies, all published between 1991 and 2006 in 10 high-impact journals, represent roughly the top 3 percent of biomarker papers, the authors note. The effects reported in these frequently cited papers were compared against results of meta-analyses and larger studies, with the idea that the latter investigations are “closer to the truth, statistically speaking,” Ioannidis said. Among the well-cited biomarkers, most were genetic variants or blood-based factors, and more than two-thirds were proposed for cancer or cardiovascular applications. The list did not include any imaging measures or any indicators aimed at Alzheimer’s and other neurodegenerative diseases. Ioannidis advises AlzGene on meta-analysis methodology.

Many of the highly cited associations were exaggerated, the authors found. For 30 of 35 frequently cited biomarkers, the largest study of each marker reported less promising effects, often more than twofold lower than those touted in the highly cited study. In 29 cases, the meta-analysis likewise yielded a smaller effect size. At times, the large study or meta-analysis reported an association in the opposite direction from that of the oft-cited paper.

AD scientists found the results compelling, though not particularly new or surprising. “I think it’s an interesting paper. It will make people think,” said Chengjie Xiong, who heads the AD biostatistics core at Washington University School of Medicine in St. Louis, Missouri. However, “if you want to get to the bottom of the issue, it’s not the research per se. It’s how the research is published. Many negative findings do not end up in the literature.”

How does this play out in biomarker research? Initial studies tend to be small, which subjects them to considerable statistical variability, Xiong noted. “If the effect is variable in the right direction, journal editors get excited and will publish the findings,” he said. “But if they’re variable in the other direction, nobody cares and nobody publishes. That’s how the system works.” Making matters worse, Ioannidis said, spectacular results not only have an edge in getting published, but they are also more likely to get cited.
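
The selection effect Xiong describes can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis from the JAMA paper: it assumes a true standardized effect of 0.2, 30 subjects per group, and a "publication filter" that keeps only positive results with z > 1.96. The function names and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_study(true_effect, n_per_group, rng):
    """Simulate one two-group study and return (observed_effect, standard_error).

    Both groups are drawn from normal distributions with unit variance, so the
    difference in group means is a standardized effect size.
    """
    cases = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    observed = statistics.fmean(cases) - statistics.fmean(controls)
    # With a known unit variance in each group, SE of the difference is sqrt(2/n).
    se = math.sqrt(2.0 / n_per_group)
    return observed, se


def winners_curse(true_effect=0.2, n_per_group=30, n_studies=5000, seed=1):
    """Compare the mean effect over all simulated studies with the mean effect
    over the subset that passes a hypothetical publication filter
    (positive direction and z > 1.96)."""
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        observed, se = simulate_study(true_effect, n_per_group, rng)
        all_effects.append(observed)
        if observed / se > 1.96:  # only "exciting" results get published
            published.append(observed)
    if not published:
        raise ValueError("no study passed the publication filter")
    return statistics.fmean(all_effects), statistics.fmean(published), len(published)


if __name__ == "__main__":
    mean_all, mean_published, n_published = winners_curse()
    print(f"mean effect, all studies:       {mean_all:.2f}")
    print(f"mean effect, published studies: {mean_published:.2f} (n={n_published})")
```

Under these assumptions every "published" study must exceed the significance threshold (about 0.51 standard units with 30 subjects per group), so the published average necessarily lands well above the true value of 0.2, while the average over all studies stays close to it.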

To curb these biases, “all studies should be reported—weak associations, strong associations, no associations,” Ioannidis said. “Otherwise, we’re left with a literature that is replete with seeming successes but does not reflect reality.”

Recognizing this problem, the journal Neurobiology of Aging started accepting negative data in 2004 (see ARF interview). This applies to all relevant research studies, not just those on biomarkers. To date, the journal has received 165 such submissions, 89 of which were accepted for publication, editor-in-chief Paul Coleman noted in an e-mail to ARF. “Just as one wants evidence for the validity of a positive result, we ask for evidence of the validity of a negative result—adequate sample size, appropriate procedures, appropriate data analyses, etc. We feel the standards for a negative result should be as rigorous as they would be for a positive result,” added Coleman, who heads an AD lab at the Banner Sun Health Research Institute in Arizona.

Given how research is funded, that may be easier said than done, suggested John Trojanowski of the University of Pennsylvania School of Medicine in Philadelphia. “Study sections will often fund discovery science, but it’s hard to get an R01 to confirm other people’s results,” he told ARF. “A kiss of death in an R01 grant application is ‘oh well, this is just confirmatory.’ There should be other funding mechanisms that say, ‘Yes, validation and confirmation are what we want,’” Trojanowski said. “It’s hard for single labs to do all the work needed to not only discover, but also validate and qualify, biomarkers to bring them forward to clear standards so they can be used in the clinic.”

This need is gradually becoming clear to the U.S. National Institutes of Health, which committed nearly two-thirds of the $120 million needed for both phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI). This is a highly coordinated study engaging more than 1,000 research volunteers across North America (see ARF ADNI series and ARF related news story) for the purpose of validating and standardizing AD biomarkers. Similar efforts have sprung forth in the Parkinson’s field with the Parkinson’s Progression Markers Initiative (see ARF related news story), and for frontotemporal dementia research with the FTLD Neuroimaging Initiative. A recent Nature editorial argues that larger, collaborative studies like these are required to move claimed biomarkers into clinical practice (Poste, 2011).

Meta-analyses can also alleviate the problem of exaggerated effect sizes, as the JAMA paper points out. However, pooling data from studies with different cohorts and analysis techniques may cause problems, too, Xiong said. Ideally, meta-analyses would re-crunch the numbers on individual patient data collected from the various studies. However, such raw data are often unavailable. In a recent paper, Xiong and colleagues describe a method for doing meta-analysis that combines individual-level data from certain studies with summary statistics from others (Xiong et al., 2011).
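
For readers unfamiliar with how pooling works, the following is a generic two-stage, inverse-variance fixed-effect sketch in which one study contributes individual-level data and others contribute only summary statistics. It is an illustration under stated assumptions, not the unified method of Xiong et al.; the helper names and all numbers are hypothetical.

```python
import math
import statistics


def summarize_ipd(cases, controls):
    """Stage 1 (hypothetical helper): reduce one study's individual-level
    biomarker values to an effect estimate (difference in means) and its
    standard error, using the unequal-variance formula."""
    diff = statistics.fmean(cases) - statistics.fmean(controls)
    se = math.sqrt(statistics.variance(cases) / len(cases)
                   + statistics.variance(controls) / len(controls))
    return diff, se


def pool_fixed_effect(estimates):
    """Stage 2: inverse-variance fixed-effect pooling of (effect, se) pairs.
    Each study is weighted by 1 / se**2; returns (pooled_effect, pooled_se)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical data: one study with raw values, two with summaries only.
    ipd_study = summarize_ipd(cases=[1.2, 0.8, 1.5, 1.1],
                              controls=[0.4, 0.6, 0.3, 0.7])
    summary_only = [(0.9, 0.30), (0.4, 0.15)]
    pooled, pooled_se = pool_fixed_effect([ipd_study] + summary_only)
    print(f"pooled effect {pooled:.3f} (SE {pooled_se:.3f})")
```

A fixed-effect model assumes every study estimates the same underlying effect; when cohorts and analysis techniques differ, as Xiong cautions, a random-effects model that allows for between-study heterogeneity is usually the more defensible choice.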

How different labs collect the same data can complicate biomarker analysis even further—a problem not discussed in the JAMA report, noted Douglas Galasko of the University of California at San Diego (see full comment below). “The AD field has been highly aware of these issues, and is addressing quality control for imaging and biofluid biomarkers appropriately,” he wrote in an e-mail to ARF (e.g., ARF related news story; see also Shaw et al., 2011 and Fagan et al., 2011).

Ultimately, “it would be premature to doubt all scientific efforts at marker discovery and unwise to discount all future biomarker evaluation studies,” wrote Patrick Bossuyt of the University of Amsterdam, The Netherlands, in a commentary accompanying the paper. The study “should convince clinicians and researchers to be careful to match personal hope with professional skepticism, to apply critical appraisal of study design and close scrutiny of findings where indicated, and to be aware of the findings of well-conducted systematic reviews and meta-analyses when evaluating the evidence on biomarkers.”—Esther Landhuis

Comments

- Douglas Galasko
  University of California, San Diego
- Posted: 03 Jun 2011
- Paper: Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses.
This JAMA paper subjects a number of blood-based tests, thought to have diagnostic discrimination in initial publications, to meta-analysis, and finds greatly reduced discrimination. Does this mean that current enthusiasm for AD biomarkers should be curbed?

Some principles in the JAMA paper apply to AD biomarkers, namely, the hazards of overinterpreting diagnostic (or prognostic) discrimination from a single study. However, the biomarkers reviewed in the JAMA paper are all blood tests, and imaging biomarkers (an area of great relevance to AD) are not represented or discussed. Many of the blood tests in the JAMA paper are genetic polymorphisms or mutations, which have different connotations and applicability than biomarkers—a gene test indicates a trait, present from birth, whereas a biochemical biomarker represents an acquired state, and may change over time or be modifiable.

Some take-home messages for AD:

Understanding what aspects of biology the biomarker is measuring is often helpful. For example, the JAMA paper discusses markers such as homocysteine and CRP for heart disease—these are indirectly related to the pathology of heart disease. By comparison, CSF biomarkers that have been developed for AD are related to the biochemistry of the defining lesions of AD.

Initial studies need to be large enough to provide clearly interpretable data, with good principles of study design (e.g., gold standard diagnoses; separate groups of subjects for the discovery and the replication analyses; comparison with a non-AD dementia group to determine diagnostic specificity of the biomarker). For the biomarkers that have been proposed in the new AD criteria, there are many large-scale studies that define diagnostic sensitivity and specificity. ADNI is the largest prognostic biomarker study of mild cognitive impairment to date, and there are few large-scale studies of cognitively healthy controls who progress to AD and have had biomarker measurements. So the data on effect size/predictive power of biomarkers for prognosis in these settings will need further study in different populations.

Effect sizes are likely to decrease when biomarkers are studied in a less selected population. AD biomarker studies have generally involved carefully selected volunteers, sometimes within a limited age range. Effect sizes of biomarkers proposed for AD may be lower, for example, in a group of people aged 85 or older than in younger patients with dementia. Pathology studies representative of older adults living in the community (e.g., the Rush Memory and Aging Project study by David Bennett and collaborators; see Negash et al., 2011) have shown that mixed pathology often underlies dementia; biomarkers related to AD pathology may therefore have lower predictive value in this type of setting than in a group of patients referred to a clinic.

The JAMA paper does not discuss issues surrounding measurement/assays for biomarkers. The AD field has been highly aware of these questions, and is addressing QC for imaging and biofluid biomarkers appropriately.

References:

Negash S, Bennett DA, Wilson RS, Schneider JA, Arnold SE. Cognition and neuropathology in aging: multidimensional perspectives from the Rush Religious Orders Study and Rush Memory And Aging Project. Curr Alzheimer Res. 2011 Jun;8(4):336-40. PubMed.


- Sanjay Pimplikar
  Case Western Reserve University
- Posted: 09 Jun 2011
- Paper: Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses.
The remarkable, if not entirely surprising, findings reported in this paper should be of interest to readers of Alzforum because of the current excitement in the field about "AD biomarkers." As summarized above, the study by Ioannidis and Panagiotou indicates that most of the highly cited associations (in which a particular biomarker was identified as a risk factor for a given disease) were exaggerated, and that many valid associations had relatively modest effects and may have only nominal value for clinical use.

To understand the potential implications of these findings for the AD research field, it is helpful to reflect on two points. First, why does this phenomenon (initial inflation of effect size) occur, and second, if this phenomenon is unavoidable, then how should we incorporate this knowledge while going forward?

The summary by Landhuis addresses the first issue, and I agree that the current practice of publication (conferring an enormous "reward" on highly cited papers), coupled with funding realities, may be largely responsible for this trend. Previous work by Ioannidis and colleagues (1-3) presents a compelling argument that publishing practices in biomedical research, especially in the last two to three decades, can best be explained by applying the economic principle of the “winner’s curse” (a mathematical model that explains the tendency of a winning bidder to overpay for an asset whose intrinsic value is uncertain). Ioannidis has forcefully argued that this practice seems to distort science (1) and could contribute to the phenomenon reported in the JAMA paper. (Incidentally, inflated associations are not restricted to the biomarker field or a particular disease, but are also observed in clinical trials [2,3] and genomewide association studies [4].) Since it is inconceivable that the current system of conferring great rewards on highly cited papers will change in the foreseeable future, it follows that current publication practices, and the phenomenon described above, will persist.

This leads to the second point: If this phenomenon is real and cannot be avoided, how should we protect ourselves from the winner’s curse? Game theory and mathematical models give us tools to protect buyers (that is, us, the consumers of the scientific information) from overpaying (5), and thus from misallocating resources (6). However, one prudent and practical way forward will be for the AD field to evaluate biomarker data critically and not to subscribe only to the most optimistic view. The formation of ADNI shows that the AD field is aware of the issues of reproducibility and effect size. We all share in the excitement brought to the AD field by biomarkers and understand the necessity of this approach. However, high endowment (price) and limited feedback have been shown to lead to extreme curses (5), and I hope that these comments are viewed as constructive. The conclusions of the JAMA study have significant implications for the AD field, since biomarkers form an important component of the new sets of criteria proposed by the NIA/Alzheimer’s Association for diagnosing AD (7). The Webinar organized by Alzforum to discuss the new biomarker-based criteria could not have been more timely (8). It will be interesting to hear what the panel members have to say about the findings of this paper.

References:

1. Young NS, Ioannidis JP, Al-Ubaydli O. Why current publication practices may distort science. PLoS Med. 2008 Oct 7;5(10):e201. PubMed.

2. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005 Jul 13;294(2):218-28. PubMed.

3. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008 Sep;19(5):640-8. PubMed.

4. Kraft P. Curses--winner's and otherwise--in genetic epidemiology. Epidemiology. 2008 Sep;19(5):649-51; discussion 657-8. PubMed.

5. Foreman P, Murnighan JK. How to avoid the Winner’s curse. Organ Behav Hum Decis Process. 1996 Aug;67(2):170-80.

6. Bezprozvanny I. The rise and fall of Dimebon. Drug News Perspect. 2010 Oct;23(8):518-23. PubMed.

7. New guidelines for diagnosing Alzheimer’s: What do they mean for you? Retrieved on 5 June 2011.

8. Alzforum Webinar: Two New Sets of Diagnostic Criteria.



References

Paper Citations

Poste G. Bring on the biomarkers. Nature. 2011 Jan 13;469(7329):156-7. PubMed.
Xiong C, van Belle G, Zhu K, Miller JP, Morris JC. A Unified Approach of Meta-Analysis: Application to an Antecedent Biomarker Study in Alzheimer's Disease. J Appl Stat. 2011 Jan 1;38(1):15-27. PubMed.
Shaw LM, Vanderstichele H, Knapik-Czajka M, Figurski M, Coart E, Blennow K, Soares H, Simon AJ, Lewczuk P, Dean RA, Siemers E, Potter W, Lee VM, Trojanowski JQ. Qualification of the analytical and clinical performance of CSF biomarker analyses in ADNI. Acta Neuropathol. 2011 May;121(5):597-609. PubMed.
Fagan AM, Shaw LM, Xiong C, Vanderstichele H, Mintun MA, Trojanowski JQ, Coart E, Morris JC, Holtzman DM. Comparison of analytical platforms for cerebrospinal fluid measures of β-amyloid 1-42, total tau, and p-tau181 for identifying Alzheimer disease amyloid plaque pathology. Arch Neurol. 2011 Sep;68(9):1137-44. PubMed.

Overblown Biomarkers: Are Publication, Funding Practices to Blame?
