In the absence of a definitive test for Alzheimer’s disease, the diagnosis of dementia remains something of an art, dependent on a clinician’s experience and judgment. The truth of the matter becomes clear only upon pathological examination, which sometimes reveals misdiagnoses. To avoid this scenario, research studies often use a panel of experts to diagnose volunteers. Do such panels actually improve the accuracy of diagnosis? The question has received surprisingly little study. In the December Archives of Neurology, researchers led by Matthew Gabel at Washington University in St. Louis, Missouri, and Norman Foster at the University of Utah in Salt Lake City present evidence that consensus panels can improve diagnostic accuracy, but only under certain conditions. Their data suggest that a panel should contain people who have a diversity of opinions and training; otherwise, a group performs no better than an individual. The paper also highlights the usefulness of biomarkers such as imaging data to increase diagnostic accuracy.
“I am glad they did the study, because I do not think people have looked carefully at [consensus panel use],” said John Morris at Washington University. The paper shows that “consensus diagnosis helps eliminate diagnostic variability and error,” Morris said.
Ronald Petersen at the Mayo Clinic in Rochester, Minnesota, agrees. “One take-home message is that consensus panels are accurate, effective, and useful for research right now, and can perhaps be made even better by adopting a method that formalizes how consensus panels operate.”
To test the hypothesis that panels could improve disease identification, Gabel and Foster made use of historical data from 45 people who had been confirmed to have either AD or frontotemporal dementia (FTD) by pathological examination. The authors prepared three sets of data from each patient: only clinical reports (excluding neuropsychological data); only imaging analysis from positron emission tomography with the glucose analog fluorodeoxyglucose F18 (FDG-PET); or both.
Gabel and colleagues then convened two different panels, one composed of six trainees in various medical specialties such as neurology, psychiatry, and geriatrics, and the other composed of six experienced physicians in these fields. None of the panel members was a radiologist or experienced in reading PET scans. The panels used the Delphi consensus method: The same data are presented to all members, and each member makes an independent decision; the diagnoses are then shared with the group and discussed until consensus is reached (see ARF related news story and Jones and Hunter, 1995). Social science research indicates that this methodology should foster accurate group judgments (see Gabel and Shipan, 2004). Gabel and colleagues compared the consensus diagnoses and those made by six independent neurologists who specialize in dementia against the autopsy-confirmed diagnoses.
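The consensus loop described above (independent calls, shared results, revision until everyone agrees) can be sketched in a few lines of Python. This is only an illustration, not the procedure the study panels followed: `run_consensus` and `adopt_majority` are hypothetical names, and the rule that stands in for discussion (switch to whatever a strict majority says) is an assumption made so the example can run.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def run_consensus(
    initial_votes: List[str],
    revise: Callable[[int, List[str]], str],
    max_rounds: int = 10,
) -> Tuple[Optional[str], int]:
    """Repeat share-and-revise rounds until all votes agree or rounds run out.

    Returns (consensus diagnosis or None, number of revision rounds used).
    """
    votes = list(initial_votes)
    rounds = 0
    while len(set(votes)) > 1 and rounds < max_rounds:
        shared = list(votes)  # every member sees the same snapshot of votes
        votes = [revise(i, shared) for i in range(len(shared))]
        rounds += 1
    consensus = votes[0] if len(set(votes)) == 1 else None
    return consensus, rounds


def adopt_majority(member_index: int, shared_votes: List[str]) -> str:
    """Hypothetical stand-in for discussion (an assumption, not the study's rule):
    a member switches only if a strict majority holds a single diagnosis."""
    top, count = Counter(shared_votes).most_common(1)[0]
    if 2 * count > len(shared_votes):
        return top
    return shared_votes[member_index]
```

Called as `run_consensus(["AD", "AD", "FTD", "AD", "FTD", "AD"], adopt_majority)`, the sketch settles on "AD" after one round. A three-to-three split never resolves under this simple rule, which is one reason real panels depend on discussion rather than vote counting.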
As the authors expected, the effectiveness of the consensus decisions depended on which type of data the panels reviewed. When reviewing only clinical data, the trainee and expert panels each averaged 84 percent correct diagnoses, significantly better than individual raters and panelists, whose accuracy ranged from about 70 percent for trainees to close to 80 percent for experts. Gabel and colleagues found that when reviewing clinical data, the consensus process was 3.5 times more likely to improve the accuracy of decisions than to impair it. However, when FDG-PET data were available, panel decisions were no more accurate than those of individual raters, and sometimes less so. With PET data, diagnoses were about 90 percent accurate for panels and individuals alike. The higher reliability of PET data has been shown in other studies, including an earlier paper using data from the same 45 patients (see Foster et al., 2007).
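To make these comparisons concrete, here is a minimal sketch of how accuracy, and the balance between improved and impaired calls, could be tallied against autopsy results. The function names and the five-case data set are invented for illustration, and the paper's own "3.5 times" statistic may be defined differently from the simple count used here.

```python
from typing import List, Tuple


def accuracy(calls: List[str], truth: List[str]) -> float:
    """Fraction of diagnoses that match the autopsy-confirmed diagnosis."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


def improved_vs_impaired(
    individual_calls: List[str],
    consensus_calls: List[str],
    truth: List[str],
) -> Tuple[int, int]:
    """Count cases where consensus fixed a wrong individual call (improved)
    and cases where it overturned a correct one (impaired)."""
    improved = impaired = 0
    for ind, con, t in zip(individual_calls, consensus_calls, truth):
        if ind != t and con == t:
            improved += 1
        elif ind == t and con != t:
            impaired += 1
    return improved, impaired


# Invented toy data (not from the study): five autopsy-confirmed cases.
truth = ["AD", "FTD", "AD", "AD", "FTD"]
individual = ["AD", "AD", "FTD", "AD", "FTD"]
consensus = ["AD", "FTD", "AD", "AD", "AD"]

print(accuracy(individual, truth), accuracy(consensus, truth))
print(improved_vs_impaired(individual, consensus, truth))
```

In this toy set, consensus corrects two individual errors and introduces one, so it improves calls twice as often as it impairs them.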
Gabel explains that the panels' failure to outperform individuals on PET scans stems from the members' lack of expertise with imaging data. All reviewers received identical training in how to interpret PET scans, so all members of a panel used similar thought processes and were likely to make the same mistakes. The group process therefore offered no benefit. “The most valuable environment for group consensus is one in which people think differently and make mistakes for different reasons,” Gabel said. With the clinical data, the panelists brought varied expertise to bear, and the group as a whole made a greater number of correct decisions. Gabel speculates that a panel composed of radiologists, each with distinct training and experience in interpreting scans, would show improved diagnostic accuracy over the decisions made by individual radiologists when looking at PET scans. In future studies, Gabel would like to further test the idea that panel diversity improves accuracy.
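A toy calculation shows why it matters that panelists err for different reasons. The sketch below is not the authors' model. It assumes each member is right with the same probability, that errors are fully independent, and that the panel decides by simple majority with ties settled by a coin flip. The function names are hypothetical.

```python
from math import comb


def majority_accuracy(n_members: int, p_correct: float) -> float:
    """Probability that a simple majority of n independent raters is correct.

    Ties, which can occur when n is even, are counted as a coin flip.
    """
    total = 0.0
    for k in range(n_members + 1):
        # Binomial probability that exactly k members are correct.
        prob_k = comb(n_members, k) * p_correct**k * (1 - p_correct) ** (n_members - k)
        if 2 * k > n_members:
            total += prob_k
        elif 2 * k == n_members:
            total += 0.5 * prob_k
    return total


def identical_errors_accuracy(p_correct: float) -> float:
    """If every member makes exactly the same call, the panel is one rater."""
    return p_correct


print(majority_accuracy(6, 0.75))        # independent errors
print(identical_errors_accuracy(0.75))   # fully shared errors
```

Under these assumptions, a six-member panel of raters who are each right 75 percent of the time does better than any one of them, while a panel whose members all make the same mistakes gains nothing. Real panelists fall somewhere between the two extremes.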
Consensus panels like those studied in this paper are unlikely to be used in standard clinical practice, those in the field agree. Instead, panels are most likely to be used in research and drug development, for example, in making diagnostic determinations for clinical trials or conducting longitudinal studies. Gabel points out that the findings of many National Institute on Aging-funded studies rely on accurate diagnoses from consensus panels. If the panels are making frequent misdiagnoses, the study results are flawed, Gabel said, so researchers need guidelines for setting up valid panels. One problem with these bodies, however, is that they are expensive and time-consuming. A possible alternative, Gabel said, would be to conduct panels remotely, perhaps over the Internet or phone. Would such a process be as effective as face-to-face meetings? That question should be studied, Gabel said.
It is noteworthy that in this study, experienced clinicians with no radiology background were able to achieve better accuracy from PET data than they could from clinical data. Morris believes that the field will ultimately incorporate imaging and other biomarker assays into both clinical and research diagnostic processes (see ARF related news story on Dubois et al., 2007). A proposal to update research diagnostic criteria for AD by including biomarkers caused a stir at the 2010 International Conference on Alzheimer’s Disease (see ARF related news story). One potential problem with imaging data, however, is that PET scans are expensive, and they are not available in all areas of the country, which may limit their use clinically. Also, with no definitive biomarker test available, Petersen suggested that clinical criteria will continue to dominate diagnosis for some time to come, adding, “Right now, consensus panels, at least for research purposes, are probably the best way to go.”—Madolyn Bowman Rogers
- Delphi Consensus Foresees Sharp Rise in World Dementia
- AD Diagnosis: Time for Biomarkers to Weigh In?
- Noisy Response Greets Revised Diagnostic Criteria for AD
- Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995 Aug 5;311(7001):376-80.
- Gabel MJ, Shipan CR. A social choice approach to expert consensus panels. J Health Econ. 2004 May;23(3):543-64.
- Foster NL, Heidebrink JL, Clark CM, Jagust WJ, Arnold SE, Barbas NR, DeCarli CS, Turner RS, Koeppe RA, Higdon R, Minoshima S. FDG-PET improves accuracy in distinguishing frontotemporal dementia and Alzheimer's disease. Brain. 2007 Oct;130(Pt 10):2616-35.
- Dubois B, Feldman HH, Jacova C, DeKosky ST, Barberger-Gateau P, Cummings J, Delacourte A, Galasko D, Gauthier S, Jicha G, Meguro K, O'Brien J, Pasquier F, Robert P, Rossor M, Salloway S, Stern Y, Visser PJ, Scheltens P. Research criteria for the diagnosis of Alzheimer's disease: revising the NINCDS-ADRDA criteria. Lancet Neurol. 2007 Aug;6(8):734-46.
- Gabel MJ, Foster NL, Heidebrink JL, Higdon R, Aizenstein HJ, Arnold SE, Barbas NR, Boeve BF, Burke JR, Clark CM, DeKosky ST, Farlow MR, Jagust WJ, Kawas CH, Koeppe RA, Leverenz JB, Lipton AM, Peskind ER, Turner RS, Womack KB, Zamrini EY. Validation of consensus panel diagnosis in dementia. Arch Neurol. 2010 Dec;67(12):1506-12.