Ivan Lieberburg, Elan Pharmaceuticals, noted in his lecture. The FDA and its European counterpart, EMEA, are paying increasingly close attention and have raised the bar for how rigorously putative biomarkers must be vetted before a new product using that marker is approved. Insurance and government payers, too, have become more selective in what biomarker procedures they will reimburse.
The typical definition of the term "biomarker" in a clinical setting involves a physical sign or lab measurement that occurs in association with a disease and has diagnostic value. Once that biomarker is vetted to substitute for a clinically meaningful result, it then achieves the status of a surrogate marker. Few biomarkers have achieved this desirable status, and a critical variable in getting there lies in the choice of a meaningful clinical endpoint, Lieberburg said.
Cancer research offers lessons for AD, which is at a much earlier stage of developing clinically relevant biomarkers. A decade ago, showing that a tumor responded to a drug by shrinking in a CT or PET scan for a certain period of time passed as a satisfactory clinical endpoint. Now, only survival meets that standard in most cancers, because it has since turned out that initial tumor response to a treatment does not capture its benefit. People responded but still died early. It is important to distinguish clearly between various intermediate endpoints and the ultimate outcome, and to know to which endpoint a given biomarker relates. The key concept involves a distinction between target-based and disease-based biomarkers.
For example, when a person takes a β-blocker, his or her heart rate slows within hours. This primary effect of the drug can be measured, and it constitutes the target-based biomarker. Subsequent physiological effects then follow this slowed heart rate and ultimately result in the clinical outcome of reduced death from congestive heart failure. Of this entire series of in-between effects, some will be of intermediate clinical benefit and some can be measured. In theory, they can serve as disease-based biomarkers so that studies do not have to wait years for quantification of the final clinical endpoint of survival, but there is a caveat. “The further removed the target-based biomarker is from the clinical outcome, the greater the chance it is wrong,” Lieberburg said.
First, the good news. There are two well-documented examples—hypertension and hypercholesterolemia—where the correlation has worked so well that the target-based biomarker has become the disease itself. Strictly speaking, hypertension and hypercholesterolemia are only biomarkers, but their link to the ultimate clinical outcome of death from cardiovascular disease has proved so tight that physicians, the FDA, payers, and the public at large view them as diseases. “That is where we eventually want to go with surrogate AD markers, as well,” Lieberburg said.
Sadly, failures of surrogate markers are more numerous. Medical history contains many examples where investigators took a biomarker, converted it in their own minds into a surrogate marker, and then went astray, Lieberburg said. Either the putative biomarker did not correlate with a desired clinical outcome, or it did not capture the risk to which an intervention exposes a patient. Consider the example of atrial fibrillation. In this condition, the atrium flutters inefficiently instead of contracting normally; as a consequence, clots form and the risk of embolism and stroke rises dramatically. The classic procedure of cardioversion was widely used to jolt the heart into resuming a normal sinus rhythm (NSR), and NSR then assumed the status of surrogate marker for a good clinical outcome for patients with atrial fibrillation. Because electrical cardioversion is unpleasant, doctors switched over to chemical cardioversion, using digoxin and quinidine to reestablish NSR. It worked initially but then the patients’ mortality increased because the drugs touched off a process that leads to a form of rapid heartbeat called ventricular tachycardia. “Chemical conversion to the surrogate marker NSR turned out to be a Pyrrhic victory,” Lieberburg said.
Another example is bone density, considered a surrogate marker for fractures in osteoporosis. Most drugs, such as estrogen or SERMs, increase bone density and reduce fracture rates. But one of the most potent boosters of bone density, sodium fluoride, turned out to generate brittle bones that were as prone to fracture as non-treated osteoporotic bone. Bone density, then, is a flawed surrogate marker for that drug, Lieberburg said.
Then how does the AD field vet a biomarker? This is where science still needs to do some heavy lifting. A roadmap exists from other fields, and it includes these criteria:
- epidemiology to associate the biomarker with a clinical endpoint
- clinical relevance
- sensitivity, specificity
- reliability, robustness
- practicability; it must be noninvasive
- simplicity to adapt into clinical practice
For the gold standards of hypertension and hypercholesterolemia, decades of efforts went into gathering data on all these criteria. Indeed, the relationship between cholesterol and coronary artery disease appeared in the literature as early as 1938 and grew stronger with epidemiology data from the Framingham study in the 1970s and subsequent studies on many thousands of patients. Additional time went into building a consensus with FDA and payers. A more recent example of this gradual consensus building is prostate-specific antigen (PSA) testing for prostate cancer. Where this biomarker is the most useful depends on the stage of disease in a given patient and on the specific clinical question asked. Even though PSA has been widely used for years, the field is still defining exactly what it does well (i.e., predict mortality when measured as rate of change) and what it does poorly (i.e., guide chemotherapy adjustments). In AD, too, the value of a given biomarker will likely depend on the stage of the disease and exactly what it is asked to do, Lieberburg said.
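To make the sensitivity and specificity criterion listed above concrete, here is a minimal Python sketch of how the two quantities are computed from a set of subjects with known diagnosis. The function names and the ten-subject data set are hypothetical and chosen only for illustration; they are not data from any study mentioned in this report.

```python
from typing import Sequence, Tuple


def confusion_counts(
    has_disease: Sequence[bool], marker_positive: Sequence[bool]
) -> Tuple[int, int, int, int]:
    """Return (true_pos, false_neg, true_neg, false_pos) for paired labels."""
    if len(has_disease) != len(marker_positive):
        raise ValueError("both sequences must have the same length")
    tp = fn = tn = fp = 0
    for diseased, positive in zip(has_disease, marker_positive):
        if diseased and positive:
            tp += 1
        elif diseased and not positive:
            fn += 1
        elif not diseased and positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(
    has_disease: Sequence[bool], marker_positive: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(has_disease, marker_positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy subject")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 5 subjects with the disease, 5 without.
has_disease = [True] * 5 + [False] * 5
marker_positive = [True, True, True, True, False,
                   False, False, False, True, False]

sens, spec = sensitivity_specificity(has_disease, marker_positive)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A marker can score well on one measure and poorly on the other, which is why the list names both.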
Where are biomarkers used? A prominent area is drug discovery, and an established example to consider is viral load in HIV-AIDS. It is accepted as a marker for drug screening and predicts well whether a compound will eventually lead to a drug on the market. More frequently, however, industry researchers do not know whether the biomarker they use correlates with the clinical endpoint. That remains true for Aβ as a biomarker in AD drug discovery, Lieberburg said.
Biomarkers are used routinely in pharmacokinetic/pharmacodynamic (PK/PD) measures to determine whether the experimental drug at hand reaches the site of action, and in designing human trials. There, too, surrogate markers can lead investigators astray. Lieberburg cited an example where a presumed antidepressant that had worked well in carefully designed animal studies and in human trials using an imaging surrogate marker later turned out not to affect depression at all. “Despite the best efforts, you may still be going down the wrong path with unvetted surrogate markers,” he said. AD researchers face a similar problem, where animal models capture aspects of what is ultimately a uniquely human disease.
Having told these cautionary tales, Lieberburg added that the standard the FDA imposes is surmountable. Subpart H of FDA rules stipulates that the agency can approve a drug based on a surrogate endpoint. The sponsor needs to submit clinical trial data that show the drug has a pathophysiologically-based effect on a surrogate marker and that this change in the marker reasonably predicts a clinical benefit. The FDA has applied Subpart H to expedite approval for serious and life-threatening diseases such as HIV infection, where drugs go through solely based on their ability to reduce viral load, not mortality, and for cancer (for recent drugs approved in this way, see FDA page).
AD also falls into this disease category. This means that accelerated approval for AD drugs is within reach but the quality of the surrogate marker will be crucial, Lieberburg said. AD research has produced a number of interesting candidates. The key challenge for researchers to keep in mind as they explore them is that any candidate needs to come armed with solid data on the bullet point list above in order to pass muster with the agency. To date, no AD surrogate marker that is vetted on all criteria exists, but the language of Subpart H suggests that it may be possible to use instead a portfolio of individually less-vetted markers so long as they each respond to the drug in a pathophysiologically relevant way. No drugs have as yet been approved in that way, but it would be worth making the case, Lieberburg added.
One problem holding back AD research is that it is still unclear which aspects of the pathophysiology one must treat to get an improvement of the clinical endpoint. Pharmaceutical companies can help with that by showing in their clinical trials that the drug at hand actually controls the immediate pathophysiological endpoint of the drug target in addition to measuring more distal endpoints. That would inform scientists about the role of that part of the pathophysiology in the course of disease. For example, trials of COX inhibitors measured the drugs’ effects on cognitive and overall clinical endpoints, but did not show that the drugs actually controlled inflammation by tracking CSF cytokine levels. This left a large gap between the treatment and the endpoint and made it impossible to learn much from a failed trial. If the trials had shown that inflammatory markers were indeed down yet disease progressed unchanged, researchers could have ruled out inflammation as a target (at least for the stage of disease tested in the trial).
Previously, few instruments to test intermediate biomarkers related to a candidate drug were available. Company scientists tended to have a candidate drug but no tools to show it enters the brain, hits its target, and has an immediate effect on it. In effect, their trials tested only the molecule, not the hypothesis about its role in disease. That is slowly changing, Lieberburg and May noted. For example, Elan’s ongoing phase 2 trial of passive immunization with an Aβ antibody is using Pittsburgh Compound B (PIB) and glucose PET to test pathophysiologic markers broadly, and Lilly’s γ-secretase inhibitor trials assess Aβ levels in CSF and plasma along with cognitive measures.
Industry Experience with Amyloid Biomarkers
Patrick May described how Eli Lilly and Co. has used biomarkers as tools in its preclinical and early clinical program of γ-secretase inhibitors. In short, the lessons there are that animal models are indispensable for target validation and for assessing clinically relevant biomarkers. Preclinical biomarkers can help the researcher prepare clinical trials and get a sense of what to expect when the drug enters people; however, one should not rely on one animal model alone but integrate data from several different species.
APP proteolysis offers several potential biomarkers, he noted. The secreted fragment sAPPβ could report on β-secretase activity, while the Aβ peptide is the most immediate biomarker for γ-secretase activity. (The other cleavage product, AICD, occurs in amounts too small to be easily traceable.)
A bona fide biomarker that one tracks in the clinic in accessible tissues (i.e., blood, urine, saliva) differs from a preclinical biomarker one can track more invasively to ensure the candidate drug really acts on the intended target. Lilly uses biomarkers to validate the target in vivo and to assess the pharmacodynamic effects of γ-secretase inhibitors in a variety of animal models, first in line being the PDAPP717 mouse originally characterized by scientists at Athena Neurosciences, now a subsidiary of Elan. As it ages, this animal model of Aβ amyloidosis mimics some of the pathologic hallmarks of AD.
The first use of animal models in Lilly’s program lies in target validation. For that purpose, the scientist needs to show that the candidate drug affects the pharmacodynamic biomarker in a dose-dependent way. For example, rising doses of experimental γ-secretase inhibitors increasingly lower levels of hippocampal Aβ in young PDAPP mice. A range of compounds does this, but Lilly scientists have selected one named LY450139 to advance into the clinic. Another aspect of target validation is to ensure that the effect one measures relates in its size and timing to drug exposure. This requires measuring at what time the inhibitor achieves peak concentration before it decays, and relating that to the size and time course of change in its biomarker, that is, hippocampal Aβ.
Lilly’s second use of animal models lies in defining clinically relevant biomarkers. This has been a challenge with CNS drugs because such clinically relevant biomarkers require easy access to tissue, and one cannot assess biochemically whether an experimental compound changes Aβ in human brain. Researchers make inferences about brain Aβ from sampling accessible tissues, but for that to work, they must first understand the peptide’s trafficking from brain through CSF to plasma and its subsequent degradation in the liver. Research on antibodies and chaperones is beginning to do that (see, e.g., Deane et al., 2005; Cirrito et al., 2005). Animal models are indispensable for correlating changes in a clinically relevant tissue with a desired pharmacological response in clinically intractable brain tissue, May said. Not all animals are available for this kind of pharmacodynamic work. It is routinely done with mice, but in dogs and primates, the necessary brain biopsies are usually reserved for the end toxicology studies, May said.
Pharmacodynamic studies in PDAPP with LY450139 showed that a transient drop in plasma Aβ 24 hours after injection correlated with a drop of Aβ in hippocampus, cortex, and CSF, May said. The mouse models established that plasma Aβ can indeed report on changes of Aβ in the central nervous system. At the same time, one cannot simply extrapolate from a transgenic model, May cautioned. This is because non-transgenic mice showed a more complex plasma Aβ response to LY450139, probably because their Aβ contribution to plasma is not driven entirely by a transgene expressed in the brain.
For this reason, the Lilly scientists moved their translational biomarker studies of LY450139 efficacy into the beagle dog, a non-transgenic model large enough to allow repeated drawing of fluid samples big enough for a detailed analysis. Pharmacokinetic and pharmacodynamic studies in this species also showed that plasma Aβ acutely dropped at the same time that the secretase inhibitor reached its maximal concentration in plasma, but then Aβ rebounded, much as it had done in wild-type mice. This held true for a single dose or for a six-month treatment with daily oral doses of the inhibitor. The decrease in plasma Aβ correlated with significant reductions in CSF Aβ in the dogs. As in wild-type mice, Aβ levels in plasma showed complex changes over time, but in CSF they did not.
These translational biomarker data prepared the ground for first forays into the clinic. In single-dose and 2-week safety trials in healthy volunteers, plasma Aβ dropped robustly for 6 to 8 hours after LY450139 injection and then rebounded to baseline. The dose-dependence and the pattern of the response mimicked exactly that seen in dog, May said, showing that careful biomarker studies in animal models allow the scientist to predict what to expect in humans. The correlation appeared to break down, however, where it mattered most: Despite the clear drop in plasma, human CSF Aβ levels did not budge significantly. A phase 1b study of 60 people with mild to moderate AD who received placebo or LY450139 once a day for 6 weeks showed, again, that plasma Aβ went down as predicted, but CSF Aβ did not change robustly. In trying to understand this disappointing finding, the scientists discovered that CSF Aβ values varied greatly between subjects, and even across time in a given person. Aβ concentrations swung wildly between 4,000 and 12,000 picograms per milliliter, making it difficult to ascertain a definitive drug effect in CSF. It is unclear if this variability is part of Aβ’s biology—for example, because its levels vary with excitatory activity—or if it reflects the biophysics of Aβ, that is, its stickiness and tendency to aggregate. “Fifteen years ago when we started this program, we called Aβ the peptide from hell. Now after all this intense research, we still think it is the peptide from hell,” quipped May.
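A rough sketch can show why variability of this size makes a drug effect in CSF hard to pin down. The Python example below uses the standard normal-approximation formula for the number of subjects per arm needed to detect a difference in means between two parallel groups. It is not a description of how the Lilly trials were designed or analyzed. The mean of 8,000 pg/mL (the midpoint of the reported range) and the standard deviation of 2,000 pg/mL (treating the reported range as roughly plus or minus two standard deviations) are assumptions made for illustration, as are the function name and the choice of a two-sided 5 percent significance level with 80 percent power.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_arm(sd: float, delta: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm for a two-arm comparison of means.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2,
    the usual normal approximation with equal group sizes and equal SDs.
    """
    if sd <= 0 or delta <= 0:
        raise ValueError("sd and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Assumed values, not trial data: midpoint of the reported 4,000-12,000 pg/mL
# range as the mean, and the range treated as about +/- 2 SD.
ASSUMED_MEAN_PG_ML = 8000.0
ASSUMED_SD_PG_ML = 2000.0

for reduction in (0.10, 0.25, 0.50):
    delta = reduction * ASSUMED_MEAN_PG_ML  # absolute drop to detect
    n = subjects_per_arm(ASSUMED_SD_PG_ML, delta)
    print(f"{reduction:.0%} reduction in CSF Abeta: about {n} subjects per arm")
```

Under these assumptions, the required group size grows with the square of the ratio of variability to effect size, so a modest drop in a highly variable measure calls for far more subjects than a large one. Repeated sampling within the same person could change the picture, but the source notes that values also fluctuated over time within individuals.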
Taken together, the translational and clinical studies have shown that this γ-secretase inhibitor appears to be safe and able to reduce plasma Aβ, but an obvious decrease in CSF Aβ remains elusive and the company has not launched a phase 2 trial yet. The biomarker studies in mice and dogs helped set the drug dose and helped predict how Aβ would change in humans, but they have not answered the question of whether LY450139 can be a drug one day, May said. (For a study published last month on how Lilly’s competitor Merck Sharp & Dohme tested a new γ-secretase inhibitor of its own in rat brain versus CSF, see Best et al., 2006.)
Lilly's competitor Elan has had its own share of trouble from preclinical mouse studies, most famously when they failed to predict the side effect that hobbled the phase 2 trial of its first-generation, active Aβ immunotherapy AN-1792. Like May, Peter Seubert also conceded that researchers have been humbled by the difficulty they encountered in developing a biomarker for AD, a disease whose pathology is so glaring that its major features—plaques and tangles—have been known for a hundred years.
Seubert reviewed preclinical research leading up to the company’s clinical trials of AN-1792, which ended dosing prematurely when 18 (or 6 percent of) patients developed meningoencephalitis, though their follow-up continues. The Alzheimer Research Forum has covered this effort extensively (see, e.g., ARF conference story; ARF related news story; ARF news story; ARF Live Discussion), and this report therefore presents only points specific to biomarker use in this research program.
For one, the meningoencephalitis prompted a follow-up study relevant to biomarker development. Margot O’Toole and colleagues at Elan’s partner Wyeth Research in Cambridge, Massachusetts, compared participants’ blood samples in search of an immunological gene expression fingerprint that could potentially serve to screen prospective patients in a future trial. Ideally, one would want to exclude people prone to developing T cell-driven inflammation and include those who are likely to mount a desirable B cell-mediated immune response. Results suggested that combinations of gene expression patterns could potentially identify such subjects; however, this is a tentative conclusion because the authors had only five encephalitis cases available for the analysis (O’Toole et al., 2005).
Seubert then turned to cognitive outcomes. Fifty-nine people produced antibody titers above a predetermined threshold in response to the one to three shots of AN-1792 they received. Comparing them to the roughly 300 non-responders, the researchers found no difference in most clinical measures, but they did see a small but significant, titer-related effect in the Wechsler verbal-delayed memory test. They also saw an effect in the composite Z-scores of the memory-related elements of the Neuropsychological Test Battery (NTB).
The AN-1792 trial used CSF markers in a subset of patients. CSF Aβ42 showed no clear change, but total tau levels, which are typically elevated in AD cases, went down significantly in the responders (Gilman et al., 2005). “I take that as a very encouraging sign, that a biomarker (tau) of presumed neurodegenerative origin and distinct from the amyloid target was reduced,” Seubert said. The researchers did not measure phospho-tau in the AN-1792 trial but are considering it in current ones, Seubert said.
The trial also used serial volumetric MRI of the brain. (It did not include PIB imaging, but a second-generation trial does, Seubert noted.) The MRI biomarker study accompanying the AN-1792 trial lobbed a surprise at the field when it turned out that responders saw their brain volume shrink more than the non-responders (Fox et al., 2005). This was counterintuitive because numerous studies had established that the brain and hippocampus shrink with progressing AD. Beyond speculation about the reasons for this result (loss of amyloid and gliosis, fluid shifts), follow-up data and further analysis are not yet available. It is unclear at present what this finding means for the future of MRI volumetry as a biomarker in AD diagnosis as opposed to one for treatment monitoring. Data keep coming in to suggest it may be useful for the former (e.g., den Heijer et al., 2006) perhaps more than the latter. Meanwhile, the finding has raised questions over regulatory demands that this biomarker be included in pivotal AD trials. Seubert would not say whether Elan still uses volumetric MRI in ongoing trials, or whether it is predicting more or less brain shrinkage.
Postmortem studies of brains of trial participants who have since died confirmed that the vaccine removed amyloid deposition in swaths of brain parenchyma. Activated microglia appeared to engulf this form of amyloid, but its cousin deposited around blood vessels stayed in place, as did neurofibrillary tangles.
In summary, the trial responders showed changes in these biomarkers: Their parenchymal amyloid burden and their CSF tau decreased, as did their brain volume. Their Wechsler verbal-delayed memory improved, as did NTB memory component Z-scores. The trial implies, then, that immunotherapy could treat processes related to amyloidosis as well as to tau, Seubert said. Consequently, biomarkers assessing both classic pathologies will be useful in the development of newer forms of this therapeutic approach (see Drugs in Clinical Trials ).—Gabrielle Strobel.
For an introduction, see part 1 of this series.