Here’s a new way to pin down causal genes from GWAS: proteome-wide association studies, aka PWAS. By correlating genetic risk loci with fluctuations in the amount of protein product, scientists can home in on genes that are likely to affect brain function. In the January 28 Nature Genetics, researchers led by Thomas Wingo at Emory University, Atlanta, describe how this approach yielded 11 potential Alzheimer’s genes. Only one had been previously linked to AD. Because this method adds another layer of information, it can fish out associations that were too weak to reach genome-wide significance, Wingo noted. “It allows us to resolve the genetic signal, and identify worthwhile targets for mechanistic studies,” he told Alzforum.
- Proteome-wide association helps parse GWAS data for likely hits.
- The method turned up 10 new candidate Alzheimer’s genes.
- They control protein abundance in several ways.
Other groups are pursuing similar approaches. In a preprint on medRxiv, researchers led by Carlos Cruchaga at Washington University in St. Louis present a broad survey of genetic variants that affect protein levels in brain, cerebrospinal fluid, and plasma. The authors then linked some of these protein quantitative trait loci (pQTLs) to altered risk for several neurodegenerative diseases. The method could help find new biomarkers and therapeutic targets, Cruchaga said.
John Hardy, University College London, agreed these methods have promise, noting that biological information such as protein abundance can allow researchers to “cheat” the multiple testing correction normally required in GWAS. “This does work, but it also means that the results are a little less certain because of the more complex hypotheses being tested,” he wrote to Alzforum.
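To make Hardy's point about multiple testing concrete, here is a minimal Python sketch comparing a Bonferroni-style cutoff for a genome-wide scan with one for the 1,475 proteins tested in the Wingo study. The figure of one million independent tests is an assumption (it is the convention behind the usual 5e-8 genome-wide threshold), and the article does not say which correction the authors actually used, so treat this as an illustration of the principle only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05
# Assumption: roughly one million independent common-variant tests,
# which gives the conventional 5e-8 genome-wide significance cutoff.
N_GWAS_TESTS = 1_000_000
# From the article: 1,475 proteins whose abundance was genetically controlled.
N_PROTEIN_TESTS = 1_475

gwas_cutoff = bonferroni_threshold(ALPHA, N_GWAS_TESTS)
protein_cutoff = bonferroni_threshold(ALPHA, N_PROTEIN_TESTS)

print(f"Genome-wide cutoff:   {gwas_cutoff:.1e}")
print(f"Protein-level cutoff: {protein_cutoff:.1e}")
print(f"Cutoff relaxed by a factor of about {protein_cutoff / gwas_cutoff:.0f}")
```

Testing far fewer, biologically motivated hypotheses loosens the cutoff by several hundredfold, which is why signals that miss genome-wide significance can surface in this kind of analysis.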
Baker’s Dozen. This Manhattan plot of the genome correlates Alzheimer’s disease with levels (y axis) of 1,475 proteins (colored by chromosome). The correlation reached statistical significance (red line) for 13 genes. [Courtesy of Wingo et al., Nature Genetics.]
The limitations of GWAS are well known. They link genomic loci to disease, but by themselves cannot pinpoint the causal variants, or even the genes involved. Researchers have turned to innovative methods to find these genes, for example by bringing in transcriptomic or epigenomic data (Apr 2019 news; Nov 2019 news; Oct 2020 news).
Wingo and colleagues focused on the proteome. They had previously used a proteomic approach to uncover common roles for oligodendrocytes in atherosclerosis and AD (May 2020 news).
In the present study, first author Aliza Wingo began by searching for proteins whose abundance was genetically controlled. She used mass spectrometry to identify and measure proteins in 376 dorsolateral prefrontal cortex (DLPFC) samples from the Religious Orders Study-Memory and Aging Project (ROSMAP). Out of 8,356 proteins, she found 1,475 whose abundance was linked to inheritance of particular SNPs. Next, the authors asked whether these SNPs associated with AD. For these 1,475 loci, the researchers integrated their protein abundance findings with a GWAS dataset comprising 71,880 cases and 383,378 controls (Jansen et al., 2019). This analysis related 13 of the 1,475 proteins to AD, with protein levels that varied with disease diagnosis (see image above). Notably, most of the genes found this way came from regions that had fallen short of genome-wide significance in the original GWAS.
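The integration step can be sketched with a formulation widely used in transcriptome-wide association work: SNP weights that predict protein abundance are combined with GWAS z-scores, scaled by the correlation (LD) among the SNPs. The article does not name the software or exact statistic the authors used, so the function below is an assumption about the general technique, and the weights, z-scores, and LD matrix are hypothetical toy values.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """Association z-score for predicted protein level: w'z / sqrt(w' R w).

    weights : per-SNP effects on protein abundance (hypothetical pQTL weights)
    gwas_z  : per-SNP z-scores from the disease GWAS
    ld      : SNP-by-SNP correlation matrix R
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("predicted protein level has no variance")
    return numerator / math.sqrt(variance)


# Hypothetical locus with three SNPs; none of these numbers come from the study.
weights = [0.5, 0.3, 0.0]
gwas_z = [3.0, 2.5, 0.4]
ld = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

z = pwas_z(weights, gwas_z, ld)
print(f"PWAS z-score: {z:.2f}")  # prints 3.12
```

In this toy locus, no single SNP has a z-score above 3.0, yet the protein-level statistic is slightly higher. This is one way a sub-threshold GWAS region can still yield a significant protein association.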
In an independent validation set of 152 DLPFC samples, the newly found associations mostly held. Three of the 13 proteins could not be measured in this smaller PWAS, but the other 10 again showed genetically modulated abundance that was linked to AD diagnosis.
Do these protein fluctuations themselves contribute to AD pathogenesis, or are the associated genes merely markers for the true causal factors that lie somewhere else? To get at this question, the authors ran two different analyses. First, they examined whether the pQTL and AD GWAS risk variants co-localized at each genetic locus. That would imply the same variant is responsible for both associations. For nine of the 13 genes, this was the case. All of these nine pQTLs were cis, meaning they lay near the gene for the protein and could directly regulate its expression. The second analysis used Mendelian randomization, i.e., the researchers gauged whether a given genetic variant foretold protein abundance and AD risk equally. This would suggest that the protein level mediated the disease risk. Again, nine genes met this test, though not the same nine as in the first test.
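The logic of Mendelian randomization can be illustrated with its simplest form, the single-variant Wald ratio: the variant's effect on disease risk divided by its effect on protein level estimates the effect of the protein on risk. This is a sketch under stated assumptions, not the authors' method, which the article does not detail. The effect sizes are hypothetical, and the standard error is a first-order approximation that ignores uncertainty in the pQTL estimate.

```python
import math


def wald_ratio(beta_pqtl, beta_gwas, se_gwas):
    """Single-instrument Mendelian randomization estimate.

    beta_pqtl : variant effect on protein level (per allele)
    beta_gwas : variant effect on disease log-odds (per allele)
    se_gwas   : standard error of beta_gwas
    Returns (estimate, standard error, z-score, two-sided normal p-value).
    """
    if beta_pqtl == 0:
        raise ValueError("the variant must affect protein level to be an instrument")
    estimate = beta_gwas / beta_pqtl
    # First-order approximation: treats beta_pqtl as known without error.
    se = se_gwas / abs(beta_pqtl)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, z, p_value


# Hypothetical numbers, not taken from the study.
estimate, se, z, p_value = wald_ratio(beta_pqtl=0.40, beta_gwas=0.06, se_gwas=0.015)
print(f"Log-odds change per unit of protein: {estimate:.3f} (SE {se:.4f})")
print(f"z = {z:.2f}, p = {p_value:.1e}")
```

A result like this supports a causal reading only if the variant influences disease solely through the protein, which is why the authors paired this test with co-localization.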
Altogether, seven of the 13 genes passed both causality tests, two met only the co-localization criteria, and two met only the Mendelian randomization test. Two genes, EPHX2 and PVR, failed both and were kicked off the island, as it were. This resulted in a winnowed list of 11 genes with some evidence for causality. As an additional check, the authors adjusted the findings for APOE genotype. All 11 genes were still significant, indicating they operated independently of this strongest AD risk factor.
So what are these 11 genes? Some fell into pathways previously implicated in Alzheimer’s, such as vesicle trafficking and inflammation. These included syntaxin 4, a SNARE protein that helps dock vesicles at synapses, DOC2A, which regulates vesicle fusion and neurotransmitter release, and three proteins that participate in intracellular trafficking: syntaxin 6, SNX32, and ICA1L. Three other genes, ACE, cathepsin H, and CARHSP1, play a role in immune function. Other associations are less clear. LACTB is a mitochondrial protein, RTFDC1 takes part in DNA replication, and PLEKHA1 mediates transmembrane signaling.
Of these 11 genes, only ACE had previously been linked to both protein level and AD (e.g., Oct 2020 news). However, ICA1L has been tied to amyotrophic lateral sclerosis, and syntaxin 6 to progressive supranuclear palsy (Jun 2011 news).
In future studies, the authors will test these genes in model systems to explore how they might figure in disease. They will also map each genetic locus more finely to find the exact causal variants. In many of the loci, multiple SNPs associate with protein level and disease, leaving open the question of which one is actually responsible.
Another unresolved question is whether these variants act on gene expression or influence protein abundance in some other way, for example via the protein’s stability or localization. Wingo and colleagues found evidence for transcript changes in only five of the 11 genes, hinting that mechanisms other than expression might be at work.
For their part, Cruchaga’s team focused on potential regulatory mechanisms. Before considering their role in disease, first author Chengran Yang scoured the human genome for pQTLs. Yang isolated a suite of 1,305 proteins from several human tissues by luring them with aptamers—single-stranded oligonucleotides that bind to proteins with high specificity. The authors analyzed 458 parietal lobe samples, as well as cerebrospinal fluid from 971 and plasma from 636 donors. All came from WashU studies. For each type of sample, the authors correlated protein levels with more than 14 million SNPs to find pQTLs. This turned up 32 loci that associated with protein level in brain, 274 with CSF proteins, and 127 with plasma proteins. The researchers checked additional proteome datasets, which yielded a high degree of concurrence, identifying more than 90 percent of the same pQTLs. More than half were specific to a particular tissue. “We need to study multiple tissue and cell types,” Cruchaga noted.
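At its core, a pQTL test asks whether protein level changes with the number of copies of an allele a person carries. The sketch below fits that relationship for a single SNP by ordinary least squares. It is an illustration only: the article does not describe the authors' model, real analyses typically adjust for covariates such as age, sex, and ancestry and repeat the test across millions of SNPs, and the dosages and protein levels here are hypothetical.

```python
import math


def pqtl_regression(dosages, protein_levels):
    """Regress protein level on allele dosage (0, 1, or 2 copies).

    Returns (slope, standard error, t statistic) for the per-allele effect.
    """
    n = len(dosages)
    if n != len(protein_levels):
        raise ValueError("dosages and protein_levels must be the same length")
    if n < 3:
        raise ValueError("need at least three samples")
    mean_x = sum(dosages) / n
    mean_y = sum(protein_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP does not vary in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, protein_levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    rss = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(dosages, protein_levels)
    )
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, se, t_stat


# Hypothetical data for six donors; not taken from the study.
dosages = [0, 0, 1, 1, 2, 2]
protein_levels = [1.0, 1.2, 1.5, 1.7, 2.1, 2.3]

slope, se, t_stat = pqtl_regression(dosages, protein_levels)
print(f"Per-allele effect on protein level: {slope:.2f} (SE {se:.3f}, t = {t_stat:.1f})")
```

A SNP with a large, precisely estimated slope would be called a pQTL for that protein; whether it is cis or trans depends on how far it lies from the gene encoding the protein.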
Unlike Wingo and colleagues’ approach, which focused on cis variants, this methodology unearthed both cis and trans pQTLs. Cis-pQTLs were more likely than trans to be shared between tissues. Many lay in noncoding regions and associated more tightly with protein level the closer they were to the transcription start site, in keeping with modulation of gene expression. Others were coding variants, often turning up at protein cleavage sites or secretory signal regions, hinting at post-translational regulation of protein levels. Around 20 to 25 percent of the pQTLs were trans, meaning they lay far away from the gene or genes they regulate. This implies they act indirectly through other proteins, such as transcription factors. Altogether, pQTLs seem to control protein levels through a wide variety of mechanisms that go far beyond mere expression changes.
Next, Cruchaga and colleagues integrated their findings with GWAS data, using Mendelian randomization to find associations of pQTLs with AD, Parkinson’s disease, ALS, frontotemporal dementia, and stroke. This turned up several links for each disease. In the case of AD, the authors identified a strong cis-pQTL that affected CSF and plasma levels of the microglial receptor CD33. In PD, the data resolved a GWAS region that contained multiple genes: TMEM175, GAK, DGKQ, CPLX1, and IDUA. The pQTL data pointed to IDUA, which encodes a lysosomal protein that degrades glycosaminoglycans, as the causal gene. The pQTL findings also fingered carbonic anhydrase IV as an ALS gene and E-selectin as a stroke gene. Some genes were linked to multiple disorders. For example, IL-1FG and SLAF5 in CSF associated with both AD and PD, and plasma MICA with AD and FTD.
Although proteome data may help find new genes, Cruchaga believes it will not uncover all disease links. For one thing, current measurement methods cannot capture all proteins, so scientists are missing some associations. For another, genes can exert their effects through other types of molecules, too. For example, APOE acts via lipids.
Cruchaga believes future studies should expand to encompass lipidomics, metabolomics, epigenomics, and noncoding RNA. “We have the technology now to start combining multiple omic layers to recover some of the GWAS signals that are otherwise impossible to identify,” he told Alzforum.—Madolyn Bowman Rogers
- Expression, Expression, Expression—Time to Get on Board with eQTLs
- Cell-Specific Enhancer Atlas Centers AD Risk in Microglia. Again.
- Epigenomic Roadmap Points to Causal Genes
- Massive Proteomics Studies Peg Glial Metabolism, Myelination, to AD
- New ACE Variant Speeds Neurodegeneration in Alzheimer’s Mice
- GWAS Fingers Tau and Other Genes for Parkinsonian Tauopathy
- Jansen IE, Savage JE, Watanabe K, Bryois J, Williams DM, Steinberg S, Sealock J, Karlsson IK, Hägg S, Athanasiu L, Voyle N, Proitsi P, Witoelar A, Stringer S, Aarsland D, Almdahl IS, Andersen F, Bergh S, Bettella F, Bjornsson S, Brækhus A, Bråthen G, de Leeuw C, Desikan RS, Djurovic S, Dumitrescu L, Fladby T, Hohman TJ, Jonsson PV, Kiddle SJ, Rongve A, Saltvedt I, Sando SB, Selbæk G, Shoai M, Skene NG, Snaedal J, Stordal E, Ulstein ID, Wang Y, White LR, Hardy J, Hjerling-Leffler J, Sullivan PF, van der Flier WM, Dobson R, Davis LK, Stefansson H, Stefansson K, Pedersen NL, Ripke S, Andreassen OA, Posthuma D. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk. Nat Genet. 2019 Mar;51(3):404-413. Epub 2019 Jan 7.
- Doubling Down on Sequencing Serves up More Alzheimer’s Genes
- Geneticists Seek Out Rare Contributors to Alzheimer’s
- Rogue Gene Networks Track with Neurodegeneration Across Diseases
- A Proteomics Dive into Cause of Frontotemporal Dementia
- Plasma Proteomics Study Hints at New Player in Alzheimer’s
- Protein Screen Links Mitochondrial Regulator to Alzheimer’s Disease
- Proteomics Uncovers Potential Markers, Subtypes of Alzheimer’s
- Wingo AP, Liu Y, Gerasimov ES, Gockley J, Logsdon BA, Duong DM, Dammer EB, Robins C, Beach TG, Reiman EM, Epstein MP, De Jager PL, Lah JJ, Bennett DA, Seyfried NT, Levey AI, Wingo TS. Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer's disease pathogenesis. Nat Genet. 2021 Feb;53(2):143-146. Epub 2021 Jan 28.
- Yang C, Farias FG, Ibanez L, Sadler B, Fernandez MV, Wang F, Bradley JL, Eiffert B, Bahena JA, Budde JP, Li Z, Dube U, Sung YJ, Mihindukulasuriya KA, Morris JC, Fagan A, Perrin RJ, Benitez B, Rhinn H, Harari O, Cruchaga C. Genomic and multi-tissue proteomic integration for understanding the biology of disease and other complex traits. medRxiv. 2020 Jun 26.