Here’s a new way to pin down causal variants from GWAS: proteome-wide association studies, aka PWAS. By correlating genetic risk loci with fluctuations in the amount of protein product, scientists can narrow in on genes that are likely to affect brain function. In the January 28 Nature Genetics, researchers led by Thomas Wingo at Emory University, Atlanta, describe how this approach yielded 11 potential Alzheimer’s genes. Only one had been previously linked to AD. Because this method adds another layer of information, it can fish out associations that were too weak to reach genome-wide significance, Wingo noted. “It allows us to resolve the genetic signal, and identify worthwhile targets for mechanistic studies,” he told Alzforum.

  • Proteome-wide association helps parse GWAS data for likely hits.
  • The method turned up 10 new candidate Alzheimer’s genes.
  • They control protein abundance in several ways.

Other groups are pursuing similar approaches. In a preprint on medRxiv, researchers led by Carlos Cruchaga at Washington University in St. Louis present a broad survey of genetic variants that affect protein levels in brain, cerebrospinal fluid, and plasma. The authors then linked some of these protein quantitative trait loci (pQTLs) to altered risk for several neurodegenerative diseases. The method could help find new biomarkers and therapeutic targets, Cruchaga said.

John Hardy, University College London, agreed these methods have promise, noting that biological information such as protein abundance can allow researchers to “cheat” the multiple testing correction normally required in GWAS. “This does work, but it also means that the results are a little less certain because of the more complex hypotheses being tested,” he wrote to Alzforum (full comment below).
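Why does restricting the hypotheses tested help? The Bonferroni bar scales with the number of tests, and the conventional genome-wide threshold of 5×10⁻⁸ corresponds to roughly one million independent tests. The Python sketch below compares that bar with one test per protein in a 1,475-protein PWAS. The test counts and the choice of Bonferroni are illustrative assumptions; the article does not say which multiple-testing correction Wingo and colleagues applied.

```python
ALPHA = 0.05  # family-wise error rate to control


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_GWAS_TESTS = 1_000_000  # assumed number of independent genome-wide tests
N_PWAS_TESTS = 1_475      # proteins carried into the Wingo et al. PWAS

gwas_cutoff = bonferroni_threshold(N_GWAS_TESTS)
pwas_cutoff = bonferroni_threshold(N_PWAS_TESTS)

print(f"genome-wide cutoff: {gwas_cutoff:.1e}")                 # → 5.0e-08
print(f"PWAS cutoff:        {pwas_cutoff:.1e}")                 # → 3.4e-05
print(f"relaxation factor:  {pwas_cutoff / gwas_cutoff:.0f}x")  # → 678x
```

The same arithmetic underlies Hardy’s caveat. The looser bar applies only because biological information narrowed the hypothesis space in advance, so the result leans on that prior choice.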

Baker’s Dozen. This Manhattan plot shows the strength of association (y axis) between Alzheimer’s disease and the levels of 1,475 proteins (colored by chromosome). The association reached statistical significance (red line) for 13 genes. [Courtesy of Wingo et al., Nature Genetics.]

The limitations of GWAS are well known. They link genomic loci to disease, but by themselves cannot pinpoint the causal variants, or even the genes involved. Researchers have turned to innovative methods to find these genes, for example by bringing in transcriptomic or epigenomic data (Apr 2019 news; Nov 2019 news; Oct 2020 news). 

Wingo and colleagues focused on the proteome. They had previously used a proteomic approach to uncover common roles for oligodendrocytes in atherosclerosis and AD (May 2020 news). 

In the present study, first author Aliza Wingo began by searching for proteins whose abundance was genetically controlled. She used mass spectrometry to identify and measure proteins in 376 dorsolateral prefrontal cortex (DLPFC) samples from the Religious Orders Study-Memory and Aging Project (ROSMAP). Out of 8,356 proteins, she found 1,475 whose abundance was linked to inheritance of particular SNPs. Next, the authors asked whether these SNPs were associated with AD. For these 1,475 proteins, the researchers integrated their protein abundance findings with a GWAS dataset comprising 71,880 cases and 383,378 controls (Jansen et al., 2019). This analysis linked 13 of the 1,475 proteins to AD, with protein levels that varied with disease diagnosis (see image above). Notably, most of the genes found this way came from regions that had fallen short of genome-wide significance in the original GWAS.
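How can protein data be “integrated” with a GWAS? A widely used recipe comes from transcriptome-wide association tools such as FUSION. It combines per-SNP weights that predict a protein’s abundance from genotype with the GWAS z-scores at those same SNPs, then scales the result by linkage disequilibrium (LD) between the SNPs. The Python sketch below assumes that recipe. The function names and numbers are hypothetical, and this is not presented as Wingo and colleagues’ exact pipeline.

```python
import math
from typing import Sequence


def pwas_z(weights: Sequence[float], gwas_z: Sequence[float],
           ld: Sequence[Sequence[float]]) -> float:
    """FUSION-style statistic: z = (w . z_gwas) / sqrt(w' R w).

    weights: per-SNP effects on protein abundance (hypothetical model output)
    gwas_z:  GWAS z-scores for the same SNPs, same allele orientation
    ld:      SNP-by-SNP LD correlation matrix R from a reference panel
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z-scores, and LD matrix must have matching size")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("predicted protein level has no variance under this LD")
    return numerator / math.sqrt(variance)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy example: three SNPs, the first two in moderate LD (values hypothetical).
w = [0.4, 0.2, 0.0]
z_gwas = [3.0, 2.5, 0.5]
R = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
z = pwas_z(w, z_gwas, R)
print(f"PWAS z = {z:.2f}, p = {two_sided_p(z):.1e}")
```

If the LD matrix were replaced by the identity here, the denominator would shrink and the statistic would rise. Ignoring LD between correlated SNPs would therefore overstate the evidence.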

In an independent validation set of 152 DLPFC samples, the newly found associations mostly held. Three of the 13 proteins could not be measured in this smaller PWAS, but the other 10 again showed genetically regulated abundance that was linked to the person’s AD diagnosis.

Do these protein fluctuations themselves contribute to AD pathogenesis, or are the associated genes merely markers for the true causal factors that lie somewhere else? To get at this question, the authors ran two different analyses. First, they examined whether the pQTL and AD GWAS risk variants co-localized at each genetic locus, which would suggest that the same variant is responsible for both associations. For nine of the 13 genes, this was the case. All nine of these pQTLs were cis, meaning they lay near the gene for the protein and could directly regulate its expression. The second analysis used Mendelian randomization, in which the researchers gauged whether a variant’s effect on AD risk tracked its effect on protein abundance. Such a relationship would suggest that the protein level mediated the disease risk. Again, nine genes met this test, though not the same nine as in the first test.
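Co-localization is often assessed with the approximate Bayes factor method implemented in the coloc R package. The Python sketch below follows that scheme under coloc’s commonly used default priors, which are assumptions here, as are the toy numbers and the premise that protein levels are standardized. For one locus, it returns posterior probabilities for five hypotheses. H3 means both traits are associated through distinct variants, and H4 means they share a single causal variant. The article does not say which co-localization method Wingo and colleagues used.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Wakefield's log approximate Bayes factor (association vs none) for one SNP."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def log_sum(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a: float, b: float) -> float:
    """log(exp(a) - exp(b)); returns -inf when b >= a."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(beta1: Sequence[float], se1: Sequence[float],
                     beta2: Sequence[float], se2: Sequence[float],
                     sd1: float = 0.15, sd2: float = 0.2,
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> List[float]:
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    H0: neither trait associated; H1: trait 1 only; H2: trait 2 only;
    H3: both, via distinct variants; H4: both, via one shared variant.
    The same SNPs, in the same order, must be supplied for both traits.
    """
    if not (len(beta1) == len(se1) == len(beta2) == len(se2) > 0):
        raise ValueError("both traits need estimates for the same SNPs")
    l1 = [log_abf(b, s, sd1) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s, sd2) for b, s in zip(beta2, se2)]
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),
        math.log(p12) + s12,
    ]
    total = log_sum(log_h)
    return [math.exp(x - total) for x in log_h]


# Toy locus: trait 1 = protein abundance, trait 2 = AD (hypothetical numbers).
shared = coloc_posteriors([0.5, 0.01, 0.0], [0.05] * 3,
                          [0.3, 0.0, 0.01], [0.03] * 3)
print("PP(H4, shared variant):", round(shared[4], 3))
```

When the strongest protein and AD signals fall on the same SNP, most of the posterior mass goes to H4. When the two signals fall on different SNPs, it goes to H3, the situation in which a gene would be dropped by this test.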

Altogether, seven of the 13 genes passed both causality tests, two met only the co-localization criteria, and two met only the Mendelian randomization test. Two genes, EPHX2 and PVR, failed both and were kicked off the island, as it were. This resulted in a winnowed list of 11 genes with some evidence for causality. As an additional check, the authors adjusted the findings for APOE genotype. All 11 genes were still significant, indicating they operated independently of APOE, the strongest genetic risk factor for AD.
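The winnowing is a case of inclusion-exclusion: the genes passing at least one test number 9 + 9 − 7 = 11. The short Python check below uses placeholder labels (hypothetical) because, apart from EPHX2 and PVR, the article reports counts rather than which gene passed which test.

```python
# Placeholder labels (hypothetical): only EPHX2 and PVR are named in the article.
both = {f"gene_{i}" for i in range(1, 8)}   # 7 passed colocalization and MR
coloc_only = {"gene_8", "gene_9"}           # 2 passed colocalization only
mr_only = {"gene_10", "gene_11"}            # 2 passed Mendelian randomization only
neither = {"EPHX2", "PVR"}                  # 2 failed both tests

coloc = both | coloc_only   # all genes passing colocalization
mr = both | mr_only         # all genes passing Mendelian randomization
kept = coloc | mr           # genes with some evidence for causality

assert len(coloc | mr | neither) == 13
# Inclusion-exclusion: |coloc ∪ MR| = |coloc| + |MR| - |coloc ∩ MR|
assert len(kept) == len(coloc) + len(mr) - len(coloc & mr)
print(len(coloc), len(mr), len(coloc & mr), len(kept))  # → 9 9 7 11
```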

So what are these 11 genes? Some fell into pathways previously implicated in Alzheimer’s, such as vesicle trafficking and inflammation. These included syntaxin 4, a SNARE protein that helps dock vesicles at synapses, DOC2A, which regulates vesicle fusion and neurotransmitter release, and three proteins that participate in intracellular trafficking: syntaxin 6, SNX32, and ICA1L. Three other genes, ACE, cathepsin H, and CARHSP1, play a role in immune function. Other associations are less clear. LACTB is a mitochondrial protein, RTFDC1 takes part in DNA replication, and PLEKHA1 mediates transmembrane signaling.

Of these 11 genes, only ACE had previously been linked to both protein level and AD (e.g., Oct 2020 news). However, ICA1L has been tied to amyotrophic lateral sclerosis, and syntaxin 6 to progressive supranuclear palsy (Jun 2011 news).

In future studies, the authors will test these genes in model systems to explore how they might figure in disease. They will also map each genetic locus more finely to find the exact causal variants. In many of the loci, multiple SNPs associate with protein level and disease, leaving open the question of which one is actually responsible.

Another unresolved question is whether these variants act on gene expression, or if they influence how much of the protein is there in some other way, for example via its stability or localization. Wingo and colleagues found evidence for transcript changes in only five of the 11 genes, hinting that mechanisms other than expression might be at work.

For their part, Cruchaga’s team focused on potential regulatory mechanisms. Before considering their role in disease, first author Chengran Yang scoured the human genome for pQTLs. Yang measured a suite of 1,305 proteins in several human tissues using aptamers, single-stranded oligonucleotides that bind to proteins with high specificity. The authors analyzed 458 parietal lobe samples, as well as cerebrospinal fluid from 971 donors and plasma from 636. All came from WashU studies. For each type of sample, the authors correlated protein levels with more than 14 million SNPs to find pQTLs. This turned up 32 loci that associated with protein level in brain, 274 with CSF proteins, and 127 with plasma proteins. When the researchers checked additional proteome datasets, they found a high degree of concordance: those datasets identified more than 90 percent of the same pQTLs. More than half of the pQTLs were specific to a particular tissue. “We need to study multiple tissue and cell types,” Cruchaga noted.
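At its core, a pQTL scan runs one regression per SNP: protein level against allele dosage (0, 1, or 2 copies of the alternate allele). The Python sketch below is a bare-bones version under stated assumptions. It includes no covariates, although real analyses typically adjust for factors such as age, sex, and ancestry. It also uses a large-sample normal approximation for p-values, a Bonferroni cutoff, and toy data. The function and SNP names are hypothetical, and this is not the Cruchaga group’s pipeline.

```python
import math
from typing import Dict, List, Tuple


def ols_slope_test(dosage: List[float], protein: List[float]) -> Tuple[float, float]:
    """Fit protein ~ a + b * dosage by least squares; return (b, two-sided p).

    The p-value uses a normal approximation to the t statistic, which is
    reasonable only for large samples.
    """
    n = len(dosage)
    if n != len(protein) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(dosage) / n
    my = sum(protein) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("monomorphic SNP: dosage does not vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, protein))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, protein))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: significant only if the slope is nonzero
        return slope, (0.0 if slope != 0 else 1.0)
    return slope, math.erfc(abs(slope / se) / math.sqrt(2))


def pqtl_scan(genotypes: Dict[str, List[float]], protein: List[float],
              alpha: float = 0.05) -> List[Tuple[str, float, float]]:
    """Test each SNP against one protein; keep hits below a Bonferroni cutoff."""
    cutoff = alpha / len(genotypes)
    hits = []
    for snp_id, dosage in genotypes.items():
        slope, p = ols_slope_test(dosage, protein)
        if p < cutoff:
            hits.append((snp_id, slope, p))
    return hits


# Toy data (hypothetical): 30 donors, one SNP that raises the protein level
# by 0.5 units per alternate allele and one SNP with no effect.
n_donors = 30
genotypes = {
    "snp_A": [i % 3 for i in range(n_donors)],
    "snp_B": [2 * (i % 2) for i in range(n_donors)],
}
protein = [1.0 + 0.5 * (i % 3) + 0.05 * ((i % 5) - 2) for i in range(n_donors)]
for snp_id, slope, p in pqtl_scan(genotypes, protein):
    print(snp_id, round(slope, 3), f"{p:.1e}")
```

Scaled up to one protein against more than 14 million SNPs, and repeated per protein and tissue, this loop is the basic shape of the search described above. The statistical details of the actual study may differ.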

Unlike Wingo and colleagues’ approach, which focused on cis variants, this methodology unearthed both cis and trans pQTLs. Cis-pQTLs were more likely than trans to be shared between tissues. Many lay in noncoding regions, and their association with protein level grew stronger the closer they were to the transcription start site, consistent with modulation of gene expression. Others were coding variants, often turning up at protein cleavage sites or secretory signal regions, hinting at post-translational regulation of protein levels. Around 20 to 25 percent of the pQTLs were trans, meaning they lay far away from the gene or genes they regulate. This suggests they act indirectly through other proteins, such as transcription factors. Altogether, pQTLs appear to control protein levels through a wide variety of mechanisms that go far beyond mere expression changes.
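The cis/trans distinction is usually drawn by the physical distance between a SNP and the gene encoding the protein. The Python sketch below assumes a ±1 Mb window around the transcription start site (TSS), a common convention that the article does not attribute to either group, and uses hypothetical coordinates.

```python
from dataclasses import dataclass
from typing import Optional

CIS_WINDOW_BP = 1_000_000  # assumed +/- 1 Mb cis window (a common convention)


@dataclass(frozen=True)
class Locus:
    chrom: str  # chromosome name, e.g. "19"
    pos: int    # base-pair position


def tss_distance(snp: Locus, tss: Locus) -> Optional[int]:
    """Absolute distance in bp from SNP to TSS, or None on different chromosomes."""
    if snp.chrom != tss.chrom:
        return None
    return abs(snp.pos - tss.pos)


def classify_pqtl(snp: Locus, tss: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Label a pQTL 'cis' if it lies within `window` bp of the protein's gene TSS."""
    d = tss_distance(snp, tss)
    return "cis" if d is not None and d <= window else "trans"


# Hypothetical coordinates, for illustration only.
gene_tss = Locus("19", 5_000_000)
print(classify_pqtl(Locus("19", 5_020_000), gene_tss))  # → cis
print(classify_pqtl(Locus("19", 8_000_000), gene_tss))  # → trans (same chromosome, 3 Mb away)
print(classify_pqtl(Locus("11", 5_000_000), gene_tss))  # → trans (different chromosome)
```

The value returned by `tss_distance` is the quantity to plot against association strength when checking whether noncoding cis-pQTLs associate more tightly with protein level nearer the TSS.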

Next, Cruchaga and colleagues integrated their findings with GWAS data, applying Mendelian randomization in their case to find associations of pQTLs with AD, Parkinson’s disease, ALS, frontotemporal dementia, and stroke. This turned up several links for each disease. In the case of AD, the authors identified a strong cis-pQTL that affected CSF and plasma levels of the microglial receptor CD33. In PD, the data resolved a GWAS region that contained multiple genes: TMEM175, GAK, DGKQ, CPLX1, and IDUA. The pQTL data pointed to IDUA, a lysosomal protein that degrades glycosaminoglycans, as the causal gene. The pQTL findings also fingered carbonic anhydrase IV as an ALS gene and E-selectin as a stroke gene. Some genes were linked to multiple disorders. For example, IL-1FG and SLAF5 in CSF associated with both AD and PD, and plasma MICA with AD and FTD.
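Mendelian randomization with a single cis instrument often reduces to a Wald ratio: the SNP’s effect on disease divided by its effect on the protein. The Python sketch below assumes that textbook estimator, plus a fixed-effect inverse-variance-weighted (IVW) combination for several SNPs. The article does not say which MR estimator the Cruchaga or Wingo groups used, and all numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_x: float, se_x: float, beta_y: float, se_y: float,
               second_order: bool = False) -> Tuple[float, float]:
    """Causal estimate of exposure (protein) on outcome (disease) from one SNP.

    beta_x, se_x: SNP effect on protein level (pQTL study)
    beta_y, se_y: SNP effect on disease log-odds (GWAS)
    Returns (estimate, standard error) via the delta method.
    """
    if beta_x == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_y / beta_x
    var = se_y ** 2 / beta_x ** 2
    if second_order:  # adds uncertainty in the pQTL effect itself
        var += beta_y ** 2 * se_x ** 2 / beta_x ** 4
    return estimate, math.sqrt(var)


def ivw(estimates: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted combination of per-SNP ratios."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, estimates)) / total
    return est, math.sqrt(1.0 / total)


def two_sided_p(est: float, se: float) -> float:
    """Normal-approximation two-sided p-value for est / se."""
    return math.erfc(abs(est / se) / math.sqrt(2))


# Hypothetical cis-pQTL: raises protein by 0.5 SD and disease log-odds by 0.1.
est, se = wald_ratio(0.5, 0.05, 0.1, 0.02)
print(f"log-OR per SD of protein: {est:.2f} (SE {se:.2f}), p = {two_sided_p(est, se):.1e}")
```

A nonzero estimate is consistent with the protein mediating disease risk. It rests on the usual MR assumption that the SNP affects disease only through that protein, which is one reason the co-localization check is run as well.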

Although proteome data may help find new genes, Cruchaga believes it will not uncover all disease links. For one thing, current measurement methods cannot capture all proteins, hence scientists are missing some associations. For another, genes can exert their effects through other types of molecules, too. For example, APOE acts via lipids.

Cruchaga believes future studies should expand to encompass lipidomics, metabolomics, epigenomics, and noncoding RNA. “We have the technology now to start combining multiple omic layers to recover some of the GWAS signals that are otherwise impossible to identify,” he told Alzforum.—Madolyn Bowman Rogers


  1. The beauty of standard genome-wide association studies is the simplicity of the analysis pipeline … a stringent p-value threshold is required for Bonferroni correction, but, once that is passed, the result is credible and reproducible. However, the very high correction (declaring results only with p<1×10⁻⁷) means many signals are probably lost.

    To try to find these lost signals, Wingo and colleagues (and many others, including us) have tried to use biological information essentially to “cheat” the multiple testing correction, e.g., by looking only at PU.1/SPI1-regulated genes or only at amyloid-responsive genes. This does “work,” but it also means that the results are a little less certain because of the more complex hypotheses being tested. The GWAS purists are generally not impressed by these shortcuts, but clearly they can be valuable.

    This is a long way of saying caveat emptor for this paper. However, a few points are worth making. First, STX6 is a PSP (Höglinger et al., 2011) and a prion disease GWAS hit (Jones et al., 2020). Second, several of the genes reported had previously been shown by us, and I suspect others, to be amyloid-responsive in transgenic mice (Salih et al., 2019). So, I think there is some wheat in these data, but there may also be some chaff.


    Höglinger et al. Identification of common variants influencing risk of the tauopathy progressive supranuclear palsy. Nat Genet. 2011 Jul;43(7):699-705. PubMed.

    Jones et al. Identification of novel risk loci and causal insights for sporadic Creutzfeldt-Jakob disease: a genome-wide association study. Lancet Neurol. 2020 Oct;19(10):840-848. Epub 2020 Sep 16 PubMed.

  2. How AD genetic risk variants eventually lead to dementia is a difficult but important topic of research.

    Wingo and colleagues took a PWAS approach to study relationships between AD genetic risk variants and abundances of 1,475 proteins quantified in the dorsolateral prefrontal cortex from a total of 528 individuals with or without AD. They found that only 10 (<1 percent) genes were associated with cis-protein levels in both discovery and replication cohorts. It is surprising that none of these 10 genes were amongst the GWAS hits most strongly associated with AD previously, even though those AD case-control weights were used in the PWAS (Jansen et al., 2019).

    Still, one of the hits, ACE, was reported to be associated with AD by another GWAS study (Kunkle et al., 2019). The cis-results for ACE and CTSH were also reported by another study that performed PWAS with cerebrospinal fluid proteins using Washington University’s ONTIME GWAS database, but none of the other hits have been linked to AD (Yang et al., 2020).

    Conversely, the strongest cis-hits in ONTIME are not amongst those presented by Wingo and colleagues, e.g., SIGLEC9, ICAM1, LPR, IL1RL1, and CHI3L1. Possibly, regressing out clinical status from the proteomic data makes it difficult to find relationships between AD genetic risk variants and abundances in their proteomic counterparts. Alternatively, the proteomic technique used may not adequately capture protein alterations caused by genetic variants, and/or there may be differences for proteins quantified in tissue and CSF. For example, using a targeted technique, Spellman et al. showed clear relationships between the presence of an APOE e4 allele and the detectability of APOE e4 peptide concentrations in CSF (Spellman et al., 2015).


    Jansen et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk. Nat Genet. 2019 Mar;51(3):404-413. Epub 2019 Jan 7 PubMed.

    Kunkle et al. Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet. 2019 Mar;51(3):414-430. Epub 2019 Feb 28 PubMed.

    Spellman et al. Development and evaluation of a multiplexed mass spectrometry based assay for measuring candidate peptide biomarkers in Alzheimer's Disease Neuroimaging Initiative (ADNI) CSF. Proteomics Clin Appl. 2015 Aug;9(7-8):715-31. Epub 2015 Apr 24 PubMed.

  3. Although genome-wide association studies (GWAS) have been pivotal to increasing our understanding of the genetic basis of AD, it has remained challenging to determine the effect of an implicated variant (i.e., which gene it affects). Novel approaches integrating GWAS with transcriptome (TWAS) or proteome (PWAS) data are a promising way forward to pinpoint affected genes (Gusev et al., 2016; Suhre et al., 2021). Wingo et al. apply such methods to the dorsolateral prefrontal cortex of AD patients and controls and were able to shed new light on the functional effects of genetic variants associated with Alzheimer’s risk.

    For example, genetic variants in ACE have been linked to AD on numerous occasions, including a recent GWAS (Kunkle et al., 2019), but given the high gene density in this region it remained unclear which gene was most likely affected. The strength of this paper lies in the numerous methods and additional datasets they have employed to infer the likelihood that variants in the ACE locus are also in fact affecting ACE protein expression.

    The authors also highlight that eight of the 11 identified causal genes did not meet the stringent p-value cut-offs used in GWAS, but still had suggestive AD associations (p-values of 5.3×10⁻⁵ to 1.9×10⁻⁷). This supports our view that there is still much to learn about the genetic underpinnings of AD, as also shown, for example, by polygenic risk assessment (Escott-Price, 2015; Escott-Price et al., 2017). While we await more powerful AD GWAS (Sierksma et al., 2020), these types of studies enable the identification of key molecular players and pathways that lie at the heart of AD pathogenesis.

    Although follow-up work delving deeper into the function of the identified proteins is required, it is already very interesting to see more genes converging onto the endolysosomal/phagocytic pathway, including CTSH, SNX32, STX4, STX6, and PLEKHA1 (Sierksma et al., 2020; Podleśny-Drabiniok et al., 2020). According to the authors, who leveraged previously published single-nucleus data (Mathys et al., 2019), only CTSH showed enriched expression in microglia, but it remains tempting to speculate about the potential role of the other proteins in microglial phagocytosis.


    Escott-Price et al. Common polygenic variation enhances risk prediction for Alzheimer's disease. Brain. 2015 Dec;138(Pt 12):3673-84. Epub 2015 Oct 21 PubMed.

    Escott-Price et al. Polygenic risk score analysis of pathologically confirmed Alzheimer disease. Ann Neurol. 2017 Aug;82(2):311-314. Epub 2017 Aug 9 PubMed.

    Gusev et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016 Mar;48(3):245-52. Epub 2016 Feb 8 PubMed.

    Kunkle et al. Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet. 2019 Mar;51(3):414-430. Epub 2019 Feb 28 PubMed.

    Mathys et al. Single-cell transcriptomic analysis of Alzheimer's disease. Nature. 2019 Jun;570(7761):332-337. Epub 2019 May 1 PubMed.

    Podleśny-Drabiniok et al. Microglial Phagocytosis: A Disease-Associated Process Emerging from Alzheimer's Disease Genetics. Trends Neurosci. 2020 Dec;43(12):965-979. Epub 2020 Oct 27 PubMed.

    Sierksma et al. Translating genetic risk of Alzheimer's disease into mechanistic insight and drug targets. Science. 2020 Oct 2;370(6512):61-66. PubMed.

    Suhre et al. Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet. 2021 Jan;22(1):19-37. Epub 2020 Aug 28 PubMed.



News Citations

  1. Expression, Expression, Expression—Time to Get on Board with eQTLs
  2. Cell-Specific Enhancer Atlas Centers AD Risk in Microglia. Again.
  3. Epigenomic Roadmap Points to Causal Genes
  4. Massive Proteomics Studies Peg Glial Metabolism, Myelination, to AD
  5. New ACE Variant Speeds Neurodegeneration in Alzheimer’s Mice
  6. GWAS Fingers Tau and Other Genes for Parkinsonian Tauopathy

Paper Citations

  1. Jansen et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk. Nat Genet. 2019 Mar;51(3):404-413. Epub 2019 Jan 7 PubMed.

Further Reading

Primary Papers

  1. Wingo et al. Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer's disease pathogenesis. Nat Genet. 2021 Feb;53(2):143-146. Epub 2021 Jan 28 PubMed.
  2. Yang et al. Genomic and multi-tissue proteomic integration for understanding the biology of disease and other complex traits. MedRxiv. 2020 Jun 26. medRxiv.