ApoE2 lowers Alzheimer’s risk; ApoE4 raises it. But what explains the people who buck this trend—i.e., the ApoE2 carriers who get AD, and ApoE4 carriers who stay free of it into old age? In the December 7 Alzheimer’s and Dementia, researchers led by Olivier Lichtarge at Baylor College of Medicine in Houston used evolutionary genetics to investigate this question. The researchers hunted for genetic variants associated with these so-called paradoxical phenotypes. They identified more than 200 genes that may either shield ApoE4 carriers from AD, or render ApoE2 carriers vulnerable to it.

  • Scientists pinpoint 216 genes that counter ApoE2 protection, ApoE4 risk.
  • For some, expression was linked to GWAS variants.
  • Genes involved in synaptic function, lipoprotein regulation, and inflammation.

Many of the genes were differentially expressed in AD brain samples, had close ties to previously reported AD risk genes, and even predicted a person’s disease status. Their products play roles in familiar processes such as synaptic function, inflammation, and protein trafficking. If validated in larger cohorts, these potential modifiers could help stratify cohorts by disease risk, serve as biomarkers, or even be viable therapeutic targets, the authors propose.

Genetic variation accounts for as much as 80 percent of the population burden of AD, yet much of the heritability remains unexplained. While more than 30 common genetic variants—many in noncoding stretches of the genome—have emerged from genome-wide association studies, sequencing studies have plucked out handfuls of rare variants that hold stronger sway over a person’s risk. While ever-larger studies will unearth more variants, scientists have also devised creative ways of leveraging existing data to dig out more genetic paydirt.

Because ApoE remains the strongest genetic AD risk factor, stratifying genetic data based on ApoE genotype has the potential to shine a light on variants previously hidden within the apolipoprotein’s shadow. In 2018 the Alzheimer’s Disease Sequencing Project published the largest whole-exome sequencing study, which identified two new rare variants (Aug 2018 news). The following year, ADSP researchers stratified the data based on ApoE4 status, identifying unique risk variants (Jun 2019 news). 

In a variation on this theme, co-first authors Young Won Kim and Ismael Al-Ramahi and colleagues used the ADSP exome data to fish out genes that influenced disease so much that they counteracted the effects of ApoE genotype. With some 11,000 exomes to choose from, they limited their analysis to those from outliers—that is, 179 ApoE2 carriers who had AD, and 301 ApoE4 carriers who did not.

As evolutionary geneticists, the researchers called on phylogenetic changes over millennia to pick out potentially impactful mutations lurking within the exomes of the two groups. Essentially, this so-called “evolutionary action” approach is a way to gauge the functional impact of an amino acid change within a protein’s coding sequence. EA takes into account a given amino acid’s importance in driving evolution, and also the potential impact of swapping that amino acid with another. The researchers pulled out genes with an unusually high EA load, using some statistical wizardry called differential “imputed deviation in EA load,” aka iDEAL, in the two paradoxical ApoE groups. They fished out 216 iDEAL genes: 148 genes were riddled with potentially pathogenic variants that might have put the ApoE2 carriers at risk, and 68 genes were brimming with potentially beneficial variants that may have fended off disease in the healthy ApoE4 carriers.

For an example of how this approach works, consider one of the hits, TREM2. Likely pathogenic variants in this gene were significantly more common in ApoE2 carriers who had AD (ApoE2-AD) than they were in healthy controls who carried ApoE4 (ApoE4-HC). The pathogenic changes included the well-known AD risk variant R47H, which was seven times more common in the ApoE2-AD group than in the ApoE4-HC group.

The EA load calculation also took into account rarer TREM2 variants with a higher impact, such as the T96K variant, as well as a cadre of less-impactful variants. Though many of these variants would have fallen short of statistical significance in a typical GWAS, the EA approach took all of them into account to gauge TREM2’s aggregate mutational burden. The findings suggest that variants that harm TREM2 function could erode disease protection in ApoE2 carriers.
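The burden idea in the TREM2 example can be sketched in a few lines of Python. This is a simplified illustration, not the authors' iDEAL implementation: it assumes each variant carries a single impact score (higher meaning more disruptive), sums those scores per person for one gene, and compares the group means. The label-shuffling z-score follows the general description the authors give in their reply in the comments; the function names, variant names, and scores are hypothetical.

```python
import random
import statistics

def gene_load(group, impact):
    """Mean per-person sum of variant impact scores for one gene in one group."""
    totals = [sum(impact[v] for v in person) for person in group]
    return sum(totals) / len(totals)

def load_difference(group_a, group_b, impact):
    """Difference in mean gene load between two groups (a minus b)."""
    return gene_load(group_a, impact) - gene_load(group_b, impact)

def permutation_z(group_a, group_b, impact, n_perm=1000, seed=0):
    """Z-score of the observed load difference against a label-shuffled background."""
    rng = random.Random(seed)
    observed = load_difference(group_a, group_b, impact)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    background = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomize which people carry which group label
        background.append(load_difference(pooled[:n_a], pooled[n_a:], impact))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    z = (observed - mean) / sd if sd > 0 else 0.0
    return observed, z

# Toy data: made-up impact scores and carriers, for illustration only.
impact = {"var_high": 90.0, "var_mid": 80.0, "var_low": 10.0}
e2_ad = [["var_mid"], ["var_mid", "var_low"], ["var_high"], [], ["var_mid"], ["var_low"]]
e4_hc = [[], ["var_low"], [], [], ["var_low"], []]

observed, z = permutation_z(e2_ad, e4_hc, impact)
print(f"load difference = {observed:.1f}, z = {z:.2f}")
```

Summing over all variants in a gene is what lets several individually rare changes contribute to one gene-level signal, even when none would reach significance alone.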

TREM2 variants have been tied to AD risk regardless of ApoE genotype. Could other genes identified with this iDEAL approach also associate with AD independently of ApoE? Perhaps. The researchers found that among ApoE3 homozygotes in the ADSP dataset, potentially protective variants in ApoE4-HC iDEAL genes were enriched in healthy controls compared to cases, and the opposite was true for pathogenic variants in ApoE2-AD iDEAL genes. The genes could therefore influence AD pathogenesis regardless of ApoE, the researchers proposed.

To investigate the wider relevance of iDEAL genes, the researchers looked for them in other AD datasets. Using gene-expression data from the Accelerating Medicines Partnership–AD sequence repository, they found that 75 iDEAL genes were differentially expressed in AD brain samples relative to controls. Referencing GWAS data, they found that iDEAL genes were more closely tied to GWAS hits in protein interaction networks than would be expected by chance. In fact, 25 iDEAL genes interact directly with genes involved in Aβ and tau pathology.

In addition to moving in the same functional circles as GWAS proteins, seven iDEAL genes— GOLGA5, PTBP1, SYTL2, SMARCD2, GAMT, TREM2, and AZU1—landed within 500 kilobases of a GWAS hit, suggesting they could even represent the causal gene behind the disease association.
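A proximity check like this one is easy to sketch. The Python below is a minimal illustration under stated assumptions: distance is measured from the GWAS lead variant to the nearest gene boundary (the article does not say how the 500 kilobases were measured), and all gene names, coordinates, and hit identifiers are made up.

```python
WINDOW_BP = 500_000  # 500 kilobases, as in the article

def distance_to_gene(pos, start, end):
    """Base pairs between a variant and a gene; 0 if the variant lies inside it."""
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))

def genes_near_hits(genes, hits, window=WINDOW_BP):
    """Map each gene to the GWAS hits on the same chromosome within the window."""
    near = {}
    for name, (chrom, start, end) in genes.items():
        for hit_id, (hit_chrom, pos) in hits.items():
            if chrom == hit_chrom and distance_to_gene(pos, start, end) <= window:
                near.setdefault(name, []).append(hit_id)
    return near

# Hypothetical coordinates, for illustration only.
genes = {
    "GENE_A": ("chr1", 1_000_000, 1_050_000),
    "GENE_B": ("chr1", 5_000_000, 5_020_000),
    "GENE_C": ("chr2", 1_000_000, 1_050_000),
}
hits = {
    "hit_1": ("chr1", 1_400_000),
    "hit_2": ("chr2", 9_000_000),
}

result = genes_near_hits(genes, hits)
print(result)
```

Proximity alone does not establish that a gene is causal for the association; it only nominates candidates at a locus.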

The researchers also found that 39 of the iDEAL genes interacted with 390 compounds in the Drug Interaction Database, suggesting they could make drug targets. Three genes—ITGA2B, ALDH5A, and HDAC7—interact with two drugs, enoxaparin and valproic acid, that have been tied to lower AD incidence in a population study (Kern et al., 2019). Knocking down or overexpressing 69 iDEAL genes either ameliorated or exacerbated climbing defects in fruit fly models of amyloidosis or neurofibrillary tangles. As flies have no ApoE homolog, this suggests that at least some of these genes alter AD risk independently of the lipoprotein.

Might iDEAL gene variants be used to gauge AD risk? Using a statistical machine-learning approach, the researchers found that they were able to distinguish between ApoE2 carriers with AD and healthy ApoE4 carriers. A subset of just 94 iDEAL gene variants was sufficient to separate the two groups. Moreover, within the larger ADSP dataset, iDEAL variants predicted which ApoE2 carriers would develop AD, and which ApoE4 carriers would remain healthy.
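The article does not specify which machine-learning method was used, so the following Python sketch substitutes a deliberately simple stand-in: a nearest-centroid classifier evaluated with leave-one-out cross-validation. Each person is represented by hypothetical per-gene burden scores; the data, labels, and function names are invented for illustration and do not come from the study.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Train on everyone except person i.
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {
            lab: centroid([s for s, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        predicted = min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))
        correct += predicted == y
    return correct / len(samples)

# Toy per-gene burden scores (three hypothetical genes per person).
samples = [
    [90, 0, 10], [80, 10, 0], [85, 5, 5],   # labeled E2-AD
    [0, 0, 10], [5, 10, 0], [10, 0, 0],     # labeled E4-HC
]
labels = ["E2-AD"] * 3 + ["E4-HC"] * 3

acc = loo_accuracy(samples, labels)
print(f"leave-one-out accuracy = {acc:.2f}")
```

Holding each person out of training before predicting their label guards against the optimism that comes from testing on the same individuals used to fit the model, which matters when the groups are as small as the paradoxical cohorts here.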

What do these genes do? An analysis of their biological pathways revealed that many are involved in keeping synapses up and running. Axonal projection, protein trafficking, microtubule transport, and dendritic spine pathways featured prominently, as well as ApoE-related functions such as lipoprotein regulation.

The researchers combed through single-cell transcriptomic data from postmortem brain samples, and placed many iDEAL genes within cell-type-specific functional networks (McKenzie et al., 2018, and see below). In neurons, iDEAL genes cropped up in synaptic signaling and plasticity pathways. In microglia, the genes were involved in inflammation, autophagy, and lysosomal function, as well as synaptic pruning. Interestingly, five iDEAL genes in two different cell types—CNTN1 and NCKAP in neurons and NRXN2, ABHD2, and TIA1 in microglia—take part in the same process, i.e. dendritic spine maintenance.

iDEAL Networks. iDEAL genes (green) and genes that directly interact with them (gray) are involved in critical cell-type-specific brain functions. [Courtesy of Kim et al., Alzheimer’s and Dementia, 2020.]

Michael Ewers of Ludwig Maximilian University in Munich called the genetic approaches in the study elegant. “The study revealed a large number of genes harboring either detrimental or protective mutations in ApoE2 and ApoE4 carriers,” he wrote. “These exploratory results will encourage future investigations to follow up these findings in independent exome-sequencing replication cohorts as well as by genome-wide association analyses in larger cohorts. A major question is whether any variants in the depicted genes are modifiers specifically of the effect of ApoE genotype on AD, or are associated with AD risk in general.”

Lichtarge also emphasized that the data will need to be validated in other cohorts, noting that differences in genetic background and even in sequencing protocols can cause variability between studies. He also wants to explore some of the iDEAL genes in human organoid models, and to test therapeutic compounds that cropped up in the drug interaction database.

Adam Naj of the University of Pennsylvania in Philadelphia finds the genes identified in the study interesting because they are in pathways and networks involving AD-susceptibility loci identified in prior GWAS. “Following on these findings, a major question remains: How much do these genes/variants contribute to AD risk in aggregate in larger samples of individuals, and among groups not selected for APOE e4 enrichment and case-control status?” he wrote. “Quantifying their contribution to disease heritability overall may speak to the importance of recruiting sample sets with this kind of enrichment.”—Jessica Shugart


  1. This is an elegant study by Kim and colleagues harnessing the gain in statistical power by virtue of focusing on the genetic variants that modify the risk of ApoE e4 genotype based on exome-sequencing data and using the evolutionary-action approach, i.e., a computational method to estimate the phenotypical impact of mutations.

    The study revealed a large number of genes harboring detrimental or protective mutations in ApoE e2 and ApoE e4 carriers, respectively. These exploratory results will encourage future investigations to follow up these findings in independent exome-sequencing replication cohorts as well as by genome-wide association analyses in larger cohorts.

    A major question is whether any variants in the depicted genes are modifiers, specifically, of the effect of ApoE genotype on AD, or are associated with AD risk in general. A comparison of the results with those of the recent whole-exome-sequencing study, which was also based on data from the Alzheimer’s Disease Sequencing Project, would have been great (Bis et al., 2018). 


    Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry. 2018 Aug 14; PubMed.

  2. This study capitalizes on subsets of samples deemed “low genetic risk cases” (carriers of the protective APOE ε2 haplotype) and “high genetic risk controls” (carriers of the risk-increasing APOE ε4 haplotype) that underwent whole-exome sequencing (WES) as part of the Alzheimer’s Disease Sequencing Project (ADSP). The purpose in looking at these subsets is to use “extreme” sample sets, i.e., cases without known genetic risk factors and unaffected subjects with known genetic risk factors, to make it easier to try to identify novel coding variants with strong effects on disease risk as new AD susceptibility loci, by biasing the study against “the usual suspects” such as APOE and other known loci. This strategy has the potential to be richly rewarding in identifying new disease susceptibility loci, candidate pathways, and relevant gene networks.

    The genes identified in this study are interesting, as they are found in a number of pathways and networks involving known AD susceptibility loci from prior GWAS; however, few prior AD GWAS loci themselves were identified. This may be because, as the authors noted, most GWAS loci fall in noncoding regions, whereas this WES study focuses on coding-region variation, and also because their APOE-based sampling should have controlled for associations in or near that locus. However, because of the selected nature of the sample, a major question remains about the generalizability of their findings: How much do these genes/variants contribute to AD risk in aggregate in representative samples of the population, and among groups not selected for APOE e4 enrichment and case-control status? Quantifying their contribution to disease heritability overall may speak to the importance of recruiting sample sets with this kind of enrichment.

    There are some concerns about the study design. First, there are sample size limitations inherent in the design and there was no validation in independent samples. Second, the ADSP WES dataset was sequenced at three different sequencing centers using two different sequence capture kits, which may lead to variation in genotype quality. Specifically, it is unclear if the variants examined fell within targeted capture regions in all samples (in both kits), which is a concern because of highly variable but overall reduced genotype calling quality and accuracy among off-target variants, even those in sequence immediately flanking capture regions. ADSP best practices recommend exclusion of these variants, which account for nearly 50 percent of all called variants, and it is unclear whether these lower-quality variants were excluded by the filtering criteria implemented. While this may present a problem, validation in other data is highly encouraged and may still yield support for their findings.

    Finally, acknowledging the wealth of genes identified in this study that are further supported by integrating functional genomics, we have a major question to consider about genomic search strategies like this: Does a larger search space for AD candidate genes/loci help or hurt the hunt for therapies to slow or stop progression of the disease? Among the best examples of successful translations of genetic studies to clinical treatment are the development of lipid-lowering drugs based on studies that identified risk variants in HMGCR and PCSK9. Does this enhanced AD search space filled with new candidate genes make it easier to identify therapeutic targets likely to have an effect on the disease? Also, with so many potential therapeutic targets emerging, would strategies to prioritize specific genes for further investigation help to make studies like these more appealing and informative? This is a broad and emerging philosophical debate with which the field will grapple in coming years.

  3. Kim et al. applied an elegant evolution-based variant functional impact weighting method and a novel discordant phenotype subsampling approach to identify genes that modify predicted AD outcomes in APOE-ε4 and ε2 carriers. Their analyses suggest etiological roles for synaptic maintenance in several cell types and the endolysosomal system in microglia. This is consistent with findings from previous analyses by our group and others (Aug 2019 news; Dourlen et al., 2019).

    However, this unorthodox association analysis methodology is not described in detail, rendering it difficult to assess the significance and robustness of the findings. It is unclear what “# of observed variants” refers to (cumulative minor allele count, number of variants), what variant and sample level QC was performed (e.g., minor allele frequency cutoff), and what technical (batch, capture, sequencing center, etc.) or biological (age, sex, population structure) covariates were included in the association analyses. Lack of replication is another issue limiting confidence in the findings. Yet it appears that only about half (5,686) of the 10,000 individuals available in the ADSP Discovery cohort were used in this study, such that the remaining half could be used as an independent replication dataset to strengthen its findings.

    There are other areas of concern. Diagnostic plots (e.g., residuals vs. fitted, normal Q-Q) for the gene discovery linear regressions would aid interpretation but are not shown. Nominal significance is used as a threshold in the permutation analysis that generated the final list of 216 iDEAL genes. Multiple testing correction and thus a much higher number of permutations would be required to constrain the number of false positive findings in this list.

    Additional evidence is provided to support the statistical association results. Differential gene expression of the iDEAL genes in AD vs. control brains, although widely used, is not very convincing for causal involvement of a gene because it could arise from cell-type proportion changes in the degenerating brain tissue or reactive (rather than causal) changes in gene expression programs within cells.

    Moreover, it is unclear how the authors generated the list of AD GWAS candidate genes used in their STRING network analysis that showed significant interconnectivity of iDEAL genes with known AD genes. GWAS are designed to identify loci, not genes, and (with a few exceptions like APOE, TREM2, PLCG2, ABI3, ABCA7, SORL1, SPI1, etc.) the underlying causal gene(s) remain(s) unknown for most AD-associated loci. Interestingly, the Drosophila Aβ/Tau model experiments identified a few promising examples, validating the dAD-ε2 and dAD-ε4 predictions, and it would be worthwhile exploring the effect of these iDEAL genes on disease-relevant phenotypes in vertebrate models.

    Despite these concerns, the authors' use of extreme phenotype sampling to improve statistical power and identify novel modifier genes holds great promise for elucidating AD pathogenic mechanisms.


    The new genetic landscape of Alzheimer's disease: from amyloid cascade to genetically driven synaptic failure hypothesis?. Acta Neuropathol. 2019 Aug;138(2):221-236. Epub 2019 Apr 13 PubMed.

  4. This elegant genetic study may have identified several important genes that may protect APOEε4-carriers from, or reduce APOEε2 carriers’ protection against, Alzheimer's disease. Both findings challenge our current understanding of APOE genotype as a risk factor for AD.

    Notably, combining the “evolutionary action” (EA) approach, involving the functional impact of an amino acid change within a protein’s coding sequence, and a statistical albeit complex approach called differential “imputed deviation in EA load” (iDEAL), is exceptionally novel. However, one can argue that genetics studies can go only so far without context. When the “explanation” involves more than 200 genes, how much can it actually explain?

    Associated pathway clusters will likely turn out to be more useful, but not necessarily from a strictly genetic perspective. This study found major genes involved in pathways of inflammation, lipoprotein metabolism, and synaptic function, all well-known to be sensitive to environmental influence. Tracking perturbation of the pathways in prodromal AD may be more useful than a focus on 200 simultaneous gene products.

    Likewise, earlier studies found that, for specific populations, the APOEε4 allele does not associate with increased AD risk, and the difference was attributed to environmental, not genetic factors (Gureje et al., 2006; Hall et al., 2006). The ideal study for sporadic AD cases may require an understanding of the role of other "Es", i.e., the environment and epigenetics (Maloney and Lahiri, 2016).


    APOE epsilon4 is not associated with Alzheimer's disease in elderly Nigerians. Ann Neurol. 2006 Jan;59(1):182-5. PubMed.

    Cholesterol, APOE genotype, and Alzheimer disease: an epidemiologic study of Nigerian Yoruba. Neurology. 2006 Jan 24;66(2):223-7. PubMed.

    Epigenetics of dementia: understanding the disease as a transformation rather than a state. Lancet Neurol. 2016 Jun;15(7):760-74. Epub 2016 May 9 PubMed.

  5. We thank our colleagues for noting the statistical power of focusing on paradoxical APOE2/4 genotype-phenotypes to find potential AD genes, the deeper resolution provided by a variant impact weighting method, and the added confidence provided by in vivo validation in Drosophila AD models.

    Sequence quality is paramount, certainly. Communication with the Alzheimer’s Disease Sequencing Project (ADSP) indicated the data had undergone stringent quality control and was of high quality. We confirmed this based on the average TiTv ratio (3.52 ± 0.05) and on the lambda value (0.039 ± 0.001, see Koire, Katsonis, and Lichtarge, 2016) of the variants. As detailed in Methods, only genotype calls from the Atlas calling pipeline were analyzed, and those from the GATK pipeline were excluded. Also, only half of the ADSP Discovery data were available on dbGaP for this project, or 5,686 samples. The updated, complete ADSP Discovery data that was released simultaneously with the completion of our manuscript, in February 2020, as well as the ADSP Extension cohort will now help us refine our findings.

    With respect to methodology, our approach (iDEAL) is not an association analysis so it does not use any covariate. As iDEAL calculates the differential functional mutational load of a gene between two paradoxical groups, “# of all variants” refers to the number of protein-coding variants observed in each gene. The significance of each gene’s signal is assessed using a z-score measured against a background distribution of iDEAL scores built by randomizing the labels. The lower right plot of Figure 1A shows the z-scores versus iDEAL scores for each gene. We control for age by ensuring that healthy controls were older than AD patients (Figure S1). We could not yet control for sex, as splitting samples by male/female would have overly weakened power. But, as discussed, the new ADSP Discovery and Extension cohorts should now support the analyses of males and females, separately.

    While correcting z-scores for multiple hypothesis testing may reduce false positives, we purposefully chose to be permissive in selecting our candidate genes because our goal was to perform in vivo validation in Drosophila models, as well as assess each gene’s relevance to AD in numerous other ways, including risk predictability in non-paradoxical patient groups, differential gene expression in AD versus control brains, and connection with GWAS genes. The AMPAD whole-tissue dataset is the most extensive and current gold standard for AD transcriptomics. We agree that it is not perfect (no omics set is), but as of now the single-cell datasets available lack numbers and robustness to replace it and have their own technical biases as well. For the GWAS network interactions, as indicated in the publication, we did not make the gene calls but relied on the most likely candidates from careful studies (Harold et al., 2009; Hollingworth et al., 2011; Kunkle et al., 2019; Lambert et al., 2009; Lambert et al., 2013; Naj et al., 2011; Seshadri et al., 2010). 

    Here, our paper was laser-focused on APOE paradoxical genotypes, which is a distinct study design from the traditional case-control study over the complete ADSP collection. A future study that follows the latter design would provide a more relevant and fair comparison to Bis et al., 2018.

    The space for AD candidate genes/loci is far from saturated. We believe that identifying more genes will not only reveal potential therapeutic avenues, but also provide more robust patient-stratification capabilities as well as stronger diagnostic tools (e.g. biomarkers for disease progression). Overlaying functional information on sequence analysis will help our predictive accuracy and increase statistical power.

    —Ismael Al‐Ramahi, and Juan Botas, all of the Baylor College of Medicine, are co-authors of this comment.


    Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry. 2018 Aug 14; PubMed.

    Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nat Genet. 2009 Oct;41(10):1088-93. PubMed.

    Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease. Nat Genet. 2011 May;43(5):429-35. PubMed.

    Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet. 2019 Mar;51(3):414-430. Epub 2019 Feb 28 PubMed.

    Genome-wide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nat Genet. 2009 Oct;41(10):1094-9. PubMed.

    Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet. 2013 Dec;45(12):1452-8. Epub 2013 Oct 27 PubMed.

    Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease. Nat Genet. 2011 May;43(5):436-41. Epub 2011 Apr 3 PubMed.

    Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA. 2010 May 12;303(18):1832-40. PubMed.



News Citations

  1. Largest AD Whole-Exome Study to Date Finds Two New Risk Genes
  2. Sliced by ApoE Genotype, Whole Exome Data Yield New AD Variants

Mutations Citations

  1. TREM2 R47H
  2. TREM2 T96K

Paper Citations

  1. Aiding the discovery of new treatments for dementia by uncovering unknown benefits of existing medications. Alzheimers Dement (N Y). 2019;5:862-870. Epub 2019 Dec 9 PubMed.
  2. Brain Cell Type Specific Gene Expression and Co-expression Network Architectures. Sci Rep. 2018 Jun 11;8(1):8868. PubMed.

Primary Papers

  1. Harnessing the paradoxical phenotypes of APOE ɛ2 and APOE ɛ4 to identify genetic modifiers in Alzheimer's disease. Alzheimers Dement. 2021 May;17(5):831-846. Epub 2020 Dec 7 PubMed.