RNA Sequencing Helps Identify Functional Variants from GWAS
For Alzheimer’s and other complex disorders, mining the genome for disease-associated variants is no longer the obstacle. The challenge nowadays is figuring out how the identified loci relate to disease. As reported this month in Nature and its associated journals, advances in high-throughput RNA sequencing are providing new tools for understanding how disease loci influence gene expression—a starting point for understanding their connection to pathogenesis.
In a September 16 Nature paper, researchers led by Emmanouil Dermitzakis and Tuuli Lappalainen at the University of Geneva, Switzerland, report the largest RNA-sequencing study to date of multiple human populations with sequenced genomes. Their analysis suggests it may now be possible to predict which of scores of disease-associated loci actually drive up disease risk rather than merely correlate with it. In a companion paper published September 15 in Nature Biotechnology, the Geneva scientists and collaborators elsewhere outline quality control measures that enable RNA sequencing to be done reliably across multiple labs. This was critical for the current effort and will be for future large-scale studies. And in the September 8 Nature Genetics, a meta-analysis by Lude Franke of the University of Groningen, The Netherlands, and colleagues shows that, with a large enough dataset, it is feasible to identify a special class of polymorphisms called trans expression quantitative trait loci (eQTLs). These variants associate with gene expression levels but have been hard to map because, unlike cis eQTLs that sit near the genes they influence, they act over unpredictably long distances, often from different chromosomes. Together, the new studies highlight the power of RNA sequencing to interpret data from genome-wide association studies (GWAS).
The use of gene expression profiles to analyze genome-wide datasets is not new. However, prior efforts had limited coverage because they relied on microarrays that interrogate only a limited set of known single nucleotide polymorphisms (SNPs) (e.g. Stranger et al., 2007; Montgomery et al., 2011). Studies that did analyze sequencing data had small samples, 60 to 70 people at most (e.g. Pickrell et al., 2010; Montgomery et al., 2010). For the Nature study, first author Lappalainen, who is now at Stanford University School of Medicine in Palo Alto, California, and colleagues in the Genetic European Variation in Health and Disease (GEUVADIS) Consortium used similar methods but beefed up both quantity and quality. The team sequenced mRNA and microRNA (miRNA) from lymphoblastoid cell lines of 462 people with fully sequenced genomes. The samples, 89 to 95 per group, came from Finnish, British, Italian, Nigerian, and U.S. cohorts in the 1000 Genomes dataset (see ARF news story). Seven labs performed the RNA sequencing, and analyses reported in the Nature Biotechnology paper showed the methods were reproducible across sites.
The study found that the human genome is chock full of eQTLs. “Over half the genes we measured have an eQTL,” Lappalainen told Alzforum. Moreover, 16 percent of known GWAS variants are themselves eQTLs. Because eQTLs are so common, there is a good chance that some disease-associated variants will correlate with gene expression even when that expression change has no functional connection to the disease. “You may find many associating polymorphisms, but probably only one is the causal variant. One of our goals here was to try and tease them apart,” Lappalainen said. She said the current study shows how eQTL data can help map causal variants, something that was not possible in prior studies that used microarrays covering only a subset of common variants.
On the whole, the methods and general approach can be useful for neurodegenerative disease research, Carlos Cruchaga of Washington University School of Medicine, St. Louis, Missouri, commented in an email to Alzforum. However, he said it is unclear how well the results themselves will translate because the analyses focused on gene expression in transformed blood cells, not brain tissue. Mark Cookson of the National Institute on Aging, Bethesda, Maryland, expressed a similar concern. “While some of the genes you would be interested in from an AD perspective are also expressed in blood,” he said, “there are more difficult cases. I don’t think there is much tau expression in blood. It may be hard to reliably assess what is going on at the tau locus from a lymphoblastoid cell line.”
Then how about doing this work in brain? The brain’s regional heterogeneity, as well as postmortem artifacts such as acidosis, make it “noisy” for genotyping and sequencing analyses, Cookson said. Nevertheless, he and other scientists have teamed up to analyze gene expression in human brain tissue collected by the North American Brain Expression Consortium (NABEC) and the UK Brain Expression Consortium (UKBEC) (see ARF news story on Colantuoni et al., 2011). The scientists have used the datasets to look at genetic and age effects on epigenetics and gene expression. Several studies have directly compared blood and brain and found some similar and some distinct effects (see Hernandez et al., 2012; Kumar et al., 2013). So far, the brain analyses have been array-based but they are “moving into sequencing-based platforms,” Cookson said.—Esther Landhuis
- Genetics Project Update: Over 1,000 Genomes and Counting
- The Life and Times of the Human Brain Transcriptome
- Colantuoni C, Lipska BK, Ye T, Hyde TM, Tao R, Leek JT, Colantuoni EA, Elkahloun AG, Herman MM, Weinberger DR, Kleinman JE. Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature. 2011 Oct 27;478(7370):519-23.
- Kumar A, Gibbs JR, Beilina A, Dillman A, Kumaran R, Trabzuni D, Ryten M, Walker R, Smith C, Traynor BJ, Hardy J, Singleton AB, Cookson MR. Age-associated changes in gene expression in human brain and isolated neurons. Neurobiol Aging. 2013 Apr;34(4):1199-209.
- Hernandez DG, Nalls MA, Moore M, Chong S, Dillman A, Trabzuni D, Gibbs JR, Ryten M, Arepalli S, Weale ME, Zonderman AB, Troncoso J, O'Brien R, Walker R, Smith C, Bandinelli S, Traynor BJ, Hardy J, Singleton AB, Cookson MR. Integration of GWAS SNPs and tissue specific expression profiling reveal discrete eQTLs for human traits in blood and brain. Neurobiol Dis. 2012 Jul;47(1):20-8.
- Montgomery SB, Lappalainen T, Gutierrez-Arcelus M, Dermitzakis ET. Rare and common regulatory variation in population-scale sequenced human genomes. PLoS Genet. 2011 Jul;7(7):e1002144.
- Lappalainen T, Sammeth M, Friedländer MR, 't Hoen PA, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, van Iterson M, Almlöf J, Ribeca P, Pulyakhina I, Esser D, Giger T, Tikhonov A, Sultan M, Bertier G, MacArthur DG, Lek M, Lizano E, Buermans HP, Padioleau I, Schwarzmayr T, Karlberg O, Ongen H, Kilpinen H, Beltran S, Gut M, Kahlem K, Amstislavskiy V, Stegle O, Pirinen M, Montgomery SB, Donnelly P, McCarthy MI, Flicek P, Strom TM, Lehrach H, Schreiber S, Sudbrak R, Carracedo A, Antonarakis SE, Häsler R, Syvänen AC, van Ommen GJ, Brazma A, Meitinger T, Rosenstiel P, Guigó R, Gut IG, Estivill X, Dermitzakis ET, Palotie A, Deleuze JF, Gyllensten U, Brunner H, Veltman J, Cambon-Thomsen A, Mangion J, Bentley D, Hamosh A. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013 Sep 26;501(7468):506-11.
- 't Hoen PA, Friedländer MR, Almlöf J, Sammeth M, Pulyakhina I, Anvar SY, Laros JF, Buermans HP, Karlberg O, Brännvall M, van Ommen GJ, Estivill X, Guigó R, Syvänen AC, Gut IG, Dermitzakis ET, Antonarakis SE, Brazma A, Flicek P, Schreiber S, Rosenstiel P, Meitinger T, Strom TM, Lehrach H, Sudbrak R, Carracedo A, van Iterson M, Monlong J, Lizano E, Bertier G, Ferreira PG, Ribeca P, Griebel T, Beltran S, Gut M, Kahlem K, Lappalainen T, Giger T, Ongen H, Padioleau I, Kilpinen H, Gonzàlez-Porta M, Kurbatova N, Tikhonov A, Greger L, Barann M, Esser D, Häsler R, Wieland T, Schwarzmayr T, Sultan M, Amstislavskiy V, den Dunnen JT. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat Biotechnol. 2013 Sep 15.
- Westra HJ, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, Christiansen MW, Fairfax BP, Schramm K, Powell JE, Zhernakova A, Zhernakova DV, Veldink JH, van den Berg LH, Karjalainen J, Withoff S, Uitterlinden AG, Hofman A, Rivadeneira F, 't Hoen PA, Reinmaa E, Fischer K, Nelis M, Milani L, Melzer D, Ferrucci L, Singleton AB, Hernandez DG, Nalls MA, Homuth G, Nauck M, Radke D, Völker U, Perola M, Salomaa V, Brody J, Suchy-Dicey A, Gharib SA, Enquobahrie DA, Lumley T, Montgomery GW, Makino S, Prokisch H, Herder C, Roden M, Grallert H, Meitinger T, Strauch K, Li Y, Jansen RC, Visscher PM, Knight JC, Psaty BM, Ripatti S, Teumer A, Frayling TM, Metspalu A, van Meurs JB, Franke L. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet. 2013 Oct;45(10):1238-43.
Washington University School of Medicine
These papers have numerous implications. Together with the paper published by ENCODE (ENCODE Project Consortium, 2012), this work clearly highlights the importance of non-coding DNA in gene regulation and expression. One repercussion of these findings is their impact on interpreting GWAS results. Most GWAS, including those for Alzheimer’s disease risk, have identified several loci located in intergenic regions. In these cases it is difficult to understand the mechanism by which those loci are associated with disease and to identify the functional variant responsible for the association. Integrating GWAS data with these results can help explain why some loci are associated with a specific disease or trait. This may also help to identify potential therapeutic interventions.
In terms of AD, it is worth checking the known AD GWAS signals against these data to help us interpret them. That said, overall, these methods will translate to neurodegenerative disease research more as a general approach than as directly relevant results. The main limitation on direct extrapolation is that these studies focused on gene expression in blood, not brain tissue. Blood and brain will have some of these eQTLs in common, but not all. As AD is clearly a brain disease, it will be important to determine how much overlap there is between brain and blood eQTLs. To date this is unknown.
I would like to see the results of this type of analysis for brain gene expression. That could help us identify the functional variant driving the association at a given AD GWAS locus. For most AD GWAS loci, such as those at PICALM, CR1, and CD33, we do not know the functional variant or the disease mechanism. As an example, it appears that the SNP associated with AD in CD33 modifies CD33 expression (Bradshaw et al., Nat Neurosci 2013; Griciuc et al., Neuron 2013), and that this altered CD33 expression leads to differences in Aβ clearance and accumulation. In this case, these two groups focused on CD33 and found an association with expression. But with genome-wide expression data, it would be possible to analyze all AD GWAS hits at the same time and potentially identify additional functional variants. After that, additional functional analysis will be needed to fully understand the “pathogenic” mechanism, but identifying the functional variant will be a great step.