Once upon a time, sequencing a single genome was something to brag about. Those days are gone. In the October 1 Nature, a consortium of researchers publishes its final report on the 1000 Genomes Project. The consortium sequenced the full genomes of 2,504 people, from 26 different populations across the Americas, Eurasia, and Africa. This took seven years. In total, the project identified 88 million variants in the human genome. The typical person’s DNA was dotted with 4 to 5 million of them. Single-nucleotide polymorphisms (SNPs) contributed the bulk of these variants, but your average Joe also carries a couple of thousand structural variants, such as deletions and insertions. In a separate report in the same issue of Nature, a subset of authors catalog those structural variations, reporting complex rearrangements geneticists had not seen before.

Casting a Wide Net. Project scientists collected DNA from people in 26 different populations across five continents. [Courtesy of The 1000 Genomes Project Consortium, Nature.]

The main paper is by corresponding authors Adam Auton of the Albert Einstein College of Medicine in New York, Gonçalo Abecasis of the University of Michigan in Ann Arbor, and hundreds of co-authors. They report variants found in diverse ethnicities, from Finns to Puerto Ricans to the Yoruba of Nigeria. They estimate that their data set accounts for more than 99 percent of SNPs and 85 percent of larger variants that are present at a frequency of at least 1 percent in the populations studied. Very rare variants, exclusive to certain small populations or families, might not appear in this data set.

As previous studies have indicated, people in Africa exhibited the most variable genomes. Because Homo sapiens arose there, the continent houses the oldest human populations, the ones that have had the most time to accumulate genetic variation. Scientists believe other populations underwent bottlenecks in genetic diversity as small founder groups emigrated from Africa, carrying only a smidgen of the total population diversity with them. Since their departure from Africa, these younger ethnic groups have not had time to build up the same range of diversity that African populations possess.

Compared with SNPs, structural variants have been harder for geneticists to identify because genome sequencing relies on piecing together overlapping bits of short sequences. Repeat sequences and deletions are therefore easy to miss. To address this, researchers led by co-corresponding authors Evan Eichler of the University of Washington in Seattle and Jan Korbel of the European Molecular Biology Laboratory in Heidelberg, Germany, combined the 1000 Genomes sequences with data from other types of analyses, such as long-read sequencing, to characterize these kinds of variants. Besides simple deletions, insertions, and inversions, they also discovered more complex patterns of genomic shuffling, some new to science. For example, they saw places where multiple deletions occurred in a row, or spots where genes were duplicated, then inverted. They also found about 240 genes that were missing from the genomes of many study participants, which led the authors to deem those genes potentially “dispensable.”

While the 1000 Genomes Project only just wrapped up, scientists are already making regular use of its catalog of human variation. The authors published an interim data set, composed of 1,092 genomes from 14 populations, in 2012 (see Nov 2012 news; Nov 2010 news).

In the neurodegenerative disease field, scientists tap 1000 Genomes most often in GWAS interpretation, in a process called imputation. GWAS typically use SNP microarray chips, which identify some, but not nearly all, of the places where the human genome can vary by a single nucleotide. Then, scientists turn to data sets like 1000 Genomes to predict, or impute, what other variants are likely co-inherited with the SNPs flagged in the GWAS. Having full sequences of more individuals in the final data set will make imputation more accurate, said Mark Cookson of the National Institute on Aging in Bethesda, Maryland, who did not participate in the project. That will be particularly helpful in finding expression quantitative trait loci (eQTLs), he added. These eQTLs are non-coding genetic sequences that regulate gene expression and are believed to contain many of the functional variants linked to diseases.
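To make the idea of imputation concrete, here is a deliberately simplified sketch. It assumes a tiny, made-up reference panel of fully sequenced haplotypes and copies the untyped sites from whichever panel haplotype agrees best with the chip calls. The panel, the site indices, and the function name `impute` are all hypothetical, and real imputation software relies on statistical models of many haplotypes rather than a single best match.

```python
from typing import Dict, List

# Hypothetical reference panel: each row is one fully sequenced haplotype,
# each column one variant site (0 = reference allele, 1 = alternate allele).
REFERENCE_PANEL: List[List[int]] = [
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
]


def impute(typed: Dict[int, int], panel: List[List[int]]) -> List[int]:
    """Fill in untyped sites by copying the best-matching panel haplotype.

    `typed` maps a site index to the allele a SNP chip actually measured.
    This nearest-match rule is an illustrative assumption, not the method
    used by the 1000 Genomes Project or by any particular imputation tool.
    """
    def matches(haplotype: List[int]) -> int:
        # Count the typed sites at which this panel haplotype agrees.
        return sum(haplotype[site] == allele for site, allele in typed.items())

    best = max(panel, key=matches)
    # Keep measured alleles as they are; borrow the rest from the best match.
    return [typed.get(site, best[site]) for site in range(len(best))]


# A chip that measured only sites 0, 3, and 4.
chip_calls = {0: 1, 3: 1, 4: 0}
print(impute(chip_calls, REFERENCE_PANEL))
```

In this toy case the fourth panel haplotype agrees at all three typed sites, so sites 1 and 2 are filled in from it. A larger panel gives more candidate haplotypes to match against, which is the intuition behind Cookson's point that more full sequences should improve accuracy.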

Ewan Birney of the European Bioinformatics Institute in Cambridge, England, and Nicole Soranzo of the University of Cambridge also highlighted the importance of 1000 Genomes in GWAS imputation in a commentary accompanying the Nature paper. “Because genotyping arrays are cheap, the ability to infer variation allows researchers to focus on increasing sample sizes—a crucial next step in improving our understanding of the genetics of diseases,” they wrote. They called the 1000 Genomes Project a foundation for the future of human population genetics.

In a sense, 1000 Genomes provides a control population so scientists studying a particular disease or population do not have to build their own. In that regard, the variety of people sampled by the 1000 Genomes group offers an advantage, Cookson said. For example, when researchers want to focus on a particular ethnic group, say Europeans, they can compare their genotypes to the 1000 Genomes data to identify, and discard, any samples from people who have a different ancestry.
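As a rough illustration of that filtering step, the sketch below assigns each study sample to the reference population under which its genotypes are most likely, then keeps only the samples that match the target group. The allele frequencies, the population labels, and the sample names are invented for the example. The likelihood rule, which assumes independent sites and Hardy-Weinberg proportions, is a simplifying assumption of this sketch. Researchers often use other approaches, such as principal component analysis, for this purpose.

```python
import math
from typing import Dict, List

# Hypothetical alternate-allele frequencies at three marker sites,
# as they might be estimated from a reference panel. Not real data.
POP_FREQS: Dict[str, List[float]] = {
    "EUR": [0.10, 0.80, 0.30],
    "AFR": [0.70, 0.20, 0.60],
}


def log_likelihood(genotype: List[int], freqs: List[float]) -> float:
    """Log-probability of a genotype (0, 1, or 2 alternate copies per site).

    Assumes independent sites and Hardy-Weinberg proportions, so each site
    is a binomial draw of two alleles with alternate-allele frequency p.
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        total += math.log(math.comb(2, copies) * p**copies * (1 - p) ** (2 - copies))
    return total


def closest_population(genotype: List[int], pop_freqs: Dict[str, List[float]]) -> str:
    """Return the population under which the genotype is most likely."""
    return max(pop_freqs, key=lambda pop: log_likelihood(genotype, pop_freqs[pop]))


def keep_matching(samples: Dict[str, List[int]], target: str,
                  pop_freqs: Dict[str, List[float]]) -> List[str]:
    """Keep only the samples whose best-fitting population is the target."""
    return [name for name, genotype in samples.items()
            if closest_population(genotype, pop_freqs) == target]


# Hypothetical study samples.
samples = {"s1": [0, 2, 0], "s2": [2, 0, 1], "s3": [0, 1, 1]}
print(keep_matching(samples, "EUR", POP_FREQS))
```

Here sample s2 fits the second set of frequencies far better than the first, so it would be set aside before a European-focused analysis.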

The new structural-variants catalog can help scientists studying diseases inherited in families, just as SNP databases already do. Doctors who suspect a mutation, inversion, or deletion is to blame for an age-related neurodegenerative disease, for example, can check the 1000 Genomes data to find out if that variant commonly occurs. If it does, it might be innocuous, explain the authors. Without that context, a structural variation found in an afflicted family might be mistaken for the cause of the disease. The papers do not reveal the age or health of the people who donated the DNA samples.
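The lookup described above can be sketched in a few lines. The catalog entries, the variant identifiers, and the function name `classify` are hypothetical, and the 1 percent cutoff for calling a variant common is an assumption borrowed from the frequency threshold the main paper uses to describe its coverage. It is not a clinical standard.

```python
from typing import Dict

# Hypothetical catalog: variant identifier -> allele frequency in the
# reference population. These entries are invented for illustration.
CATALOG: Dict[str, float] = {
    "deletion_A": 0.12,
    "inversion_B": 0.004,
}

# Assumed cutoff for "commonly occurs" (1 percent).
COMMON_THRESHOLD = 0.01


def classify(variant_id: str, catalog: Dict[str, float],
             threshold: float = COMMON_THRESHOLD) -> str:
    """Label a variant found in a family by how common it is in the catalog."""
    freq = catalog.get(variant_id)
    if freq is None:
        # Not seen in the reference population at all.
        return "absent"
    if freq >= threshold:
        # Common in the reference population, so it might be innocuous.
        return "common"
    return "rare"


for variant in ("deletion_A", "inversion_B", "duplication_C"):
    print(variant, classify(variant, CATALOG))
```

A "common" result only suggests that the variant is unlikely to be the sole cause of a rare disease. Because the papers do not reveal donors' age or health, an age-related effect cannot be excluded on frequency alone.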

Scientists can freely access the genotype data at the 1000 Genomes website. Researchers who spot something interesting in those A’s, T’s, G’s and C’s can purchase from the Coriell Institute for Medical Research in Camden, New Jersey, the original DNA samples for more detailed study. Also available are immortalized cell lines generated from the blood cells of people who donated their genomes.—Amber Dance

References

News Citations

  1. Genetics Project Update: Over 1,000 Genomes and Counting
  2. Next-Generation Sequencing: Boldly Going Where No Geneticist...

External Citations

  1. 1000 Genomes Project
  2. Coriell Institute for Medical Research

Primary Papers

  1. A global reference for human genetic variation. Nature. 2015 Sep 30;526(7571):68-74. PubMed.
  2. An integrated map of structural variation in 2,504 human genomes. Nature. 2015 Sep 30;526(7571):75-81. PubMed.
  3. Human genomics: The end of the start for population sequencing. Nature. 2015 Sep 30;526(7571):52-3. PubMed.