23 November 2006. When it comes to pathological variations in the human genome, single nucleotide polymorphisms have taken their fair share of the heat. After all, simple point mutations are responsible for a plethora of disorders, including Alzheimer disease—think amyloid precursor protein (APP) and presenilin (PS) mutations—and Parkinson disease (PD)—consider mutations in LRRK2 and α-synuclein, to name just two. But what about more profound changes or rearrangements of the human genome? If an extra copy of the APP gene can cause AD (see ARF related news story) or α-synuclein, PD (see ARF related news story), could copy number variation have as much, or even greater, impact on human health as SNPs?
Yes, suggest Matthew Hurles and colleagues in today’s Nature. Hurles, from the Wellcome Trust Sanger Institute in Cambridge, England, and an international collaborative of academic and industry researchers have compiled the first copy number variation (CNV) map of the human genome. They found a total of 1,447 copy number variable regions (CNVRs) covering 360 megabases of DNA (12 percent of the total genome), which means that CNVRs cover more nucleotide content per genome than SNPs. “The data suggest that the greatest source of genetic diversity in our species lies not in millions of SNPs, but rather in larger segments of the genome whose presence or absence calls into question what exactly is a ‘normal’ human genome,” write Kevin Shianna and Huntington Willard, Duke University, Durham, North Carolina, in an accompanying Nature News & Views.
This color representation of CNVs in the International HapMap Project samples shows segments of DNA that are overrepresented (green) or underrepresented (red) in many individuals. Image credit: Matthew Hurles
First author Richard Redon and colleagues applied high-throughput detection and high-density DNA oligonucleotide arrays (described in two separate papers in today’s Genome Research—Komura et al., 2006, and Fiegler et al., 2006) to generate the CNVR map using the same DNA samples and cell lines used by the International HapMap Project. These samples cover four different ethnic populations: Nigerians, Europeans, Japanese, and Han Chinese (see ARF related news story). The researchers defined CNVs as segments of DNA 1 kb or larger in size that are present in variable copy number in comparison to a reference genome. Almost one-quarter of all CNVs were associated with segmental duplications, which is perhaps not surprising, given that natural selection tends to weed out deletions.
Though the vast majority of CNVs were found outside of coding sequences, thousands of putative functional segments of DNA fall within CNVs, and 99 percent of them overlap with conserved non-coding sequences. The researchers also determined that at least 10 percent of disease-related genes in the OMIM database are associated with CNVs. These genes include CCL3L1, which has been linked to increased resistance to HIV-1, and DISC1, which undergoes a chromosomal translocation that might be causative for schizophrenia (see Schizophrenia Research Forum related news story).
“Given the limited set of reference samples assayed, the 1,500 CNVs reported by Redon et al. are probably the tip of the iceberg. As the results and the raw data from the first wave of genome-wide association studies become available, it will be essential to catalogue the full range of human CNVs,” write Shianna and Willard. To that end, the Wellcome Trust Sanger Institute has developed a CNV database called DECIPHER to share information on CNVs and rare, severe phenotypes.
In a related paper coauthored by many of the same researchers, Lars Feuk, University of Toronto, Canada, and colleagues describe a comparison of the two human genome assemblies: those produced by Celera Genomics and the Human Genome Sequencing Consortium. First author Razi Khaja and colleagues use the data to demonstrate that there are megabases of sequence information, specifically over 13,500 non-SNP events, that are absent, inverted, or polymorphic in one assembly compared to the other. The data indicate that there is substantial undescribed variation within the human genome and suggest that more comprehensive annotation will be needed as we enter the era of personalized, genomic-based medicine.—Tom Fagan.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME. Global variation in copy number in the human genome. Nature. November 23, 2006;444:444-454.
Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis P, Feuk L, French L, Hunt P, Kalaitzopoulos D, Larkin J, Montgomery L, Perry GH, Plumb BW, Porter K, Rigby RE, Rigler D, Valsesia A, Langford C, Humphray SJ, Scherer SW, Lee C, Hurles ME, Carter NP. Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res. November 23, 2006;16:1566-1574.
Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME, Lee C, Scherer SW, Jones KW, Shapero MH, Huang J, Aburatani H. Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res. November 23, 2006;16:1575-1584.
Shianna KV, Willard HF. Human genomics: in search of normality. Nature. November 23, 2006;444:428-429.
Khaja R, Zhang J, MacDonald JR, He Y, Joseph-George AM, Wei J, Rafiq MA, Qian C, Shago M, Pantano L, Aburatani H, Jones K, Redon R, Hurles M, Armengol L, Estivill X, Mural RJ, Lee C, Scherer SW, Feuk L. Genome assembly comparison identifies structural variants in the human genome. Nat Genet. November 22, 2006. Advance online publication.