Early navigators who ventured into the vast unknown were sometimes rewarded with landfall on warm, tropical islands. Modern-day explorers charting diversity in the genetic code have found a few hot spots of their own. In the October 27 Nature, an international crew of investigators called the International HapMap Consortium published their first draft of the human haplotype chart. This navigational aid promises to help researchers plumb the depths of the human genome for hazards, that is, variations that confer susceptibility to a myriad of diseases. These include, but are not limited to, neurodegenerative disorders, such as Alzheimer disease, and psychiatric illnesses like schizophrenia.

The HapMap project, led by David Altshuler at the Broad Institute of Harvard and MIT and Peter Donnelly at the University of Oxford in England, began in October 2002 with the goal of drafting a haplotype map within 3 years. Almost 3 years to the day later, the consortium, with research input from Canada, China, Japan, Nigeria, the UK, and the US, released a phase I map based on genotypes from 269 DNA samples. The samples were obtained from volunteers in Tokyo, Japan; Ibadan, Nigeria; Beijing, China; and Utah, USA.

Haplotypes are a means of cataloging genetic variation. Though 99.9 percent of the 3 billion or so nucleotides that make up the human genetic code are identical among the world’s 6.5 billion people, it is the 0.1 percent difference that ensures we don’t all look, sound, or think alike. And while that variety may add spice to life, it is also a large part of the reason why some of us succumb to cancer, diabetes, or a disease of the central nervous system. Though some diseases can be blamed on a single letter change in the genetic code, the single nucleotide polymorphism (SNP), more complex diseases are thought to result from a number of such changes. This represents an enormous challenge when trying to identify which specific combinations of variants confer susceptibility to disease, or resistance to a drug. If you thought searching for a needle in a haystack was hard, consider trying to find five or ten needles among the 23 haystacks that are the human chromosomes.

This is where the haplotype comes in. A haplotype is a section of DNA that contains many single nucleotide polymorphisms. Because SNPs come and go infrequently, haplotypes are relatively stable. They also tend to be inherited as a whole block because genetic recombination, which could potentially rearrange the haplotype, is also somewhat rare. So if you inherit one haplotype SNP, you most likely inherit all the other associated SNPs, too.

For this reason, haplotype analysis has the potential to significantly reduce the amount of searching researchers need to do to find SNPs that are associated with a disease or other phenotype, such as drug resistance or sensitivity. In short, by finding one needle in the haystack, you can pull out many others along with it. In their paper, the HapMap Consortium reports that this redundancy may be even more extensive than previously thought. Using a pair-wise comparison method to analyze all the SNPs genotyped, they found that identifying a “tag” SNP every 5-10 kilobases of DNA is sufficient to reveal all the common variants in genome samples obtained from the Utah, Chinese, and Japanese volunteers. A slightly denser array of SNPs (one every two to five kilobases) would achieve the same result for the Nigerian samples.
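To make the tagging idea concrete, here is a minimal Python sketch of pair-wise tag selection. It is not the consortium's code: the function names, the toy data, the SNP labels, and the r² cutoff of 0.8 are all assumptions chosen for illustration. It scores every pair of SNPs by the squared correlation (r²) of their alleles across phased chromosomes, then greedily picks the SNP that captures the most not-yet-covered SNPs until all are covered.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared allelic correlation between two SNPs coded 0/1 across
    the same set of phased chromosomes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP cannot tag or be tagged
    d = pab - pa * pb
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP is either a tag or has
    r^2 >= threshold with some tag.

    haplotypes: dict mapping a (hypothetical) SNP name to a list of
    0/1 alleles. The threshold is an illustrative assumption.
    """
    snps = list(haplotypes)
    # captures[s] = SNPs that s would stand in for, including itself
    captures = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)

    untagged = set(snps)
    tags = []
    while untagged:
        # choose the SNP covering the most still-untagged SNPs
        best = max(snps, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Toy example: six chromosomes, four SNPs with made-up names.
toy = {
    "snp_a": [0, 0, 1, 1, 0, 1],
    "snp_b": [0, 0, 1, 1, 0, 1],  # always travels with snp_a
    "snp_c": [1, 1, 0, 0, 1, 0],  # mirror image of snp_a, still r^2 = 1
    "snp_d": [0, 1, 0, 1, 0, 1],  # only weakly correlated with the rest
}
print(greedy_tags(toy))
```

In this toy set, typing two SNPs is enough to infer all four. That is the same redundancy, on a tiny scale, that lets a tag every few kilobases stand in for the common variants around it.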

In practice, this means that in order to identify, with reasonable accuracy, which of the 10 million or so SNPs a person carries, researchers would only have to test about 250,000 tag SNPs. To be 100 percent accurate the number jumps up to about 450,000 tag SNPs (600,000 in the case of the Nigerian population), still less than 10 percent of the total. Hence, identifying haplotypes should not only be faster, but also cheaper than expected. In fact, consortium member Yusuke Nakamura, University of Tokyo, estimates that haplotype mapping could reduce the cost of searching for inherited genetic factors by 10- to 20-fold.
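The proportions behind these figures are easy to check. The short Python sketch below uses only the round numbers quoted above (10 million SNPs in total; 250,000, 450,000, and 600,000 tags); the function name and labels are made up for illustration.

```python
TOTAL_SNPS = 10_000_000  # "10 million or so" common SNPs, as quoted above


def tag_fraction(n_tags, total=TOTAL_SNPS):
    """Fraction of all SNPs that must actually be genotyped."""
    return n_tags / total


scenarios = {
    "reasonable accuracy": 250_000,
    "full accuracy": 450_000,
    "full accuracy, Nigerian samples": 600_000,
}
for label, n_tags in scenarios.items():
    print(f"{label}: {tag_fraction(n_tags):.1%} of all SNPs")
```

Even the densest scenario requires typing only about 6 percent of the SNPs, which is consistent with the "less than 10 percent" figure.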

The point is illustrated by David Goldstein and Gianpiero Cavalleri from Duke University in an accompanying News & Views. Four years ago, these authors launched a project to identify which polymorphisms in the gene SCN1A are responsible for dictating a given patient’s response to an antiepileptic drug. It took the research team 2 years to identify the common SNPs and appropriate tags. “Today, the same job can be accomplished with simple computer algorithms, in minutes, using the HapMap data,” they write.

The value of the HapMap project is further demonstrated in an accompanying Nature paper from Vivian Cheung and colleagues at the University of Pennsylvania and The Children’s Hospital, both in Philadelphia. These researchers used the HapMap data to identify SNPs that influence gene expression. For 15 of 27 different genes previously identified as being heavily influenced by genetic variation, the authors found that their HapMap-based study agreed with previous findings: the HapMap analysis pointed to exactly the same cis-regulatory regions in the DNA. For one gene, chitinase 3-like 2 (CHI3L2), Cheung and colleagues were able to identify the exact SNP that regulated expression, a G to T mutation that leads to stronger binding of RNA polymerase II, the enzyme that makes messenger RNA. “Our findings suggest that association studies with dense SNP maps will identify susceptibility loci or other determinants for some complex traits or diseases,” write the authors.

Though haplotype analysis may increase efficiency, there are concerns that it may do so at a cost—the studies may be weak in terms of statistical power. The second phase of the HapMap project, which is designed to uncover considerably more SNPs, may help in this regard, and in the meantime, methods exist that can be employed to increase the power of the studies. So conclude Altshuler and colleagues in a related Nature Genetics paper published online October 23. Joint first authors Paul de Bakker, Roman Yelensky, and colleagues report that there are numerous ways of carrying out the analysis to preserve statistical power. They found, for example, that analyzing all haplotypes for association, not just those that have been linked to known SNPs, can increase the chances of detecting rare polymorphisms that cause disease.—Tom Fagan


  1. With the completion of the HapMap and its commercialization by Illumina and Affymetrix, it should be possible for researchers to find susceptibility alleles that have an odds ratio of >2 for any disorder, including Alzheimer disease, over the next couple of years. The expense will be high: Sample sizes of about 500 cases and 500 controls will be needed, and the cost per sample is on the order of $900. But if there are any more genes with the effect size of ApoE out there, for AD or other diseases, we should now be able to find them.

  2. Q&A with Lars Bertram.

    Q: Does the map provide enough resolution?

    A: On average, the haplotype map has investigated about 1 SNP every 5,000 bases (i.e., 5 kb). For most applications this density should be sufficient to allow linkage disequilibrium mapping of common variants with at least moderate effects in genetically complex diseases. However, a phase 2 of the HapMap is planned which will probably more than quadruple this resolution.

    Q: Is there anything particular about Alzheimer disease that makes
    haplotyping any more or less useful than for other diseases?

    A: The good news for AD is that its heritability, even of the late-onset form, is relatively well established. Even after excluding the effects of ApoE, this means that there are probably several additional genetic risk factors waiting to be identified. The bad news is that AD is a late-onset disease, which usually means that parents are deceased when their offspring develop the disease and no parental samples are available to precisely reconstruct ("phase") haplotypes. Nonetheless, the HapMap data was assembled based on child-parent trios, so much of the haplotype phasing has already been done.

    Q: Will the HapMap help in complex diseases, where several variants on
    different chromosomes must interact, for example?

    A: While the HapMap has many valuable uses in designing and interpreting future genetic association studies in AD and other diseases, it will unfortunately not help to better understand interactions between different genetic loci or non-genetic factors, because such interactions likely vary from phenotype to phenotype.

    Q: Will the HapMap help in diseases where gene silencing, mRNA
    splicing, and other post-transcriptional and post-translational
    modifications are key to the pathophysiology?

    A: If these pathophysiological changes are actually caused by common genetic variants in the genome, HapMap will definitely help us find them. It will still require a good number of experiments, though, to actually prove the causal relationship between associated SNPs on the one hand, and differences in mRNA splicing (for instance) on the other hand.

    Q: Is the principle of tagging haplotypes scientifically sound, or
    does it run the risk of missing out on haplotypes that are low in
    frequency but high in consequence?

    A: The principle of tagging haplotypes to cover untyped common genetic variants is certainly sound and, with the data provided by the current HapMap release, has just become a whole lot easier. Like everything in science, it does have limitations (such as finding very low-frequency polymorphisms or haplotypes). However, this is a rapidly evolving field and the planned phase 2 release of the HapMap, together with novel analytic strategies, should facilitate even the search for such uncommon variants in the near future.

Primary Papers

  1. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320.
  2. Genomics: understanding human diversity. Nature. 2005 Oct 27;437(7063):1241-2.
  3. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005 Oct 27;437(7063):1365-9.
  4. Efficiency and power in genetic association studies. Nat Genet. 2005 Nov;37(11):1217-23.