Sequencing a person’s entire genome will foretell many diseases only marginally better than gazing into a crystal ball, according to a report in the April 2 Science Translational Medicine online. Researchers at Johns Hopkins University in Baltimore, Maryland, modeled data from identical twins to determine if a person’s genome can predict disorders such as cancer, Alzheimer’s, and heart disease. The results sound neither a ringing endorsement nor a death knell for whole-genome sequencing. The analysis suggests that the average person’s genome would likely predict one future condition with a reasonable degree of accuracy. But it would also yield a host of negative results that are largely meaningless, because all they indicate is that the person’s risk of those conditions is no higher than the risk to the general population.

A few notable exceptions: Sequencing would predict many cases of Alzheimer’s disease, the scientists found, as well as thyroid autoimmunity, type 1 diabetes, and coronary heart disease in men. Ultimately, the value of whole-genome sequencing will be unique to each individual and his or her situation, said Nicholas Roberts, who was co-first author with Joshua Vogelstein. “We hope to start debate about the merits of personal genome sequencing based on this model or other models,” Roberts said. Bert Vogelstein, co-senior author with Victor Velculescu, presented the study on April 2 at the American Association for Cancer Research annual meeting in Chicago, Illinois.

Roberts and colleagues based their approach on pairs of identical twins. Because the members of each pair share essentially the same genome, the researchers attributed any difference in their disease status to environmental factors or to the random nature of some illnesses. The researchers did not actually sequence anyone for this study, nor did they need sequences for their analysis. Rather, they entered into the computer model the number of twin pairs in which both twins had a disease, both were healthy, or only one was affected. The scientists culled these data from several twin studies performed in the U.S. and Europe, including Gatz et al., 1997, for Alzheimer’s and dementia and Tanner et al., 1999, for Parkinson’s.

The computer analyzed the matched and mismatched twin pairs and tried to come up with different scenarios for how their genomes could produce the observed outcomes. For example, the Alzheimer’s data included two pairs in which both twins had AD, eight mismatched pairs, and 388 pairs in which both twins were healthy. Given that each pair had identical genomes, how could those genomes produce this particular distribution? The program assumed that each genome carried a specific genetic risk for Alzheimer’s, say, 7 percent or 15 percent or 70 percent. By randomly trying different risk levels for each of the genomes, the computer worked out several hypothetical scenarios by which the twins’ genes could lead to their disease states. Among the scenarios consistent with the twin data, it settled on the one in which genomes would best predict disease, and the authors used that scenario as a model for how closely genomes and conditions might align.
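The core of this kind of calculation can be sketched in a few lines. This is a minimal illustration, not the authors’ code: it assumes that, given a shared genome with genetic risk r, each twin becomes ill independently with probability r, so a pair is affected-concordant with probability r², discordant with 2r(1 − r), and healthy-concordant with (1 − r)². The two risk distributions below are hypothetical, chosen only so that both match the roughly 1.5 percent per-person prevalence implied by the Alzheimer’s counts (2, 8, 388); the actual model searched over many candidate distributions rather than comparing two hand-picked ones.

```python
import math

# Alzheimer's twin-pair counts quoted in the article:
# (both twins affected, one twin affected, both twins healthy)
AD_COUNTS = (2, 8, 388)

def pair_outcome_probs(risk_dist):
    """Expected probabilities of the three twin-pair outcomes.

    risk_dist is a list of (population_fraction, genetic_risk) tuples whose
    fractions sum to 1. Assumes each twin falls ill independently, with
    probability equal to the genetic risk of the genome the pair shares.
    """
    p_both = sum(w * r * r for w, r in risk_dist)
    p_one = sum(w * 2 * r * (1 - r) for w, r in risk_dist)
    p_neither = sum(w * (1 - r) ** 2 for w, r in risk_dist)
    return p_both, p_one, p_neither

def log_likelihood(risk_dist, counts=AD_COUNTS):
    """Multinomial log-likelihood of the observed counts (constant term omitted)."""
    probs = pair_outcome_probs(risk_dist)
    return sum(n * math.log(p) for n, p in zip(counts, probs))

# Hypothetical scenario A: every genome carries the same 1.5 percent risk.
uniform = [(1.0, 0.015)]
# Hypothetical scenario B: 5 percent of genomes carry 20.5 percent risk and the
# rest carry 0.5 percent; the average per-person risk is still 1.5 percent.
concentrated = [(0.95, 0.005), (0.05, 0.205)]

for name, dist in [("uniform", uniform), ("concentrated", concentrated)]:
    print(name, [round(p, 4) for p in pair_outcome_probs(dist)],
          round(log_likelihood(dist), 2))
```

With these made-up numbers, the concentrated scenario fits the observed counts better: spreading the same average risk evenly across all genomes makes even two affected-concordant pairs quite unlikely, which is why concordance data carry information about how unevenly genetic risk is distributed.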

The study rests on two major assumptions, Roberts said, making it a “best-case scenario” rather than an accurate representation of genetic sequencing as it stands today. For one, the model presupposes that every genome sequence is 100 percent accurate; in reality, mistakes are currently inevitable. Second, the model assumes that researchers know every variant linked to a given disease, and how all the alleles work together to determine risk. While this information is not necessary for the modeling that Roberts and colleagues performed, it would be required for sequencing to actually predict disease as they envision. This is an ideal that science is likely to approach someday but never reach, said Nathan Pearson, director of research for Knome, Inc., of Cambridge, Massachusetts. Knome helps researchers interpret genome data to further understanding of disease.

Peter Visscher of the University of Queensland in Brisbane, Australia, said the model is “unusual and unconventional” for studies trying to answer questions about genetic predictions. For one thing, the model assumes there are up to 20 discrete risk levels for each disease. Many diseases incorporate risk from hundreds or thousands of loci, which means the number of possible genotypes and risk levels is practically unlimited, Visscher said. Other researchers have modeled sequencing and risk with unlimited possible genomes (Janssens et al., 2006), noted Visscher. He prefers his own methods, which also do not restrict genotypes (Wray et al., 2010). In addition, Visscher said, in studies like this it is difficult to untangle the genes shared by twins from all the other things they share, such as upbringing and diet, meaning that the model is likely to overestimate genetic contributions to disease.

For most of the 24 diseases considered, sequencing would miss the majority of cases, Roberts said. And a negative result would not be a “free pass,” he added; it would only mean no genes point to a higher-than-average risk. “[It] is sobering,” wrote Svante Pääbo, of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, in an e-mail to ARF (see full comment below). “I would have naïvely thought that monozygotic twins would be more similar in the diseases that afflict them.”

John Hardy of University College London, U.K., was not so surprised. He has identified a pair of twins who share the late-onset, autosomal-dominant G2019S mutation in leucine-rich repeat kinase 2 (LRRK2), the gene at the PARK8 locus, but only one of them has Parkinson’s disease. The finding will be published in an upcoming issue of Movement Disorders. The phenotypic differences between monozygotic twins might be due to epigenetics, somatic mutations, or random chance, suggest Susanne Schneider of the University of Lübeck, Germany, and Michael Johnson of Imperial College London, U.K., in a related editorial that will be published with Hardy’s paper in the same journal.

In contrast to the majority of diseases, the model predicted that genetic screening for AD was highly sensitive. Of people destined to develop Alzheimer’s, sequencing would identify 80-90 percent. This might be because AD has a particularly strong genetic basis; however, it is impossible to tell from this analysis how much of this prediction is due to specific genes such as ApoE. The team also analyzed Parkinson’s disease, for which genes could predict 20-30 percent of cases, and dementia in general (including both AD and vascular dementia), for which sequencing would identify 50-60 percent of people who would eventually have it. No other neurodegenerative conditions were examined.
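What “sensitivity” means here, and why a negative result is not a free pass, can be illustrated with a hypothetical two-level risk distribution. In this sketch, a genome counts as a positive test if its genetic risk exceeds a cutoff; both the cutoff used below (the population-average risk) and the distribution itself are assumptions for illustration, not values from the paper. Sensitivity is the share of all eventual cases that arise in flagged genomes, and the residual risk is what a person with a negative result still faces.

```python
def screening_summary(risk_dist, threshold):
    """Summarize a hypothetical genome-based screen.

    risk_dist: list of (population_fraction, genetic_risk) tuples, fractions summing to 1.
    threshold: flag a genome as positive when its genetic risk exceeds this value.
    Returns (sensitivity, risk_if_negative, population_risk).
    """
    population_risk = sum(w * r for w, r in risk_dist)
    # Expected cases arising in genomes the screen flags as positive.
    flagged_cases = sum(w * r for w, r in risk_dist if r > threshold)
    # Share of the population that tests negative, and the cases among them.
    negative_fraction = sum(w for w, r in risk_dist if r <= threshold)
    negative_cases = population_risk - flagged_cases
    sensitivity = flagged_cases / population_risk
    risk_if_negative = negative_cases / negative_fraction if negative_fraction else 0.0
    return sensitivity, risk_if_negative, population_risk

# Hypothetical distribution: 5 percent of genomes at 20.5 percent risk, the rest at 0.5 percent.
dist = [(0.95, 0.005), (0.05, 0.205)]
sens, residual, pop = screening_summary(dist, threshold=sum(w * r for w, r in dist))
print(f"sensitivity {sens:.0%}, risk if negative {residual:.2%}, population risk {pop:.2%}")
```

With these made-up numbers, the screen catches about 68 percent of eventual cases, and a person who tests negative still carries a 0.5 percent risk: lower than the 1.5 percent population average, but not zero.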

“This paper is, in some sense, a grain of salt to accompany the great expectations for whole-genome analysis,” Pearson said. “For many people, [sequencing] is going to provide limited insight into the risk for many common diseases.” On the positive side, he added, the study found that more than 90 percent of people would receive a useful prediction of above-average risk for at least one condition. Genomewide association studies, currently popular for identifying common but weak risk variants, remain valuable for the clues they provide to the biological pathways involved in disease, Roberts said.

One positive interpretation of the study, Pääbo noted, is that “our destiny is not in our genes. Rather, many other things that we can influence, such as our lifestyle and doing medical checkups, may be much more important for reducing our risk of prematurely being affected by diseases.”—Amber Dance


  1. In general, this paper is sobering. I would have naïvely thought that monozygotic twins would be more similar in the diseases that afflict them.

    Perhaps this may be seen as positive, in that our destiny is not in our genes. Rather, many other things that we can influence, such as our lifestyle and getting medical checkups, may be much more important for reducing our risk of prematurely being affected by diseases.

    Even for Alzheimer's disease, one of the cases where genome sequencing has the potential to perform best, according to this study, only two of the 10 affected twin pairs in the study are "concordant," i.e., in only two of 10 cases do both monozygotic twins have the disease, whereas in the other eight, only one twin is affected.

    Food for thought is also that research approaches other than genomics, in particular into physiology and epidemiology, may provide more economical routes to a better understanding of how to prevent many diseases.

  2. The study by Vogelstein and colleagues constitutes a milestone for the field of human genetics, and has critical ramifications both medically and socially. By challenging the long-standing dogma that genetic testing would automatically provide absolute information on the future pathologies of an individual, this study resets the field and gives much more weight to the critical contribution of epigenetic modifications, which reflect the history, physical status, and lifestyle of the patient.

  3. This study is quite valuable in that it systematically attempts to ascertain the value of genetic predictions. As expected, negative predictions are not useful. It was interesting that the authors indicated that, on average, they could predict a positive outcome for one of 24 major diseases. Such information would be useful to a person at risk for that disease.

    However, I expect the real figure will be even higher than that, based on the clinical interpretation of genomes that have been analyzed thus far (the Quake genome, Ashley et al., 2010; the West family, Dewey et al., 2011; and now my genome, just published in Cell, Chen et al., 2012). These all show increased risk for several important diseases from the genome sequence. It is true that in many cases a person will die (or become severely affected) from only one of them; when that happens, information about the other diseases is often lost or masked. Also, as noted by the authors, some of the diseases are likely to be related (i.e., not independent) and, thus, the genetic linkage is higher. I think a genome sequence will provide useful information that can alert people to many possible conditions, all of which can be followed.


    Ashley et al. Clinical assessment incorporating a personal genome. Lancet. 2010 May 1;375(9725):1525-35.

    Dewey et al. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet. 2011 Sep;7(9):e1002280.

    Chen et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012 Mar 16;148(6):1293-307.

  4. Below, I'd like to offer some constructive criticism on the paper itself, along with thoughts on the engaging public discussion that it has prompted.

    1. The work is conventional, and the findings not surprising.

    I agree with Svante Pääbo and Paolo Sassone-Corsi that papers like this are important antidotes to any rash expectation that, for the average fairly healthy adult, precise and accurate genetic risk prediction will be easy. But I doubt that serious geneticists hold such expectations. As a meta-analysis of past twin studies, this paper reheats the old and simple observation that so-called "identical" twins don’t always get the same diseases.

    Following in the long twin-study tradition, the authors tackle the age-old question of how genetically heritable each disease is overall. Beyond that, they ask how genetic risk varies among people: for a given disease, do genomes contribute a portion of risk that varies widely but smoothly from person to person, a portion that varies more sharply, or some mix of the two? If the paper says much that is new, it is in this last question, in trying to sort particular diseases by estimating how well, on average, one may be able to forecast them from the genome alone.

    This raises a key caveat: Many of the diseases the paper looks at are cancers, which are well known to be less genetically heritable than some other diseases. For cancers in particular, the paper recasts longstanding knowledge as if it were a new grain of salt for the coming era of genomically personalized healthcare.

    2. The work fits into a prevailing integrative view of genomically personalized healthcare.

    Looking to the prospect of genomically personalized healthcare, serious thinkers have always understood that genomes will complement, not replace, conventional cornerstones of clinical care. Face-to-face doctor visits, family history, lab tests, and so forth will remain essential, but none on its own will tell us everything we need to know about disease risk. Lab tests often happen too late, and family history is limited in utility for exactly the reasons this paper highlights.

    By rough analogy, note that weather forecasters use satellite photos every day for remarkably detailed insight into what is happening in the atmosphere; nonetheless, they would not try to predict today's high temperature in Leipzig or Palo Alto solely from such photos. Rather, they are using such modern, comprehensive views of the data alongside older ground-based instruments to predict the weather (and they still get it wrong sometimes).

    Likewise, whole-genome sequencing will play a crucial role in everyday healthcare in the future, but always together with other sources of medical insight. At Knome, we know that it’s crucial not to hype genomes as silver bullets of healthcare. To do so would do the public a disservice and, in raising false expectations of surefire returns, risk a backlash from funders of healthcare and research.

    Yet it is important to note that discoveries from individual genomes have already helped many families. This has been true ever since the first genetic disease variant, for sickle cell anemia, was discovered in the late 1950s. As we continue to survey more people’s whole genomes while gathering data on their diseases and other traits, the average person's genome will indeed tell us more and more about what makes her unique, and about the distinctive health risks she faces.

    3. The paper makes some optimistic and testable predictions.

    It is worth stressing, as the article and Michael Snyder have, that the paper is cautiously upbeat, claiming to help us understand which diseases might be particularly amenable to genetic risk prediction. The basic observation on monozygotic twins should temper any undue expectation that whole-genome sequencing offers slam-dunk insight into the typical adult's long-term risk for all common diseases. At the same time, the authors do posit that whole-genome interpretation may nonetheless offer nearly all of us some significant hint of distinctive risk for at least one major disease.

    Admirably, the authors make more specific predictions: that genome interpretation may eventually help clarify our risk for some autoimmune (such as type 1 diabetes and thyroid disease) and brain diseases (such as Alzheimer’s) particularly well. This includes the often overlooked question of identifying diseases for which someone may be at unusually low risk. In time, we'll see how these predictions fare, and whether readily genetically forecastable diseases tend to have particular physiological profiles.

    Moreover, the paper acknowledges that whole-genome sequencing can help us spot strong risk for rare, serious diseases that may lurk in our genomes. This may be especially useful as we plan families with our spouses, so we can know what rare but potentially shared risk variants might be catastrophic if inherited together by one of our kids.

    4. Whole-genome interpretation presumes it’s not all in your genes.

    Responding to Paolo's "absolute information" comment, I think any notion that this paper debunks genetic determinism is a straw man. We have long known that genetic risk is not immutably deterministic. In fact, the endeavor of genomically personalized medicine is founded on that very point. That is, in trying to understand genetic risk, we specifically hope to learn to mitigate it by controlling our environment and habits: what we eat, what drugs and other treatments we take, and how we otherwise live our lives. Our genomes may be able to tell us a fair bit on those fronts.

    5. Whole-genome sequencing is already helping greatly in cancers.

    Some of the first clear examples of how useful genome sequencing can be are in familial cancers, such as BRCA-associated breast and ovarian cancers. Tumor sequencing, which the paper doesn’t address, is revolutionizing how cancer is treated, by finding key changes to the genomes of particular cells in the body that let them grow out of control. Such sequencing is reshaping how oncologists think of cancers from a simplistic tissue-specific view to one that highlights recurrent variants shared by tumors in different people and different tissues, which may nonetheless represent druggable targets.

    6. Like all twin studies, this one is dogged by some minor concerns.

    As the authors acknowledge, their work presumes that European monozygotic twins reasonably represent everyone, that is, that their assertions

    • will generalize to other ethnic groups;
    • aren’t confounded by ascertainment bias: "Doc, my twin has disease X. Do I?" This might tend to overestimate heritability if one twin's anchoring diagnosis means the other twin is more often or better diagnosed, or underestimate it if one twin, on seeing the other get sick, takes more preventive measures than (s)he otherwise would;
    • aren’t distorted by monozygotic twins’ distinctive health profiles. Factors include withstanding resource-constrained development in the womb, potentially lower-than-average birth weight, unusual profiles of maternal age or genotype, any subtle quirks of early embryonic cell division, the effects of lifelong social support from having a very similar sibling, etc.

    The last few concerns are unlikely to be damning. After all, as noted, the basic findings of the paper boil down to the important and indisputable observation that monozygotic twins don't get the same diseases. But, like all the foregoing, they are worth keeping in mind as research into the causes of disease, genetic and otherwise, continues on behalf of us all.

  5. Of course DNA is not immutable destiny. The results of this study should not have surprised anyone, considering the complexities in the regulation of gene expression. The authors have done a great service by conducting a comprehensive study that emphasizes this.

    Indeed, the influence of the environment on the epigenetic regulation of the expression of a disease phenotype has already been shown by us in a study of identical twins discordant for Alzheimer's disease (Mastroeni et al., 2009). Nevertheless, there do exist genetic sequences that are highly predictive of disease phenotype, and it is important to distinguish situations in which genes are destiny and those in which they are not.


    Mastroeni et al. Epigenetic differences in cortical neurons from a pair of monozygotic twins discordant for Alzheimer's disease. PLoS One. 2009;4(8):e6617.

  6. The analysis by an esteemed group of genome and cancer scientists at Johns Hopkins takes a novel approach to an important issue: Now that we are on the verge of being able to inexpensively sequence any person's entire genome, what will the information mean? Certainly, people will learn about variations in their genome that might cause a rare disease in them, or predispose an offspring to a rare disease. Genome sequencing will also reveal which alleles a person carries at the apolipoprotein E (ApoE) gene; the common ε4 allele, when present in one copy, increases the risk of Alzheimer's disease around fourfold, and when present in two copies, increases the risk several times more, depending on ethnicity.

    But the Johns Hopkins study asks a deeper question: how well can we predict risk for common diseases such as various cancers, diabetes, and heart disease? Based on data that have been around for decades on the likelihood that two identical (monozygotic) twins will develop the same common disease, this new study calculates that simply knowing the sequence will add little to what can be learned from the family history alone.

    This result certainly gives pause to those who have predicted that whole-genome sequencing will revolutionize medical care and be the foundation of "personalized medicine." Hopefully, people already have their healthcare "personalized" by their physicians, and DNA analysis, selectively applied, can certainly have an important role. But for the present, having your entire 3.2 billion-nucleotide DNA sequence on a flash drive is not something you need to have in your medical record.



Paper Citations

  1. Gatz et al. Heritability for Alzheimer's disease: the study of dementia in Swedish twins. J Gerontol A Biol Sci Med Sci. 1997 Mar;52(2):M117-25.
  2. Tanner et al. Parkinson disease in twins: an etiologic study. JAMA. 1999 Jan 27;281(4):341-6.
  3. Janssens et al. Predictive testing for complex diseases using multiple genes: fact or fiction? Genet Med. 2006 Jul;8(7):395-400.
  4. Wray et al. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 2010 Feb;6(2):e1000864.


Primary Papers

  1. Roberts et al. The Predictive Capacity of Personal Genome Sequencing. Sci Transl Med. 2012 Apr 2.