It sounded wild at the time: A Nature paper uncovered multiple copies of APP inserted into neuronal genomes, especially among people with sporadic AD. These interlopers purportedly derived from APP mRNA transcripts, which had been reverse-transcribed and plopped back into the genome—some now burdened with pathogenic mutations (Nov 2018 news). The study was led by Jerold Chun at Sanford Burnham Prebys Medical Discovery Institute in La Jolla, California. It seemed to offer a genetic explanation for some cases of Alzheimer’s where no family pattern was known. Alas, it came under scrutiny in private conversations among fellow geneticists soon after it appeared and, last October, scientists led by Eunjung Alice Lee and Christopher Walsh of Boston Children’s Hospital posted a manuscript on bioRxiv that uncovered contamination in some of the data, most notably from a cloning vector carrying an APP insert (see Kim et al., 2019). 

  • A 2018 study found mutated copies of APP inserted into the neuronal genomes of people with sporadic AD.
  • Another group mounted a challenge, ascribing those findings to contamination.
  • The original authors defend their claim with new data.

The story does not end there. In a response posted on July 9, 2020, Chun’s group acknowledged the contamination, but argued that other data in the 2018 paper still support their original findings (Lee et al., 2020). They also provided new data revealing copies of APP integrated into specific sites within the genome of neurons from people with sporadic AD. Nature will publish the manuscripts as Matters Arising articles.

Genetic mosaicism—the phenomenon in which somatic changes in individual cells give rise to a landscape of unique genomes in a given person—is rampant within the brain and increases with age. Chun’s lab previously reported that the APP gene was prone to mosaicism, manifested by extra copies of the gene found within neurons, and more so in the brains of people with sporadic AD (Jul 2015 news; Rohrback et al., 2018). 

In the 2018 paper, first author Ming-Hsiang Lee and colleagues dove deeper into this. Using PCR and sequencing techniques to hunt for APP genes within neurons isolated from postmortem human brain samples, the researchers spotted extra copies of APP that differed markedly from the same person’s endogenous, full-length gene. The extras contained various combinations of exons, but no introns. Some even had familial AD mutations. Using DNA in situ hybridization (DISH), the researchers spotted multiple copies of the APP gene on different chromosomes within neurons, suggesting the gene had moved beyond its normal stomping ground on chromosome 21. Notably, the phenomenon occurred about 10-fold more frequently in sporadic AD brains than in controls. Further experiments suggested a mechanism whereby APP mRNA transcripts were reverse-transcribed into cDNA, which then integrated back into the genome. The scientists dubbed these species genomic cDNAs, or “gencDNAs.” At the time, other researchers cautioned that some of the techniques used in the paper were particularly prone to contamination.

Separately, first author Junho Kim and colleagues at Boston Children’s found no evidence of APP insertions within their own genomic data. To investigate the discrepancy, Kim et al. reanalyzed sequencing data from Chun’s paper and uncovered multiple forms of contamination in some of it. First and foremost, these authors looked at Chun’s APP-targeted sequencing data, in which the Burnham researchers had used APP-specific primers to amplify APP genes from neuronal genomes, then sequenced the amplified fragments. Kim et al. found multiple instances of “cDNA supporting reads,” i.e., APP exon-exon junctions suggesting introns had been spliced out, but they found no instances of an APP sequence joined to a specific integration site within the human genome. Instead, they found multiple so-called “clipped reads” at both ends of the APP coding sequence that contained the multiple cloning site of Promega’s pGEM-T Easy Vector. Indeed, Chun confirmed that the researchers had used this vector, complete with an APP cDNA insert, in a previous study.
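For readers who want a concrete picture of what screening “clipped reads” for vector sequence involves, here is a minimal Python sketch. It is not the pipeline Kim et al. used. The read tuples, the function names, the 12-base match length, and above all the `VECTOR_MCS` string are assumptions made for illustration; the string is a placeholder, not the actual pGEM-T Easy sequence.

```python
import re

# Placeholder sequence for illustration only. This is NOT the real
# pGEM-T Easy multiple cloning site; load the true vector FASTA in practice.
VECTOR_MCS = "GCGGCCGCGAATTCACTAGTGATT"

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def soft_clipped_segments(seq, cigar):
    """Return the soft-clipped prefix and suffix of a read (empty if none)."""
    # Hard clips (H) are not present in the read sequence, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = seq[:ops[0][0]] if ops and ops[0][1] == "S" else ""
    right = seq[len(seq) - ops[-1][0]:] if len(ops) > 1 and ops[-1][1] == "S" else ""
    return left, right


def clip_matches_vector(clip, vector=VECTOR_MCS, min_len=12):
    """True if the clip shares an exact stretch of min_len bases with the vector."""
    if len(clip) < min_len:
        return False
    targets = (vector, reverse_complement(vector))  # check both strands
    for i in range(len(clip) - min_len + 1):
        kmer = clip[i:i + min_len]
        if any(kmer in target for target in targets):
            return True
    return False


def flag_vector_reads(reads):
    """reads: iterable of (name, sequence, CIGAR). Return names of suspect reads."""
    flagged = []
    for name, seq, cigar in reads:
        clips = soft_clipped_segments(seq, cigar)
        if any(clip_matches_vector(clip) for clip in clips):
            flagged.append(name)
    return flagged
```

A read whose unaligned end matches the vector rather than any human chromosome is what one would expect from a plasmid-borne APP cDNA, not from a copy integrated into the genome. A real analysis would use an aligner against the full vector sequence and tolerate mismatches; exact substring matching is a simplification.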

Kim et al. went on to analyze data from Chun’s study that had been garnered from a non-PCR-based method called hybrid-capture sequencing. There, too, the pGEM vector popped up. The researchers concluded that vector contamination, as opposed to APP gencDNA, could explain the results from that dataset.

Furthermore, Kim investigated whole-exome sequencing data from another paper, led by Jun Sung Park and Junehawk Lee of the Korea Advanced Institute of Science and Technology in Daejeon, that had identified APP somatic insertions subsequent to Chun’s paper (Park et al., 2019). In that data, they found evidence supporting yet another form of contamination, that is, genome-wide mRNA.

To be clear, the Boston Children’s scientists did not identify specific mRNAs contaminating the genomic DNA samples. Rather, they identified myriad instances of cDNA supporting reads for thousands of genes, far more than would be expected, leading Kim to suspect that contaminants were to blame. Furthermore, they found that seven of the 12 extra copies of APP identified in Park et al. hailed from mice. The source of any mRNA contamination is unclear, as Park et al. had not worked with mRNA; however, Kim suggested that the mRNA could have derived from the sequencing facility.
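The logic here can be sketched in a few lines of Python: a genuine retro-insertion of one gene should produce exon-exon junction reads for that gene alone, whereas stray mRNA should produce them for many genes at once. The function names, the minimum read count, and the 1 percent cutoff below are hypothetical choices for illustration, not values taken from Kim et al.

```python
from collections import Counter


def genes_with_junction_support(junction_reads, min_reads=2):
    """junction_reads: one gene name per read spanning an exon-exon junction.

    Returns the set of genes supported by at least min_reads such reads.
    """
    counts = Counter(junction_reads)
    return {gene for gene, n in counts.items() if n >= min_reads}


def looks_genome_wide(junction_reads, genes_assayed, min_reads=2, max_fraction=0.01):
    """Flag a DNA library in which an implausibly large share of genes show
    spliced (cDNA-like) reads. Both thresholds are illustrative assumptions.

    Returns (flag, fraction_of_genes_with_support).
    """
    supported = genes_with_junction_support(junction_reads, min_reads)
    fraction = len(supported) / genes_assayed
    return fraction > max_fraction, fraction
```

In practice the expected background would need to be calibrated against libraries known to be clean, since germline processed pseudogenes also generate a small number of junction reads.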

The Boston researchers also found indications that a third form of contamination—from nested APP PCR products—may have muddled results from another set of experiments.

Finally, Kim et al. ran some of their own experiments, in which they performed single-cell whole genome sequencing on neurons isolated from seven postmortem AD brains. In all, they sequenced the genomes of nine to 13 neurons from each brain. They did not identify any instances of extra copies of APP. They did, however, find evidence of two polymorphic germline pseudogenes, SKA3 and ZNF100, in three of the samples, indicating that their method can pick up inserted gene copies of this kind. This suggested to them that if APP somatic insertions occurred at the rate claimed by Chun, they would have observed them among the neurons they sampled. Still, Walsh acknowledged that the negative result cannot prove the absence of such insertions.
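The reasoning behind that inference is simple arithmetic, sketched below in Python. Seven brains at nine to 13 neurons each means between 63 and 91 neurons in total. The fraction of neurons carrying an insertion and the per-cell detection sensitivity are hypothetical parameters, since the article gives neither figure, and the calculation assumes cells are independent.

```python
def prob_no_detection(n_cells, carrier_fraction, sensitivity=1.0):
    """Probability of seeing zero insertions across n_cells sequenced neurons.

    carrier_fraction: assumed share of neurons carrying at least one insertion.
    sensitivity: assumed chance of detecting an insertion in a cell that has one.
    Both are illustrative inputs, not values reported by either group.
    """
    per_cell_hit = carrier_fraction * sensitivity
    return (1.0 - per_cell_hit) ** n_cells


# Example with made-up inputs: if 10 percent of neurons carried a detectable
# insertion, missing it in all 63 cells would be very unlikely.
p_miss = prob_no_detection(63, 0.10)
```

The same formula shows why the result is not proof of absence: as the assumed carrier fraction or sensitivity falls, the chance of finding nothing in fewer than 100 cells rises toward 1.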

In their bioRxiv rebuttal, first author Ming-Hsiang Lee and colleagues acknowledged the cloning vector contamination in one dataset. However, the Burnham scientists noted that APP somatic insertions were not detected in non-neuronal cells from the same patients. This implied the phenomenon was specific to neurons; if contamination explained all of their results, it would have been detected across all cell types. They emphasized that brains from people with AD had far more APP insertions than control brains, which would not be the case if contamination entirely accounted for the results.

The Burnham researchers also reported detection of APP somatic insertions in new AD brain samples that they verified to be free of vector contamination. Using whole-exome sequencing data from hippocampal neurons from five AD brain samples and two blood samples from the Park lab, the Burnham researchers uncovered multiple APP reads with exon-exon junctions. Some of these included the 5' or 3' untranslated regions (UTR) of APP, which are not included in cloning vectors.

Additionally, the researchers this time mapped the APP gencDNAs to 10 specific insertion sites on chromosomes 1, 3, 9, 10, and 12. The genomic samples used in this subsequent analysis were the same ones that the Boston Children’s scientists had suggested were contaminated with mRNA, but even so, Chun countered that if mRNA contamination had occurred, it still would not explain the new data identifying APP sequences integrated at specific insertion sites in the genome.

Rogue APP? Ten APP insertion sites were identified on multiple chromosomes. In (a), DNA reads spanned APP UTRs and novel chromosomal insertion sites. In (b), one read spanned an APP exon::exon junction, and the other mapped to a novel chromosome insertion site. [Courtesy of Lee et al., bioRxiv, 2020.]

In an email to Alzforum, Alice Lee wrote that she and Walsh had reason to believe that these APP insertions were not genuine insertions, but rather chimeric molecules formed during library preparation or sequencing procedures. They did not attempt to publish this analysis.

Chun believes other lines of evidence, most notably DNA in situ hybridization, support his original conclusion that APP gencDNAs have integrated on other chromosomes. Contamination would not explain how 11 familial AD mutations had been detected amongst the APP gencDNAs, he said. Chun confirmed that his lab had not previously worked with vectors carrying APP with FAD mutations. He attributes the presence of these and other APP mutations to error-prone reverse transcriptase, the enzyme needed to convert mRNA to cDNA.

Walsh said he is skeptical about the use of DISH—a technique pioneered by Chun—to prove the presence of gencDNAs. Walsh and Lee believe that contamination in the most prominent sequencing experiments negates positive results from other methods.

Ekaterina Rogaeva of the University of Toronto found the contamination of the original data concerning, but agrees that the new data revealing insertion sites cannot be explained by contamination. “A gold standard would be to conduct replication studies of APP somatic gene recombination in independent laboratories, with the inclusion of additional measures to control for potential artifacts,” she wrote.

John Hardy of University College London wrote, “My a priori view is that somatic APP duplications are a very unlikely cause of Alzheimer’s disease, and I would require a high degree of proof to convince me otherwise. This burden of proof has not been reached.”—Jessica Shugart


  1. Based on a reanalysis of the raw data published by Lee et al., 2018, a recent study by Kim et al. raised several concerns, including contamination of the sequencing library by a recombinant vector carrying an insert of APP. In their rebuttal, Lee et al. acknowledged plasmid contamination in a single pull-down dataset but argued that it could not account for all APP exon::exon junctions. Of note, Lee et al. could not determine which APP exon::exon reads were due to genomic complementary DNAs (gencDNAs) versus plasmid contamination, but presented additional data on novel APP gencDNA insertion sites, which cannot be explained by the contamination.

    Kim et al. also challenged the replication study of Park et al. (2019), by presenting evidence of genome-wide mouse and human mRNA contamination in their dataset. It would be critically important to publish the rebuttal by Park and colleagues.

    The painful question is whether it is possible to detect every imperfection when using technology as complicated as single-cell whole-genome sequencing. This is why a gold standard would be to conduct replication studies of APP somatic gene recombination in independent laboratories with the inclusion of additional measures to control for potential artifacts.

    Notably, the results of the independent single-cell whole-genome sequencing data conducted by Kim et al. did not provide evidence for somatic APP retrotransposition in neurons.


    Lee et al. Somatic APP gene recombination in Alzheimer's disease and normal neurons. Nature. 2018 Nov;563(7733):639-645. Epub 2018 Nov 21 PubMed.

    Park et al. Brain somatic mutations observed in Alzheimer's disease associated with aging and dysregulation of tau phosphorylation. Nat Commun. 2019 Jul 12;10(1):3090. PubMed.

  2. I find the case made by Kim et al. to be convincing.

    Mutation detection is a difficult job. Usually, mutations are extremely rare and, as a result, difficult to find. My lab, as well as the labs of my mentors, Jason Bielas at the Fred Hutchinson Cancer Research Center and Lawrence Loeb at the University of Washington in Seattle, have spent our careers coming up with innovative methods to detect rare events. The tools we have designed and modified over the years can now detect one mutated base among 100 million wild-type bases. What is shared among all these tools is the need to overcome the inherent limitations of PCR and RT-PCR reactions. In a way, that is the sole goal of every tool we design.

    Not only are PCR and RT-PCR error-prone, leading to countless artifacts, they also are mistaken for simple reactions because they are so common and contain few ingredients. An undergrad with a few days’ lab experience can perform PCR and RT-PCR. However, this simplicity is an illusion. It is impossible to predict what is happening in a PCR reaction, because so many things are happening simultaneously. Even though primers are designed to anneal to their intended targets, they actually anneal to thousands of slightly degenerate locations across the genome as well. Usually, these locations are so far apart that they do not result in a PCR product. That does not mean, though, that no molecules are produced. Each primer that lands can be extended and will generate a copy of a local region.

    Usually, this type of molecular noise does not affect the reaction because the intended target sequences are more efficiently used. But that is not the case in single-nucleus, or near single-nucleus experiments. In the first PCR cycle, the primers are in great excess to the intended target and are more likely to anneal to slightly degenerate sequences as a result, creating, in addition to the intended copy, a large number of additional single molecules from many other locations. These molecules now contain a unique DNA sequence plus a perfect complement to one of the primers. These molecules can subsequently be used as primers for additional reactions, which are normally of no concern, but again, because the target sequence is near single-molecule level as well, can result in disastrous consequences. That is especially true if no actual PCR product is supposed to be generated, because the primers are searching for an amplicon that shouldn't exist, as is the case in the Lee et al. study.

    Contamination is another problem with single-molecule, or rare-molecule amplification. I've amplified thousands of single molecules over the course of my career and know how difficult it can be. Contamination can come from anywhere. The result of each PCR is a tube that contains billions of a target molecule that can ruin weeks of single-molecule experiments in the future. Therefore I am not surprised that contamination might be a problem in these experiments. I remember the frustration of cleaning benches meticulously between experiments, and the constant vigilance needed to prevent contamination from wreaking havoc. It was never enough, though. For example, once, I found in a set of samples an interesting mutation that I could not place. After two weeks of testing and retesting, I discovered that the single-molecule mutations I was detecting were not mutations at all. They were bovine sequences. I was amplifying mouse DNA, which differs from bovine DNA by a few bases at the location that was being interrogated. Although nobody remembered, someone three benches over had been working with bovine DNA and it had somehow contaminated my reactions. 

    The problems pointed out by Kim et al. are consistent with what I have seen in many labs across the world that are working on the detection of rare events. Lee et al.’s rebuttal has so far been unable to convince me otherwise. For example, the rebuttal argues that five out of seven samples did not show mouse RNA contamination; this is not a statement that inspires much confidence, especially since the problem of human RNA contamination does not seem to be accurately addressed. Human or mouse RNA could be reverse-transcribed anywhere in the lab or at the sequencing facility and make its way into a reaction, resulting in artifacts that can be difficult to explain.

    I place more value on experiments that do not require PCR amplification, including SMRT-seq. 

    In their rebuttal, Lee et al. highlight new data showing insertions at distal sites in the genome. Unfortunately, it seems as though those sites were only found by tools that involved PCR, not SMRT-seq.

    If an experiment could convince me that the gencDNA phenomenon is real, it would be a long, multi-kb read from non-amplified DNA that demonstrates a cDNA copy of the APP gene inserted at a different location in the genome. In my opinion, DISH and RISH experiments are too prone to artifacts to be conclusive evidence. It also seems that the rebuttal addresses some but not all of the questions posed by Kim et al.

    To be frank, I find it difficult to believe that this phenomenon exists, or that it is common. Our cells safeguard our genome every way they can. To do so, they make use of multiple forms of base excision repair, nucleotide excision repair, mismatch repair, homologous recombination, non-homologous end joining, and various other tools. All these pathways, each of which consists of numerous proteins, have but one goal: to keep our genome safe. Our cells would rather go into senescence, or kill themselves, than risk mutation. And they do an amazing job at that. A cell can prevent a single base from being changed among more than 10 million other WT bases. It is remarkable. So I'm naturally skeptical of the idea that a more disruptive event, such as gencDNA, would be so common. As a mutation researcher, it doesn't sound right to me. That's not how our cells treat their DNA.

    Although they have already tried numerous techniques, I recommend the authors try mutation-detection tools that are unusually sensitive or resistant to PCR artifacts. These include the random-mutation capture tool, duplex sequencing, or Cypher-seq. If this phenomenon is real and abundant, these tools will provide unambiguous answers.

    In addition, I would perform these experiments not just on old individuals, but also on young controls and on neurons derived from a single iPSC clone with a known genome. By starting with a single cell, you ensure that all cells will have approximately the same genome, and any differences with the reference genome can easily be checked with multiple genetic-analysis tools on different but closely related cells from that clone.




News Citations

  1. Could Rogue APP Variants Invade Genome of Individual Neurons?
  2. Could Genetic Mosaicism in Adult Neurons Precipitate Disease?

Paper Citations

  1. Rohrback et al. Genomic mosaicism in the developing and adult brain. Dev Neurobiol. 2018 Nov;78(11):1026-1048. Epub 2018 Aug 1 PubMed.
  2. Park et al. Brain somatic mutations observed in Alzheimer's disease associated with aging and dysregulation of tau phosphorylation. Nat Commun. 2019 Jul 12;10(1):3090. PubMed.

External Citations

  1. Kim et al., 2019
  2. Lee et al., 2020


Primary Papers

  1. Kim et al. Evidence that APP gene copy number changes reflect recombinant vector contamination. bioRxiv. October 31, 2019
  2. Lee et al. Reply: Evidence that APP gene copy number changes reflect recombinant vector contamination. bioRxiv. July 9, 2020