. Evidence that APP gene copy number changes reflect recombinant vector contamination. bioRxiv. October 31, 2019




  1. Based on a reanalysis of the raw data published by Lee et al., 2018, a recent study by Kim et al. raised several concerns, including contamination of the sequencing library by a recombinant vector carrying an APP insert. In their rebuttal, Lee et al. acknowledged plasmid contamination in a single pull-down dataset but argued that it could not account for all APP exon::exon junctions. Of note, Lee et al. could not determine which APP exon::exon reads came from genomic complementary DNAs (gencDNAs) and which from plasmid contamination, but they presented additional data on novel APP gencDNA insertion sites that cannot be explained by the contamination.

    Kim et al. also challenged the replication study of Park et al. (2019), by presenting evidence of genome-wide mouse and human mRNA contamination in their dataset. It would be critically important to publish the rebuttal by Park and colleagues.

    The painful question is whether it is possible to detect every imperfection when using a technology as complex as single-cell whole-genome sequencing. This is why the gold standard would be replication studies of APP somatic gene recombination in independent laboratories, with additional measures to control for potential artifacts.

    Notably, the independent single-cell whole-genome sequencing conducted by Kim et al. did not provide evidence for somatic APP retrotransposition in neurons.


    . Somatic APP gene recombination in Alzheimer's disease and normal neurons. Nature. 2018 Nov;563(7733):639-645. Epub 2018 Nov 21.

    . Brain somatic mutations observed in Alzheimer's disease associated with aging and dysregulation of tau phosphorylation. Nat Commun. 2019 Jul 12;10(1):3090.

    View all comments by Ekaterina Rogaeva
  2. I find the case made by Kim et al. to be convincing.

    Mutation detection is a difficult job. Usually, mutations are extremely rare and, as a result, difficult to find. My lab, as well as the labs of my mentors, Jason Bielas at the Fred Hutchinson Cancer Research Center and Lawrence Loeb at the University of Washington in Seattle, have spent our careers coming up with innovative methods to detect rare events. The tools we have designed and modified over the years can now detect one mutated base among 100 million wild-type bases. What is shared among all these tools is the need to overcome the inherent limitations of PCR and RT-PCR reactions. In a way, that is the sole goal of every tool we design.
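    The arithmetic behind that detection limit can be sketched as follows, assuming artifacts arise independently at a uniform per-base rate. The artifact rates used (1e-3 for uncorrected reads, 1e-9 after error correction) and the 1e9 bases interrogated are illustrative assumptions, not measurements from any of the studies discussed; the function names are hypothetical.

```python
def expected_true_calls(bases: float, mutation_freq: float) -> float:
    """Expected real mutations among `bases` interrogated positions."""
    return bases * mutation_freq


def expected_artifacts(bases: float, artifact_rate: float) -> float:
    """Expected false mutation calls, assuming independent, uniform per-base errors."""
    return bases * artifact_rate


TRUE_FREQ = 1e-8  # one mutant base per 100 million, as in the text
BASES = 1e9       # hypothetical number of bases interrogated

# Both artifact rates below are illustrative assumptions.
for label, rate in [("uncorrected reads", 1e-3), ("error-corrected reads", 1e-9)]:
    true_calls = expected_true_calls(BASES, TRUE_FREQ)
    artifacts = expected_artifacts(BASES, rate)
    print(f"{label}: ~{true_calls:.0f} real mutations vs ~{artifacts:.0f} artifacts")
```

    Under these assumptions, uncorrected reads bury the real signal under roughly 100,000 artifacts per true mutation; the mutations only become visible once the artifact rate drops below the true mutation frequency.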

    Not only are PCR and RT-PCR error-prone, leading to countless artifacts; they are also mistaken for simple reactions because they are so common and contain few ingredients. An undergrad with a few days’ lab experience can perform PCR and RT-PCR. However, this simplicity is an illusion. It is impossible to predict everything that happens in a PCR reaction, because so many things are happening simultaneously. Even though primers are designed to anneal to their intended targets, they also anneal to thousands of slightly degenerate locations across the genome. Usually, the off-target sites for the two primers are so far apart that they do not yield a PCR product. That does not mean, though, that no molecules are produced. Each primer that lands can be extended and will generate a copy of a local region.
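    The idea that a primer lands at many slightly degenerate sites can be illustrated by scanning a sequence for windows within a few mismatches of the primer. This is a minimal sketch: mismatch count is only a crude proxy for annealing (real binding also depends on where the mismatches fall, especially near the 3′ end, on GC content and on temperature), and the template and primer are invented toy sequences.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def near_match_sites(template: str, primer: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every window of `template` that is
    within `max_mismatches` of the primer (a crude stand-in for annealing)."""
    k = len(primer)
    sites = []
    for i in range(len(template) - k + 1):
        mm = hamming(template[i:i + k], primer)
        if mm <= max_mismatches:
            sites.append((i, mm))
    return sites


# Toy sequences, invented for illustration only.
template = "ACGTTGCAAGGCTTACGATGCAAGGCTAACGT"
primer = "TGCAAGGCTT"
print(near_match_sites(template, primer, max_mismatches=2))
```

    On this 32-base toy template the scan finds the intended exact site and one single-mismatch site; scaled to a whole genome, the same scan returns far more tolerable landing sites.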

    Usually, this type of molecular noise does not affect the reaction because the intended target sequences are used more efficiently. But that is not the case in single-nucleus, or near-single-nucleus, experiments. In the first PCR cycle, the primers are in great excess over the intended target and are therefore more likely to anneal to slightly degenerate sequences, creating, in addition to the intended copy, a large number of single molecules from many other locations. Each of these molecules now contains a unique DNA sequence plus a perfect complement to one of the primers, and can itself serve as a primer in later reactions. Normally this is of no concern, but because the target sequence is also present at near single-molecule levels, it can have disastrous consequences. That is especially true if no actual PCR product is supposed to be generated, because the primers are then searching for an amplicon that shouldn't exist, as is the case in the Lee et al. study.
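    Why the same mispriming noise is harmless in bulk reactions but dangerous near single-molecule input can be sketched with an idealized exponential model. The model deliberately ignores the plateau phase and the way spurious products go on to prime further synthesis; the starting counts (100,000 targets in bulk, one in a single nucleus, 1,000 off-target templates) and the efficiencies (0.9 on target, 0.8 off target) are hypothetical values.

```python
def copies_after(cycles: int, start: float, efficiency: float) -> float:
    """Idealized amplification: each cycle multiplies copies by (1 + efficiency)."""
    return start * (1.0 + efficiency) ** cycles


def target_fraction(cycles: int, target_start: float, target_eff: float,
                    offtarget_start: float, offtarget_eff: float) -> float:
    """Fraction of the final product that derives from the intended target."""
    target = copies_after(cycles, target_start, target_eff)
    offtarget = copies_after(cycles, offtarget_start, offtarget_eff)
    return target / (target + offtarget)


CYCLES = 30
OFFTARGET_START, TARGET_EFF, OFFTARGET_EFF = 1_000, 0.9, 0.8  # hypothetical values

for label, target_start in [("bulk DNA", 100_000), ("single nucleus", 1), ("no true target", 0)]:
    frac = target_fraction(CYCLES, target_start, TARGET_EFF, OFFTARGET_START, OFFTARGET_EFF)
    print(f"{label}: {frac:.1%} of product comes from the intended target")
```

    Even though the target amplifies more efficiently, with a single starting molecule it contributes well under 1% of the product in this model, and with no true target every molecule in the tube is an artifact.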

    Contamination is another problem with single-molecule, or rare-molecule, amplification. I've amplified thousands of single molecules over the course of my career and know how difficult it can be. Contamination can come from anywhere. The result of each PCR is a tube that contains billions of copies of a target molecule, which can ruin weeks of single-molecule experiments in the future. Therefore I am not surprised that contamination might be a problem in these experiments. I remember the frustration of cleaning benches meticulously between experiments, and the constant vigilance needed to prevent contamination from wreaking havoc. It was never enough, though. For example, I once found an interesting mutation in a set of samples that I could not place. After two weeks of testing and retesting, I discovered that the single-molecule mutations I was detecting were not mutations at all. They were bovine sequences. I was amplifying mouse DNA, which differs from bovine DNA by a few bases at the location that was being interrogated. Although nobody remembered, someone three benches over had been working with bovine DNA and it had somehow contaminated my reactions.
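    The bovine episode suggests a simple check: compare each read with the reference sequences of every species that could plausibly be present, and ask which it matches best at the diagnostic positions. A minimal sketch, assuming reads are already aligned to the same window as each reference; the 10-base "mouse" and "bovine" sequences are invented and differ at two positions purely for illustration.

```python
def mismatches(read: str, reference: str) -> int:
    """Count positions where an aligned read differs from a reference window."""
    return sum(r != f for r, f in zip(read, reference))


def assign_species(read: str, references: dict[str, str]) -> str:
    """Assign a read to the best-matching reference; report 'ambiguous' on ties."""
    scores = {name: mismatches(read, seq) for name, seq in references.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


# Hypothetical reference windows that differ at positions 4 and 8.
REFS = {"mouse": "ACGTACGTAC", "bovine": "ACGTTCGTGC"}

for read in ["ACGTACGTAC", "ACGTTCGTGC", "ACGTACGTGC"]:
    print(read, assign_species(read, REFS))
```

    A "mutation" whose alternative bases all match another species' reference at the diagnostic positions is a strong hint of contamination rather than a real somatic event.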

    The problems pointed out by Kim et al. are consistent with what I have seen in many labs across the world that are working on the detection of rare events. So far, Lee et al.’s rebuttal has not convinced me otherwise. For example, the rebuttal argues that five out of seven samples did not show mouse RNA contamination; this is not a statement that inspires much confidence, especially since the problem of human RNA contamination does not seem to have been adequately addressed. Human or mouse RNA could be reverse-transcribed anywhere in the lab or at the sequencing facility and make its way into a reaction, resulting in artifacts that can be difficult to explain.

    I place more value on experiments that do not require PCR amplification, including SMRT-seq. 

    In their rebuttal, Lee et al. highlight new data showing insertions at distal sites in the genome. Unfortunately, it seems as though those sites were only found by tools that involved PCR, not SMRT-seq.

    If any experiment could convince me that the gencDNA phenomenon is real, it would be a long, multi-kb read from non-amplified DNA demonstrating a cDNA copy of the APP gene inserted at a different location in the genome. In my opinion, DISH and RISH experiments are too prone to artifacts to be conclusive evidence. It also seems that the rebuttal addresses some, but not all, of the questions posed by Kim et al.
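    The kind of read described here can be expressed as a short series of checks. This is a minimal sketch with invented toy sequences and hypothetical labels: `CDNA` stands in for a spliced APP cDNA containing an exon::exon junction, `VECTOR_MARKERS` for plasmid-backbone sequence, and a read that passes every check would still need its flanks aligned to the reference genome outside the APP locus.

```python
JUNCTION = "AAAACCCC"          # hypothetical end of exon i + start of exon j
CDNA = "TTTTAAAACCCCGGGG"      # hypothetical spliced cDNA containing JUNCTION
VECTOR_MARKERS = ["GTGTGTGT"]  # hypothetical plasmid-backbone sequence


def classify_long_read(read: str, min_flank: int = 10) -> str:
    """Classify one unamplified long read by the evidence it carries."""
    if JUNCTION not in read:
        return "no_junction"          # exons still separated, as in genomic DNA
    if any(marker in read for marker in VECTOR_MARKERS):
        return "vector"               # junction sits next to plasmid backbone
    start = read.find(CDNA)
    if start == -1:
        return "unresolved"           # junction seen, insert boundaries not covered
    left_flank = start
    right_flank = len(read) - (start + len(CDNA))
    if left_flank >= min_flank and right_flank >= min_flank:
        return "candidate_insertion"  # flanks must still map outside APP
    return "short_flanks"


LEFT, RIGHT = "ACGATCGATC", "TGCATGCATG"  # hypothetical flanking sequence
reads = {
    "spanning read": LEFT + CDNA + RIGHT,
    "plasmid read": "GTGTGTGTAA" + CDNA + RIGHT,
    "genomic read": LEFT + "TTTTAAAATTTTCCCCGGGG" + RIGHT,  # toy 'intron' between exons
}
for name, read in reads.items():
    print(name, classify_long_read(read))
```

    The ordering reflects the argument in the comment: an exon::exon junction alone cannot distinguish gencDNA from plasmid, so only a read that also spans non-vector flanks on both sides is worth mapping further.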

    To be frank, I find it difficult to believe that this phenomenon exists, or that it is common. Our cells safeguard our genome every way they can. To do so, they make use of multiple forms of base excision repair, nucleotide excision repair, mismatch repair, homologous recombination, non-homologous end joining, and various other tools. All these pathways, each of which consists of numerous proteins, have but one goal: to keep our genome safe. Our cells would rather go into senescence, or kill themselves, than risk mutation. And they do an amazing job at that. A cell can prevent a single base from being changed among more than 10 million other WT bases. It is remarkable. So I'm naturally skeptical of the idea that a more disruptive event, such as gencDNA insertion, would be so common. As a mutation researcher, it doesn't sound right to me. That's not how our cells treat their DNA.

    Although they have already tried numerous techniques, I recommend the authors try mutation-detection tools that are unusually sensitive or resistant to PCR artifacts. These include the random-mutation capture tool, duplex sequencing, or Cypher-seq. If this phenomenon is real and abundant, these tools will provide unambiguous answers.
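    The core idea of duplex sequencing can be sketched in a few lines: reads are grouped into families by the molecular tag of the original double-stranded molecule, a consensus is built separately for each strand, and a base is kept only when both strand consensuses agree. This is a simplified illustration of the principle, not the published pipeline; the family IDs, the "ab"/"ba" strand labels and the 0.7 agreement threshold are hypothetical stand-ins.

```python
from collections import Counter, defaultdict


def single_strand_consensus(seqs: list[str], min_agreement: float = 0.7) -> str:
    """Majority base per position; 'N' where no base reaches min_agreement."""
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(reads: list[tuple[str, str, str]], min_agreement: float = 0.7) -> dict[str, str]:
    """reads: (family_id, strand, seq) with strand 'ab' or 'ba'; both strands
    are assumed to be reported in the same orientation and length."""
    families = defaultdict(lambda: {"ab": [], "ba": []})
    for family, strand, seq in reads:
        families[family][strand].append(seq)
    result = {}
    for family, strands in families.items():
        if not strands["ab"] or not strands["ba"]:
            continue  # no partner strand, so no duplex can be formed
        a = single_strand_consensus(strands["ab"], min_agreement)
        b = single_strand_consensus(strands["ba"], min_agreement)
        result[family] = "".join(x if x == y and x != "N" else "N" for x, y in zip(a, b))
    return result


reads = [
    ("f1", "ab", "ACGT"), ("f1", "ab", "ACGT"), ("f1", "ab", "ACGT"), ("f1", "ab", "ACTT"),
    ("f1", "ba", "ACGT"), ("f1", "ba", "ACGT"),
    ("f2", "ab", "ACGA"), ("f2", "ab", "ACGA"), ("f2", "ba", "ACGT"), ("f2", "ba", "ACGT"),
    ("f3", "ab", "ACGT"),
    ("f4", "ab", "AGGT"), ("f4", "ab", "AGGT"), ("f4", "ba", "AGGT"), ("f4", "ba", "AGGT"),
]
print(duplex_consensus(reads))
```

    A copying error within one read (family f1) is outvoted, an error confined to one strand (family f2) is masked as "N", and a single-strand family (f3) is discarded, whereas a change present on both strands of the original molecule (f4) survives.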

    In addition, I would perform these experiments not just on old individuals, but also on young controls and on neurons derived from a single iPSC clone with a known genome. By starting with a single cell, you ensure that all cells will have approximately the same genome, and any differences with the reference genome can easily be checked with multiple genetic-analysis tools on different but closely related cells from that clone.
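    The comparison suggested here amounts to simple set logic over per-cell variant calls. A minimal sketch, assuming each cell's differences from the reference genome have already been called; the cell names and variant labels are hypothetical.

```python
from collections import Counter


def partition_variants(cells: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split variant calls into those shared by every cell of the clone
    (likely inherited from the founding cell) and those private to a single
    cell (candidate somatic events, or artifacts needing orthogonal checks).
    Variants seen in some but not all cells fall in neither group."""
    shared = set.intersection(*cells.values())
    counts = Counter(v for calls in cells.values() for v in calls)
    private = {cell: {v for v in calls if counts[v] == 1} for cell, calls in cells.items()}
    return shared, private


cells = {
    "cell1": {"chr1:100A>G", "chr21:500C>T"},
    "cell2": {"chr1:100A>G"},
    "cell3": {"chr1:100A>G", "chr5:42G>A"},
}
shared, private = partition_variants(cells)
print("shared:", shared)
print("private:", private)
```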


    View all comments by Marc Vermulst


This paper appears in the following:


  1. Rogue APP Claim Embroiled in Contamination Concerns