As the mountain of single-nucleus RNA-sequencing data grows taller, how can scientists extract meaning from it? One way is pseudotime analysis. In essence, this algorithm orders cells on a virtual timeline based on the similarity of their gene-expression patterns. “Cells that are alike are placed near each other along the spectrum of transcriptional changes,” explained Laura Heath of Sage Bionetworks in Seattle. Heath presented one of several pseudotime analyses currently being done in the Alzheimer's field at the Alzheimer's Association International Conference, held last month in San Diego, California.

  • "Later" pseudotime correlates with worse neuropathology, rise in disease-associated glia.
  • Pseudotime says microglia change before astrocytes do during AD pathogenesis.
  • Relating pseudotime to multi-omics data illuminates pathways of progression.

The resulting diagrams look like trees. Scientists label branches as healthy or diseased based on their expression of known markers. This, in turn, places each cell along a health-to-disease trajectory, exposing sequential gene-expression patterns.
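To make the ordering idea concrete, here is a minimal sketch, not the method any of the presenters used. It assumes each cell is a short vector of expression values, links each cell to its nearest neighbors, and treats graph distance from a chosen "healthy" root cell as pseudotime. The function name, the toy values, and the choice of root are all hypothetical.

```python
import heapq
import math

def pseudotime(cells, root, k=2):
    """Order cells by shortest-path distance from a root cell on a kNN graph."""
    n = len(cells)
    # Build a symmetric k-nearest-neighbor graph using Euclidean distance.
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(cells[i], cells[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    # Dijkstra from the root: graph distance stands in for pseudotime.
    time = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        t, i = heapq.heappop(heap)
        if t > time.get(i, math.inf):
            continue  # stale heap entry
        for j, d in graph[i].items():
            if t + d < time.get(j, math.inf):
                time[j] = t + d
                heapq.heappush(heap, (t + d, j))
    # Cells unreachable from the root get infinite pseudotime.
    return [time.get(i, math.inf) for i in range(n)]

# Toy profiles (two hypothetical genes per cell), listed in shuffled order.
cells = [(3.0, 0.5), (0.0, 3.0), (2.0, 1.0), (1.0, 2.0), (0.5, 2.6)]
times = pseudotime(cells, root=1)  # cell 1 is assumed to be the healthy start
order = sorted(range(len(cells)), key=lambda i: times[i])
print(order)
```

Cells that resemble each other end up adjacent in `order`, which is the property Heath describes. Published tools add dimensionality reduction and branch detection on top of this basic idea.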

Pseudotime analysis allows scientists to turn cross-sectional data into “faux” longitudinal data to understand how cells change over time. This is important for Alzheimer’s, a disease that unfolds over the course of 30 years. Postmortem tissue offers but a snapshot of one time point, making it hard to discern when and how disease markers develop. Most brain transcriptomic data come from postmortem samples and are likewise difficult to interpret because it is hard to know if gene-expression changes are due to AD pathogenesis or organ damage associated with the end of life.

“Pseudotime trajectories offer a computational approach to model [transcriptomic changes], which can serve as a starting point for more detailed studies,” wrote Gregory Carter, Jackson Lab, Bar Harbor, Maine. Maria Wörheide agreed. She works at the Helmholtz Center Institute of Bioinformatics and Systems Biology in Munich. “Manifold learning algorithms, such as pseudotime, applied to cross-sectional data, have shown potential to provide novel insights into AD, although their robustness and scalability will require further investigation,” she wrote (full comments below).

At AAIC, four scientists showed how they use pseudotime analysis to wrangle RNA-Seq data. Two followed transcriptomic changes in astrocytes, describing a continuum of homeostatic to reactive cells in both healthy aging and AD. One plotted entire prefrontal cortex transcriptomes onto pseudotime trees, and one tied transcriptomic pseudotime to metabolomic changes in the AD brain.

Astrocyte Continuum
As a test of the methodology, Heath and colleagues first performed a pseudotime analysis on bulk RNA-Seq data from postmortem brain tissue of healthy and AD cases in the Religious Orders Study and Memory and Aging Project (ROSMAP) cohort (Mukherjee et al., 2020). “The modeled trajectory beautifully recapitulated neuropathology and clinical disease states, such that control samples were near the beginning and AD samples at the end,” she explained at AAIC.

Next, Heath used the algorithm on published single-nucleus RNA-Seq data from 80,500 prefrontal cortex cells of 48 ROSMAP participants, half controls and half AD, as well as 1.2 million cells from the middle temporal gyrus of 84 people ranging from healthy to AD in the new Seattle Alzheimer’s Disease Brain Cell Atlas (May 2019 news; see Part 17 of this series).

Aiming to draw pseudotime trajectories for each cell type, Heath first focused on astrocytes. She collected the transcriptomes of 3,400 astrocytes from the 48 ROSMAP participants and 47,000 astrocytes from nine controls and 47 AD cases in SEA-AD. Heath said she didn’t analyze all 84 SEA-AD participants to avoid bogging down the algorithm with too much data.

Heath identified about 3,000 differentially expressed genes in astrocytes from AD samples compared to controls, then plugged these DEGs into the pseudotime algorithm to generate proxy disease trajectories for both datasets. “Late” pseudotime correlated with a high degree of AD pathology via Braak and CERAD scores, though only in the ROSMAP dataset. Although the SEA-AD dataset was much larger, it contained many more AD cases than controls, and Heath believes this imbalance might have obscured the initial branching off of a temporal pattern.
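One way to check whether "late" pseudotime tracks pathology is a rank correlation between each donor's pseudotime and an ordinal score such as Braak stage. The sketch below assumes one summary pseudotime value per donor and uses made-up numbers; the talk did not specify which statistic Heath used.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            out[idx] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-donor values: mean astrocyte pseudotime and Braak stage (0-6).
mean_pseudotime = [0.10, 0.25, 0.30, 0.55, 0.60, 0.90]
braak_stage = [0, 1, 1, 3, 4, 6]
rho = spearman(mean_pseudotime, braak_stage)
print(round(rho, 3))
```

A rank-based measure suits Braak and CERAD scores because they are ordered categories rather than continuous measurements.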

That said, astrocytic changes were consistent in both datasets. As pseudotime “went by,” the glia progressed through six distinct phenotypes (see image below). Heath called the first homeostatic and the sixth reactive because the former highly expressed genes essential to astrocytic function, such as APOE, clusterin, and glutamine synthetase-encoding GluL, while the latter barely expressed those genes. Also, astrocytes from three SEA-AD participants without AD pathology, men aged 29, 42, and 50, matched the first group, supporting the categorization.

Sprouting Subtypes. Pseudotime trajectories (left to right) of astrocyte transcriptomes (dots) from ROSMAP (top) and SEA-AD (bottom) datasets. Colors denote six states; the first is called homeostatic (red circle). [Courtesy of Laura Heath, Sage Bionetworks]

Notably, astrocytes from controls and AD cases seemed much the same. Though controls had slightly more homeostatic astrocytes and cases had a few more reactive ones, each participant had astrocytes in all six states at widely varying proportions (see image below). At first, this surprised Heath. “Given how essential astrocytes are to maintaining neuronal health, and how responsive they are to all kinds of signals occurring during aging, there must be a need for multiple types of reactive astrocytes in all or most aging brains regardless of overt neuropathology,” she reasoned.

Everyone Has Every Type. All six astrocyte subpopulations (colors) were present in varying proportions among controls (left) and AD cases with low (middle) and high (right) pathology. [Courtesy of Laura Heath, Sage Bionetworks]

Sudeshna Das, Massachusetts General Hospital, Charlestown, reinforced Heath’s findings. Her pseudotime analysis also rendered a continuous spectrum of change from homeostatic to reactive astrocytes. Her MGH colleagues, in collaboration with Abbvie Inc., sequenced single nuclei from the prefrontal, entorhinal, visual cortex, and inferior temporal gyrus of 32 participants from the Massachusetts Alzheimer’s Disease Research Center. Averaging 80 years old, they ranged from Braak stages 0 to VI; controls were defined as Braak 0, I, or II without amyloid plaques, intermediates as Braak II or III with plaques, and AD cases as Braak V or VI with plaques. The scientists did not present the pre-mortem clinical diagnosis in their study, only the neuropathology data.

About those “intermediate” astrocytes. Do they represent cells in a continuum between homeostatic and reactive? Das ran a pseudotime analysis organizing the cells from the former to the latter. Six clusters of genes with similar expression patterns appeared: two whose expression rose together from homeostatic to reactive, one whose expression fell, and three whose expression peaked somewhere in between. “This suggests that the different astrocyte subpopulations may not be specialized cells, but rather transcriptional states in a trajectory from homeostatic to reactive,” Das said. Again, Heath agreed, noting that she sees similar intermediate astrocyte clusters in her data.
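The three kinds of pattern Das described (rising, falling, peaking in between) can be told apart with a very simple rule: sort cells by pseudotime, average expression in a few bins, and see where the peak falls. This is only an illustrative sketch; the gene names and values are hypothetical, and the clustering method Das actually used was not described.

```python
def trend(pseudotime, expression, n_bins=3):
    """Label a gene by where its binned mean expression peaks along pseudotime."""
    # Sort expression values by the pseudotime of the cell they came from.
    ordered = [e for _, e in sorted(zip(pseudotime, expression))]
    size = len(ordered) // n_bins
    means = []
    for b in range(n_bins):
        # The last bin absorbs any leftover cells.
        chunk = ordered[b * size:(b + 1) * size] if b < n_bins - 1 else ordered[b * size:]
        means.append(sum(chunk) / len(chunk))
    peak = means.index(max(means))
    if peak == 0:
        return "falling"
    if peak == n_bins - 1:
        return "rising"
    return "intermediate peak"

# Nine toy cells ordered from homeostatic (0.0) to reactive (0.8).
pt = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
genes = {
    "GENE_A": [1, 1, 2, 3, 4, 5, 6, 7, 8],
    "GENE_B": [9, 8, 7, 5, 4, 3, 2, 1, 1],
    "GENE_C": [1, 2, 3, 7, 8, 7, 3, 2, 1],
}
labels = {name: trend(pt, values) for name, values in genes.items()}
print(labels)
```

Genes like the hypothetical GENE_C, which peak mid-trajectory, are the ones that would mark an intermediate state rather than a separate cell type.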

Grouping People, Not Cells
Gilad Green, Hebrew University of Jerusalem, took a different approach. He ran pseudotime analyses on the combined transcriptomes of all brain cells from each participant. Heath noted that pseudotime modeling is flexible enough to work on noisy data like that.

First, Green created a sizable snRNA-Seq database of 1.6 million cells from prefrontal cortex tissue of 478 ROSMAP participants and defined 96 distinct cell populations (see Part 17). Then he calculated the proportion of each cell population for each person, combined them into one composite value, and used a pseudotime algorithm to plot each value based on how similar it is to others. This created a forked trajectory, much like Heath’s above, where each data point represents a person, rather than a cell. Green declined Alzforum's request to share a representative image.
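The per-person step can be sketched as follows, assuming the "composite value" is a vector of cell-population proportions and that similarity is measured by Euclidean distance. Both are assumptions, as are the donor names and the three population labels; Green's dataset had 96 populations.

```python
from collections import Counter
import math

def composition(cell_labels, populations):
    """Fraction of a donor's cells that fall into each population."""
    counts = Counter(cell_labels)
    total = len(cell_labels)
    return [counts[p] / total for p in populations]

populations = ["homeostatic_astro", "reactive_astro", "DAM"]  # hypothetical labels
donors = {
    "donor_A": ["homeostatic_astro"] * 8 + ["reactive_astro"] * 1 + ["DAM"] * 1,
    "donor_B": ["homeostatic_astro"] * 7 + ["reactive_astro"] * 2 + ["DAM"] * 1,
    "donor_C": ["homeostatic_astro"] * 2 + ["reactive_astro"] * 1 + ["DAM"] * 7,
}
vectors = {name: composition(cells, populations) for name, cells in donors.items()}

# The donor with the most similar cell makeup sits closest on the trajectory.
nearest_to_A = min(
    (name for name in vectors if name != "donor_A"),
    key=lambda name: math.dist(vectors["donor_A"], vectors[name]),
)
print(vectors["donor_A"], nearest_to_A)
```

A pseudotime algorithm would then order these donor-level vectors exactly as it would order cell-level expression profiles.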

In Green’s trajectory, a single mass of points, which he believes represent homeostatic cells, diverged into two, presumably disease-related, paths. As pseudotime “passed” in each path, the proportion of homeostatic glia decreased in each person, just as Heath and Das had found. One path became enriched with reactive, GFAP-positive astrocytes, the other with disease-associated microglia (DAM) and disease-associated astrocytes (DAA). Green named these cells in this way because of their strong upregulation in the presence of amyloid plaques and neurofibrillary tangles.

To relate these paths to AD, Green consulted neuropathological and clinical data. He matched each person's degree of amyloid or tau pathology, as determined by immunohistochemistry, and their rate of decline on the ROSMAP cognitive composite, with their placement on the pseudotime paths.

People situated on the DAM/DAA path had more plaques and tangles, and faster slippage, than those on the reactive astrocyte path. The farther along in pseudotime a person was, the worse their neuropathology and cognition had been. Moreover, the proportion of disease-associated microglia was highest at “early” pseudotime, while that of disease-associated astrocytes was highest at “later” pseudotime. This aligned with Green’s previous data that a “DAM” microglial response precedes “DAA” astrocytes (see Part 17). He concluded that this path modeled AD progression.

Then what is the reactive astrocyte branch? Green assumes it is not normal aging, as the people on it include those with AD and a wide range of pathologies. He hypothesizes that it may represent people with slow progression or mixed dementia.

To Pseudotime and Beyond
If pseudotime is not sci-fi enough, watch Wörheide take exploration of omics space a step further. Wörheide used Heath’s published pseudotime analysis of bulk RNA-Seq and related it to metabolic change. This identified how metabolites link up with disease progression.

Wörheide analyzed mass-spectrometry concentrations of 667 metabolites from prefrontal cortex tissue of 154 ROSMAP participants in Heath’s RNA-Seq pseudotime analysis. The metabolites ranged from lipids and carbohydrates to amino acids and nucleotides.

Because “later” pseudotime meant worse AD, Wörheide correlated pseudotime with metabolite level. The concentration of 89 molecules rose or fell in lockstep with pseudotime. The largest group, 36, were amino acids and their metabolites, followed by 17 types of lipid and 10 nucleotides and their metabolites.
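A hedged sketch of this kind of screen: correlate each metabolite with pseudotime and estimate significance by shuffling. The metabolite names and values are invented, and the permutation test is a stand-in for whatever statistical model Wörheide's group applied. A real screen of 667 metabolites would also need correction for multiple testing.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical donors ordered by pseudotime, with two made-up metabolites.
pseudotime = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
metabolites = {
    "metabolite_up": [1.0, 1.2, 1.5, 1.9, 2.0, 2.4, 2.9, 3.1],
    "metabolite_down": [5.0, 4.6, 4.1, 3.9, 3.2, 2.8, 2.5, 2.0],
}
results = {
    name: (pearson(pseudotime, levels), permutation_p(pseudotime, levels))
    for name, levels in metabolites.items()
}
for name, (r, p) in results.items():
    print(name, round(r, 3), "rises" if r > 0 else "falls", p)
```

The sign of the correlation says whether a metabolite rises or falls along the trajectory, which is the "lockstep" relationship described above.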

Wörheide then correlated each metabolite to data in the AD Atlas, a database her group created from genomic, transcriptomic, proteomic, metabolomic, and clinical data from ROSMAP (Wörheide et al., 2021). This atlas boasts such AD phenotyping on more than 20,000 protein-coding genes, 8,000 proteins, and nearly 1,000 metabolites.

Of the 89 statistically significant metabolites, the AD Atlas already contained 50. Thirty-four correlated with 619 genes mapped onto pathways such as amino acid metabolism and neurotransmitter transport. Wörheide then searched the atlas's transcriptomic and proteomic data for differential expression of those genes and their resulting proteins. She dredged up 193 DEGs and 39 differentially expressed proteins in AD cases versus controls. Differences in transcription were biggest in the temporal cortex.

Twenty-seven of these 34 AD-linked metabolites have already been correlated with plaque and tangle pathology, brain glucose uptake, or cognition (Batra et al., 2022). All in all, Wörheide believes that relating pseudotime to other omics data can shed new light on pathways that lead to AD dementia.

Heath agreed. “It is exciting when you see a convergence of signals toward a similar pathway among the different data types, because it strengthens the biological relevance of the pathway,” she said.—Chelsea Weidman Burke

Comments

  1. Brain samples are necessarily limited to postmortem tissues in Alzheimer’s disease research, which makes it difficult to determine the timing and progression of molecular markers of disease. Pseudotime methods can begin to model this progression by assuming that the data represent a sampling of individuals at multiple states of disease, thereby generating hypotheses of molecular progression.

    Pseudotime methods have become widely applied to single-cell RNA-Seq data; for example, to infer the lineages of differentiating cells in culture. This has proven very effective when the full lineage is represented in a sample. Applying the same idea to primary tissue in neurodegenerative disease is a creative extension of the idea, with the goal of enabling hypothesis generation rather than determining lineage order.

    Transcriptomic data from postmortem brain samples is notoriously difficult to decipher. One never knows if the changes associated with the disease state can be causally linked to pathogenesis and progression, or are an end-stage response to organ damage. Pseudotime trajectories offer a computational approach to model this ordering, which can serve as a starting point for more detailed studies. For example, we can use experimental systems to determine if transcript changes mapped to early pseudotime states drive later changes.

  2. We are pleased that Alzforum included our work on the multi-omics characterization of brain-based pseudotime estimates. In our analysis, we associated bulk RNA-Seq-based pseudotime estimates (Mukherjee et al., 2020) with measured brain-metabolite levels, finding that a majority of the significantly associated metabolites also correlate with cognitive and neuropathological features (Batra et al., 2022). We then extended these findings with additional molecular associations using the AD Atlas to gain further insights into potentially disease-driving pathways.

    The temporal disease trajectory inferred from bulk RNA-Seq data seems to recapitulate neuropathological changes well and provides insight into early disease transcriptional changes. As bulk data measures the average gene expression across a variety of different cells, complexities within and across different cell types are potentially missed. snRNA-Seq enables the study of transcriptional changes at single-cell resolution, giving more detailed insights into cell population heterogeneity (Bakken et al., 2018), involvement of distinct cell populations (Keren-Shaul et al., 2017), and cell-type specific transcriptional alterations (Mathys et al., 2019) in disease. Therefore, it will be interesting to see if the temporal structure seen in bulk RNA-Seq data can also be observed at the cell-type-specific level, and if distinct cell populations can be associated with disease trajectory or resilience.

    Another interesting direction may be to include additional, potentially complementary, data modalities directly, such as proteomics, epigenomics or metabolomics data, to derive multi-omics-informed disease trajectories. Unfortunately, multi-omics integration is generally complicated by the heterogeneous nature of different data modalities, regardless of whether it is performed on bulk or single-cell data (Argelaguet et al., 2021; Wörheide et al., 2021). This includes, for example, technical differences between arrays, different statistical properties, and large variations in the numbers of measured entities.

    Consequently, although it is technically possible to feed these algorithms with additional data, it becomes questionable whether they capture disease-relevant biological information or mainly technical variance. Furthermore, as mentioned by Dr. Laura Heath, any addition of data (multiple cells per sample and/or additional omics modalities) further increases the dimensionality and complexity of the data, posing additional statistical and computational challenges. Therefore, careful consideration of how to account for these differences will be crucial.

    In conclusion, manifold learning algorithms applied to cross-sectional data have shown potential to provide novel insights into Alzheimer’s disease. Although their robustness and scalability will require further investigation, the concept of data-derived pseudotime is intriguing.

    References:

    Argelaguet et al. Computational principles and challenges in single-cell data integration. Nat Biotechnol. 2021 Oct;39(10):1202-1215. Epub 2021 May 3. PubMed.

    Bakken et al. Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLoS One. 2018;13(12):e0209648. Epub 2018 Dec 26. PubMed.

    Batra et al. The landscape of metabolic brain alterations in Alzheimer's disease. Alzheimers Dement. 2022 Jul 13. PubMed.

    Keren-Shaul et al. A Unique Microglia Type Associated with Restricting Development of Alzheimer's Disease. Cell. 2017 Jun 15;169(7):1276-1290.e17. Epub 2017 Jun 8. PubMed.

    Mathys et al. Single-cell transcriptomic analysis of Alzheimer's disease. Nature. 2019 Jun;570(7761):332-337. Epub 2019 May 1. PubMed.

    Mukherjee et al. Molecular estimation of neurodegeneration pseudotime in older brains. Nat Commun. 2020 Nov 13;11(1):5781. PubMed. Correction.

    Wörheide et al. Multi-omics integration in biomedical research - A metabolomics-centric review. Anal Chim Acta. 2021 Jan 2;1141:144-162. Epub 2020 Oct 22. PubMed.

References

News Citations

  1. When It Comes to Alzheimer’s Disease, Do Human Microglia Even Give a DAM?
  2. RNA-Seq from 2.8 Million Cells Yields New Clues About Alzheimer's

Paper Citations

  1. Mukherjee et al. Molecular estimation of neurodegeneration pseudotime in older brains. Nat Commun. 2020 Nov 13;11(1):5781. PubMed. Correction.
  2. Wörheide et al. An Integrated Molecular Atlas of Alzheimer’s Disease. medRxiv 2021.09.14.21263565.
  3. Batra et al. The landscape of metabolic brain alterations in Alzheimer's disease. Alzheimers Dement. 2022 Jul 13. PubMed.

External Citations

  1. Massachusetts Alzheimer’s Disease Research Center
  2. AD Atlas
