From fertilization onwards, the cells of the human body acquire variations in their DNA sequence, known as somatic mutations. These postzygotic mutations arise from intrinsic errors in DNA replication and repair, as well as from exposure to mutagens. Somatic mutations have been implicated in some diseases, but a fundamental understanding of the frequency, type and patterns of mutations across healthy human tissues has been limited. This is primarily due to the small proportion of cells harbouring specific somatic variants within an individual, making them more challenging to detect than inherited variants. Here we describe the Somatic Mosaicism across Human Tissues Network, which aims to create a reference catalogue of somatic mutations and their clonal patterns across 19 different tissue sites from 150 non-diseased donors and develop new technologies and computational tools to detect somatic mutations and assess their phenotypic consequences, including clonal expansions. This strategy enables a comprehensive examination of the mutational landscape across the human body, and provides a comparison baseline for somatic mutation in diseases. This will lead to a deep understanding of somatic mutations and clonal expansions across the lifespan, as well as their roles in health, in ageing and, by comparison, in diseases.
Publications
2025
The earliest molecular changes in Alzheimer’s disease (AD) are poorly understood. Here we show that endogenous lithium (Li) is dynamically regulated in the brain and contributes to cognitive preservation during ageing. Of the metals we analysed, Li was the only one that was significantly reduced in the brain in individuals with mild cognitive impairment (MCI), a precursor to AD. Li bioavailability was further reduced in AD by amyloid sequestration. We explored the role of endogenous Li in the brain by depleting it from the diet of wild-type and AD mouse models. Reducing endogenous cortical Li by approximately 50% markedly increased the deposition of amyloid-β and the accumulation of phospho-tau, and led to pro-inflammatory microglial activation, the loss of synapses, axons and myelin, and accelerated cognitive decline. These effects were mediated, at least in part, through activation of the kinase GSK3β. Single-nucleus RNA-seq showed that Li deficiency gives rise to transcriptome changes in multiple brain cell types that overlap with transcriptome changes in AD. Replacement therapy with lithium orotate, which is a Li salt with reduced amyloid binding, prevents pathological changes and memory loss in AD mouse models and ageing wild-type mice. These findings reveal physiological effects of endogenous Li in the brain and indicate that disruption of Li homeostasis may be an early event in the pathogenesis of AD. Li replacement with amyloid-evading salts is a potential approach to the prevention and treatment of AD.
Clonal hematopoiesis of indeterminate potential (CHIP) represents clonal expansion of blood cells, and increases the risk of hematological malignancies and cardiovascular disorders. Recent studies have examined CHIP mutations in individuals with Alzheimer's disease (AD), but it is unclear whether their role in AD pathogenesis is protective, detrimental, or neutral. In this study, we used molecular-barcoded deep gene panel sequencing (~400X) to examine CHIP mutations in 298 blood samples from AD and neurotypical individuals 60 years and older. The AD patients exhibited a significantly higher burden of CHIP mutations compared to the age-matched controls (p < 2e-7, odds ratio (OR) = 2.89), particularly in low-frequency variants often not captured by standard whole exome or whole genome sequencing (WGS). This increase was driven by individuals with the APOE ϵ3/ϵ3 genotype and was absent in ϵ4 carriers. Analysis of an independent dataset from the Alzheimer's Disease Sequencing Project (ADSP), comprising WGS data from ~30,000 individuals, confirmed increased CHIP mutations in AD versus controls (p < 0.02, OR = 1.32), again driven by individuals with the APOE ϵ3/ϵ3 genotype. CHIP mutations in AD patients also showed stronger positive selection than in controls. Our results indicate that AD patients show significantly more CHIP mutations in their blood than controls, involving more than one third of AD patients, and contributing to AD risk through a mechanism independent of APOE ϵ4.
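For readers less familiar with the odds ratios quoted in the abstract above, the following is a minimal sketch of how an OR and an approximate 95% confidence interval can be computed from a 2x2 table of carriers and non-carriers. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and the Wald interval used here is a textbook approximation, not necessarily the method the authors applied.

```python
import math


def odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg):
    """Return (OR, lower, upper) for a 2x2 table using a Wald 95% interval.

    case_pos / case_neg: cases with / without the exposure (e.g. a CHIP mutation)
    ctrl_pos / ctrl_neg: controls with / without the exposure
    """
    cells = (case_pos, case_neg, ctrl_pos, ctrl_neg)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    # Odds of exposure among cases divided by odds of exposure among controls.
    estimate = (case_pos * ctrl_neg) / (case_neg * ctrl_pos)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio(30, 70, 10, 90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With small cell counts, an exact test would usually be preferred over the Wald approximation.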
Background: Changes in RNA splicing over the course of evolution have profoundly diversified the functional landscape of the human genome. While DNA sequences proximal to intron-exon junctions are known to be critical for RNA splicing, the impact of distal intronic sequences remains underexplored. Emerging evidence suggests that inverted pairs of intronic Alu elements can promote exon skipping by forming RNA stem-loop structures. However, their prevalence and influence throughout evolution remain unknown.
Results: Here, we present a systematic analysis of inverted Alu pairs across the human genome to assess their impact on exon skipping through predicted RNA stem-loop formation and their relevance to hominoid evolution. We found that inverted Alu pairs, particularly pairs of AluY-AluSx1 and AluSz-AluSx, are enriched in the flanking regions of skippable exons genome-wide and are predicted to form stable stem-loop structures. Exons defined by weak 3′ acceptor and strong 5′ donor splice sites appear especially prone to this skipping mechanism. Through comparative genome analysis across nine primate species, we identified 67,126 hominoid-specific Alu insertions, primarily from AluY and AluS subfamilies, which form inverted pairs enriched across skippable exons in genes of ubiquitination-related pathways. Experimental validation of exon skipping among several hominoid-specific inverted Alu pairs further reinforced their potential evolutionary significance.
Conclusion: This work extends our current knowledge of the roles of RNA secondary structure formed by inverted Alu pairs and details a newly emerging mechanism through which transposable elements have contributed to genomic innovation across hominoid evolution at the transcriptomic level.
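The notion of an inverted Alu pair flanking an exon can be made concrete with a small sketch. The code below assumes Alu annotations are available as (name, start, end, strand) records on a single chromosome and reports pairs with one element in each flanking region and opposite strand orientation. The `Alu` class, the function name, the coordinates, and the 2,000 bp window are all hypothetical choices for illustration; the abstract does not state the flank size used, and this sketch does not attempt the stem-loop stability prediction the study describes.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Alu:
    name: str
    start: int   # 0-based start coordinate
    end: int     # end coordinate (exclusive)
    strand: str  # "+" or "-"


def inverted_pairs_flanking(exon_start, exon_end, alus, window=2000):
    """Return (upstream, downstream) Alu pairs in opposite orientation.

    An Alu counts as flanking if it lies entirely outside the exon and
    within `window` bp of the nearest exon boundary (an assumed cutoff).
    """
    upstream = [a for a in alus
                if a.end <= exon_start and exon_start - a.end <= window]
    downstream = [a for a in alus
                  if a.start >= exon_end and a.start - exon_end <= window]
    # Opposite strands mean the two transcribed copies are reverse
    # complements and can in principle base-pair into a stem.
    return [(u, d) for u, d in product(upstream, downstream)
            if u.strand != d.strand]


# Hypothetical annotations around an exon at 1000-1200.
alus = [
    Alu("AluY", 100, 400, "+"),
    Alu("AluSx1", 1500, 1800, "-"),
    Alu("AluSz", 1900, 2200, "+"),
]
for up, down in inverted_pairs_flanking(1000, 1200, alus):
    print(up.name, up.strand, "|", down.name, down.strand)
```

In this toy input only the AluY/AluSx1 pair is reported, because AluSz lies on the same strand as AluY.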
One third of women in the United States are affected by obesity during pregnancy. Maternal obesity (MO) is associated with an increased risk of neurodevelopmental and metabolic disorders in the offspring. The placenta, located at the maternal-fetal interface, is a key organ determining fetal development and likely contributes to programming of long-term offspring health. We profiled the term placental transcriptome in humans (pre-pregnancy BMI 35+ [MO condition] or 18.5-25 [lean condition]) using single-nucleus RNA-seq to compare expression profiles in MO versus lean conditions, and to reveal potential mechanisms underlying offspring disease risk. We recovered 62,864 nuclei of high quality from 10 samples each from the maternal-facing and fetal-facing sides of the placenta. On both sides in several cell types, MO was associated with upregulation of hypoxia response genes. On the maternal-facing side only, hypoxia gene expression was associated with offspring neurodevelopmental measures, in Gen3G, an independent pregnancy cohort with bulk placental tissue RNA-seq. We leveraged Gen3G to determine genes that correlated with impaired neurodevelopment and found these genes to be most highly expressed in extravillous trophoblasts (EVTs). EVTs further showed the strongest correlation between neurodevelopment impairment gene scores (NDIGSs) and the hypoxia gene score. We reanalyzed gene expression of cultured EVTs, and found increased NDIGSs associated with exposure to hypoxia. Among EVTs, accounting for the hypoxia gene score attenuated 44% of the association between BMI and NDIGSs. These data suggest that hypoxia in EVTs may be a key process in the neurodevelopmental programming of fetal exposure to MO.
Amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), and Alzheimer's disease (AD) are common neurodegenerative disorders for which the mechanisms driving neuronal death remain unclear. Single-cell whole-genome sequencing of 429 neurons from three C9ORF72 ALS, six C9ORF72 FTD, seven AD, and twenty-three neurotypical control brains revealed significantly increased burdens of somatic single-nucleotide variants (sSNVs) and insertions/deletions (sIndels) in all three disease conditions. Mutational signature analysis identified a disease-associated sSNV signature suggestive of oxidative damage and an sIndel process, affecting 28% of ALS, 79% of FTD, and 65% of AD neurons but only 5% of control neurons (diseased vs. control: OR = 31.20, p = 2.35 × 10⁻¹⁰). Disease-associated sIndels were primarily two-basepair deletions resembling signature ID4, which was previously linked to topoisomerase 1 (TOP1)-mediated mutagenesis. Duplex sequencing confirmed the presence of sIndels and identified similar single-strand events as potential precursor lesions. TOP1-associated sIndel mutagenesis and resulting genome instability may thus represent a common mechanism of neurodegeneration.
Chronic traumatic encephalopathy (CTE) is a neurodegenerative disease that is linked to exposure to repetitive head impacts (RHI), yet little is known about its pathogenesis. Applying two single-cell whole-genome sequencing methods to hundreds of neurons from the prefrontal cortex of 15 individuals with CTE and 4 with RHI without CTE revealed increased somatic single-nucleotide variants in CTE, resembling a pattern previously reported in Alzheimer’s disease (AD). Furthermore, we discovered remarkably high burdens of somatic small insertions and deletions in a subset of CTE individuals, resembling a known pattern, ID4, also found in AD. Our results suggest that neurons in CTE experience stereotyped mutational processes shared with AD; the absence of similar changes in RHI neurons without CTE suggests that CTE involves mechanisms beyond RHI alone.
2024
LINE-1 (L1) retrotransposition is widespread in many cancers, especially those with a high burden of chromosomal rearrangements. However, whether and to what degree L1 activity directly impacts genome integrity is unclear. Here, we apply whole-genome sequencing to experimental models of L1 expression to comprehensively define the spectrum of genomic changes caused by L1. We provide definitive evidence that L1 expression frequently and directly causes both local and long-range chromosomal rearrangements, small and large segmental copy-number alterations, and subclonal copy-number heterogeneity due to ongoing chromosomal instability. Mechanistically, all these alterations arise from DNA double-strand breaks (DSBs) generated by L1-encoded ORF2p. The processing of ORF2p-generated DSB ends prior to their ligation can produce diverse rearrangements of the target sequences. Ligation between DSB ends generated at distal loci can generate either stable chromosomes or unstable dicentric, acentric, or ring chromosomes that undergo subsequent evolution through breakage-fusion-bridge cycles or DNA fragmentation. Together, these findings suggest L1 is a potent mutagenic force capable of driving genome evolution beyond simple insertions.
Somatic mosaic variants contribute to focal epilepsy, but genetic analysis has been limited to patients with drug-resistant epilepsy (DRE) who undergo surgical resection, as the variants are mainly brain-limited. Stereoelectroencephalography (sEEG) has become part of the evaluation for many patients with focal DRE, and sEEG electrodes provide a potential source of small amounts of brain-derived DNA. We aimed to identify, validate, and assess the distribution of potentially clinically relevant mosaic variants in DNA extracted from trace brain tissue on individual sEEG electrodes.
We enrolled a prospective cohort of eleven pediatric patients with DRE who had sEEG electrodes implanted for invasive monitoring, one of whom was previously reported. We extracted unamplified DNA from the trace brain tissue on each sEEG electrode and also performed whole-genome amplification for each sample. We extracted DNA from resected brain tissue and blood/saliva samples where available. We performed deep panel and exome sequencing on a subset of samples from each case and analyzed the data for potentially clinically relevant candidate germline and mosaic variants. We validated candidate mosaic variants using amplicon sequencing and assessed the variant allele fraction (VAF) in amplified and unamplified electrode-derived DNA and across electrodes.
We extracted DNA from >150 individual electrodes from 11 individuals and obtained higher concentrations of whole-genome amplified vs unamplified DNA. Immunohistochemistry confirmed the presence of neurons in the brain tissue on electrodes. Deep sequencing and analysis demonstrated similar depth of coverage between amplified and unamplified samples but significantly more called mosaic variants in amplified samples. In addition to the mosaic PIK3CA variant detected in a previously reported case from our group, we identified and validated four potentially clinically relevant mosaic variants in electrode-derived DNA in three patients who underwent laser ablation and did not have resected brain tissue samples available. The variants were detected in both amplified and unamplified electrode-derived DNA, with higher VAFs observed in DNA from electrodes in closest proximity to the electrical seizure focus in some cases.
This study demonstrates that mosaic variants can be identified and validated from DNA extracted from trace brain tissue on individual sEEG electrodes in patients with drug-resistant focal epilepsy and in both amplified and unamplified electrode-derived DNA samples. Our findings support a relationship between the extent of regional genetic abnormality and electrophysiology, and suggest that with further optimization, this minimally invasive diagnostic approach holds promise for advancing precision medicine for patients with DRE as part of the surgical evaluation.
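The variant allele fraction (VAF) referred to in the abstract above is the share of sequencing reads at a site that carry the variant allele. A minimal sketch of that arithmetic, and of ranking electrodes by VAF, is given here. The function name, electrode labels, and read counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def variant_allele_fraction(alt_reads, ref_reads):
    """Return alt / (alt + ref), the fraction of reads supporting the variant."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


# Hypothetical (alt, ref) read counts per electrode, for illustration only.
electrode_counts = {
    "electrode_A": (12, 388),
    "electrode_B": (3, 497),
    "electrode_C": (0, 450),
}

vafs = {name: variant_allele_fraction(alt, ref)
        for name, (alt, ref) in electrode_counts.items()}

# List electrodes from highest to lowest VAF.
for name in sorted(vafs, key=vafs.get, reverse=True):
    print(f"{name}: VAF = {vafs[name]:.3%}")
```

A real analysis would also need to account for sequencing error and amplification bias before comparing VAFs across samples.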