Publications

Submitted

Wang, Seunghyun, Mingyun Bae, Jinhao Wang, Boxun Zhao, Khue Nguyen, Shayna Mallett, Jessica A. Switzenberg, et al. Submitted. “Multi-Platform Framework for Mapping Somatic Retrotransposition in Human Tissues”. BioRxiv, Submitted.

Mobile element insertions (MEI) shape the human genome in both germline and somatic tissues. While inherited MEIs are well characterized, mapping somatic MEIs (sMEI) in non-cancer tissues remains challenging due to their low allelic fraction and repetitive nature. We established an integrative framework for sMEI analysis leveraging modern sequencing technologies and analytical innovations. We first benchmarked sMEI detection and demonstrated advantages of long-read and MEI-targeted sequencing for ultra-low-frequency events using a mixture of well-established cell lines. We then showed that haplotype phasing and donor-specific assemblies refine sMEI detection, effectively distinguishing sMEIs from germline and false signals in in-silico tumor-normal mixtures. We further developed a source-tracing strategy based on internal sequence variation, expanding the catalogue of active source elements beyond traditional transduction-based methods. Applying this framework to donor tissues, we identified 18 rare somatic L1 insertions, revealing structural and source diversity. Our work provides a foundational framework and biological insight into sMEIs.

Voshall, Adam, Jeongjun Chae, Honglan Li, Junsu Ko, Woongyang Park, Eunjung Alice Lee, and Yoonjoo Choi. Submitted. “STRUMP-I: Structure-Based Machine Learning Approach to PMHC-I Binding Prediction Using Force Field Energy Features”. BioRxiv, Submitted.

The adaptive immune system monitors cellular integrity by recognizing short peptides from intracellular proteins presented on Major Histocompatibility Complex class I (MHC-I) molecules, collectively termed peptide-MHC complexes (pMHC), enabling detection of foreign or mutated proteins. With the rising importance of immunotherapies targeting neoantigens in cancers, the ability to accurately predict which peptides will bind to the diverse population of MHC alleles is critically important. Current computational methods for pMHC-I prediction fall broadly into sequence-based methods, which rely heavily on large training datasets, and structure-based methods that leverage structural modeling and energetics of pMHC binding. While sequence-based methods have been widely used, their performance is dependent on the size and quality of training data. On the other hand, while structure-based approaches can generalize better across diverse MHC alleles, they traditionally depend on identifying a single global minimum energy conformation, an assumption that often fails due to the inherent binding promiscuity of MHC-I molecules. To address these limitations, we developed STRUMP-I (STRUcture-based pMHC Prediction (for class I)), a novel pMHC binding prediction tool that directly leverages a broad set of force-field-derived energy terms as machine-learning features. STRUMP-I achieves performance comparable to state-of-the-art sequence-based models while significantly outperforming them on MHC alleles with limited representation in training data. Furthermore, STRUMP-I demonstrates strong synergy when integrated with sequence-based methods, notably enhancing prediction precision. The robustness and generalizability of STRUMP-I were confirmed by evaluating its predictive performance on independent, previously unseen datasets, including an experimentally validated cancer neoantigen dataset. This combined approach advances our capability to reliably identify clinically relevant neoantigen targets. The source code and trained models are available at https://github.com/yoonjoolab/STRUMP-I.

2025

Coorens, Tim H. H., Ji Won Oh, Yujin Angelina Choi, Nam Seop Lim, Boxun Zhao, Adam Voshall, Alexej Abyzov, et al. 2025. “The Somatic Mosaicism across Human Tissues Network”. Nature.

From fertilization onwards, the cells of the human body acquire variations in their DNA sequence, known as somatic mutations. These postzygotic mutations arise from intrinsic errors in DNA replication and repair, as well as from exposure to mutagens. Somatic mutations have been implicated in some diseases, but a fundamental understanding of the frequency, type and patterns of mutations across healthy human tissues has been limited. This is primarily due to the small proportion of cells harbouring specific somatic variants within an individual, making them more challenging to detect than inherited variants. Here we describe the Somatic Mosaicism across Human Tissues Network, which aims to create a reference catalogue of somatic mutations and their clonal patterns across 19 different tissue sites from 150 non-diseased donors and develop new technologies and computational tools to detect somatic mutations and assess their phenotypic consequences, including clonal expansions. This strategy enables a comprehensive examination of the mutational landscape across the human body, and provides a comparison baseline for somatic mutation in diseases. This will lead to a deep understanding of somatic mutations and clonal expansions across the lifespan, as well as their roles in health, in ageing and, by comparison, in diseases.

Aron, Liviu, Zhen Kai Ngian, Chenxi Qiu, Jaejoon Choi, Marianna Liang, Derek M. Drake, Sara E. Hamplova, et al. 2025. “Lithium Deficiency and the Onset of Alzheimer’s Disease”. Nature.

The earliest molecular changes in Alzheimer’s disease (AD) are poorly understood. Here we show that endogenous lithium (Li) is dynamically regulated in the brain and contributes to cognitive preservation during ageing. Of the metals we analysed, Li was the only one that was significantly reduced in the brain in individuals with mild cognitive impairment (MCI), a precursor to AD. Li bioavailability was further reduced in AD by amyloid sequestration. We explored the role of endogenous Li in the brain by depleting it from the diet of wild-type and AD mouse models. Reducing endogenous cortical Li by approximately 50% markedly increased the deposition of amyloid-β and the accumulation of phospho-tau, and led to pro-inflammatory microglial activation, the loss of synapses, axons and myelin, and accelerated cognitive decline. These effects were mediated, at least in part, through activation of the kinase GSK3β. Single-nucleus RNA-seq showed that Li deficiency gives rise to transcriptome changes in multiple brain cell types that overlap with transcriptome changes in AD. Replacement therapy with lithium orotate, which is a Li salt with reduced amyloid binding, prevents pathological changes and memory loss in AD mouse models and ageing wild-type mice. These findings reveal physiological effects of endogenous Li in the brain and indicate that disruption of Li homeostasis may be an early event in the pathogenesis of AD. Li replacement with amyloid-evading salts is a potential approach to the prevention and treatment of AD.

Choi, Jaejoon, Kyung Sun Park, Yann Le Guen, Jong-Ho Park, Zinan Zhou, Liz Enyenihi, Ila Rosen, et al. 2025. “Clonal Hematopoiesis Mutations Increase Risk of Alzheimer’s Disease With APOE ϵ3/ϵ3 Genotype”. BioRxiv, Accepted.

Clonal hematopoiesis of indeterminate potential (CHIP) represents clonal expansion of blood cells, and increases the risk of hematological malignancies and cardiovascular disorders. Recent studies have examined CHIP mutations in individuals with Alzheimer's disease (AD), but it is unclear whether their role in AD pathogenesis is protective, detrimental, or neutral. In this study, we used molecular-barcoded deep gene panel sequencing (~400X) to examine CHIP mutations in 298 blood samples from AD and neurotypical individuals 60 years and older. The AD patients exhibited a significantly higher burden of CHIP mutations compared to the age-matched controls (p < 2e-7, odds ratio (OR) = 2.89), particularly in low-frequency variants often not captured by standard whole exome or whole genome sequencing (WGS). This increase was driven by individuals with the APOE ϵ3/ϵ3 genotype and absent in ϵ4 carriers. Analysis of an independent dataset from the Alzheimer's Disease Sequencing Project (ADSP), comprised of WGS data from ~30,000 individuals, confirmed increased CHIP mutations in AD versus control (p < 0.02, OR = 1.32), again driven by individuals with the APOE ϵ3/ϵ3 genotype. CHIP mutations in AD patients also showed stronger positive selection than in controls. Our results indicate that AD patients show significantly more CHIP mutations in their blood than controls, involving more than one third of AD patients, and contributing to AD risk through a mechanism independent of APOE ϵ4.

Denisko, Danielle, Jeonghyeon Kim, Jayoung Ku, Boxun Zhao, and Eunjung Alice Lee. 2025. “Inverted Alu Repeats in Loop-Out Exon Skipping across Hominoid Evolution”. BioRxiv.

Background: Changes in RNA splicing over the course of evolution have profoundly diversified the functional landscape of the human genome. While DNA sequences proximal to intron-exon junctions are known to be critical for RNA splicing, the impact of distal intronic sequences remains underexplored. Emerging evidence suggests that inverted pairs of intronic Alu elements can promote exon skipping by forming RNA stem-loop structures. However, their prevalence and influence throughout evolution remain unknown.

Results: Here, we present a systematic analysis of inverted Alu pairs across the human genome to assess their impact on exon skipping through predicted RNA stem-loop formation and their relevance to hominoid evolution. We found that inverted Alu pairs, particularly pairs of AluY-AluSx1 and AluSz-AluSx, are enriched in the flanking regions of skippable exons genome-wide and are predicted to form stable stem-loop structures. Exons defined by weak 3′ acceptor and strong 5′ donor splice sites appear especially prone to this skipping mechanism. Through comparative genome analysis across nine primate species, we identified 67,126 hominoid-specific Alu insertions, primarily from AluY and AluS subfamilies, which form inverted pairs enriched across skippable exons in genes of ubiquitination-related pathways. Experimental validation of exon skipping among several hominoid-specific inverted Alu pairs further reinforced their potential evolutionary significance.

Conclusion: This work extends our current knowledge of the roles of RNA secondary structure formed by inverted Alu pairs and details a newly emerging mechanism through which transposable elements have contributed to genomic innovation across hominoid evolution at the transcriptomic level.

Gunter-Rahman, Fatima, Shayna Mallett, Frédérique White, Pierre-Étienne Jacques, Ravikiran M Raju, Marie-France Hivert, and Eunjung Alice Lee. 2025. “Hypoxia in Extravillous Trophoblasts Links Maternal Obesity and Offspring Neurobehavior”. iScience, Accepted.

One third of women in the United States are affected by obesity during pregnancy. Maternal obesity (MO) is associated with an increased risk of neurodevelopmental and metabolic disorders in the offspring. The placenta, located at the maternal-fetal interface, is a key organ determining fetal development and likely contributes to programming of long-term offspring health. We profiled the term placental transcriptome in humans (pre-pregnancy BMI 35+ [MO condition] or 18.5-25 [lean condition]) using single-nucleus RNA-seq to compare expression profiles in MO versus lean conditions, and to reveal potential mechanisms underlying offspring disease risk. We recovered 62,864 nuclei of high quality from 10 samples each from the maternal-facing and fetal-facing sides of the placenta. On both sides in several cell types, MO was associated with upregulation of hypoxia response genes. On the maternal-facing side only, hypoxia gene expression was associated with offspring neurodevelopmental measures, in Gen3G, an independent pregnancy cohort with bulk placental tissue RNA-seq. We leveraged Gen3G to determine genes that correlated with impaired neurodevelopment and found these genes to be most highly expressed in extravillous trophoblasts (EVTs). EVTs further showed the strongest correlation between neurodevelopment impairment gene scores (NDIGSs) and the hypoxia gene score. We reanalyzed gene expression of cultured EVTs, and found increased NDIGSs associated with exposure to hypoxia. Among EVTs, accounting for the hypoxia gene score attenuated 44% of the association between BMI and NDIGSs. These data suggest that hypoxia in EVTs may be a key process in the neurodevelopmental programming of fetal exposure to MO.

Zhou, Zinan, Lovelace J Luquette, Guanlan Dong, Junho Kim, Jayoung Ku, Kisong Kim, Mingyun Bae, et al. 2025. “Recurrent Patterns of Widespread Neuronal Genomic Damage Shared by Major Neurodegenerative Disorders”. BioRxiv.

Amyotrophic lateral sclerosis (ALS), frontotemporal dementia (FTD), and Alzheimer's disease (AD) are common neurodegenerative disorders for which the mechanisms driving neuronal death remain unclear. Single-cell whole-genome sequencing of 429 neurons from three C9ORF72 ALS, six C9ORF72 FTD, seven AD, and twenty-three neurotypical control brains revealed significantly increased burdens in somatic single nucleotide variant (sSNV) and insertion/deletion (sIndel) in all three disease conditions. Mutational signature analysis identified a disease-associated sSNV signature suggestive of oxidative damage and an sIndel process, affecting 28% of ALS, 79% of FTD, and 65% of AD neurons but only 5% of control neurons (diseased vs. control: OR = 31.20, p = 2.35 × 10⁻¹⁰). Disease-associated sIndels were primarily two-basepair deletions resembling signature ID4, which was previously linked to topoisomerase 1 (TOP1)-mediated mutagenesis. Duplex sequencing confirmed the presence of sIndels and identified similar single-strand events as potential precursor lesions. TOP1-associated sIndel mutagenesis and resulting genome instability may thus represent a common mechanism of neurodegeneration.

Dong, Guanlan, Chanthia C. Ma, Shulin Mao, Samuel M. Naik, Katherine Sun-Mi Brown, Gannon A. McDonough, Junho Kim, et al. 2025. “Diverse Somatic Genomic Alterations in Single Neurons in Chronic Traumatic Encephalopathy”. BioRxiv.

Chronic traumatic encephalopathy (CTE) is a neurodegenerative disease that is linked to exposure to repetitive head impacts (RHI), yet little is known about its pathogenesis. Applying two single-cell whole-genome sequencing methods to hundreds of neurons from prefrontal cortex of 15 individuals with CTE, and 4 with RHI without CTE, revealed increased somatic single-nucleotide variants in CTE, resembling a pattern previously reported in Alzheimer’s disease (AD). Furthermore, we discovered remarkably high burdens of somatic small insertions and deletions in a subset of CTE individuals, resembling a known pattern, ID4, also found in AD. Our results suggest that neurons in CTE experience stereotyped mutational processes shared with AD; the absence of similar changes in RHI neurons without CTE suggests that CTE involves mechanisms beyond RHI alone.