Heterogeneity-preserving discriminative feature selection for disease-specific subtype discovery.

Basher, Abdur Rahman M A, Caleb Hallinan, and Kwonmoo Lee. 2025. “Heterogeneity-Preserving Discriminative Feature Selection for Disease-Specific Subtype Discovery.” Nature Communications 16 (1): 3593.

Abstract

Disease-specific subtype identification can deepen our understanding of disease progression and pave the way for personalized therapies, given the complexity of disease heterogeneity. Large-scale transcriptomic, proteomic, and imaging datasets create opportunities for discovering subtypes but also pose challenges due to their high dimensionality. To mitigate this, many feature selection methods focus on selecting features that distinguish known diseases or cell states, yet often miss features that preserve heterogeneity and reveal new subtypes. To overcome this gap, we develop Preserving Heterogeneity (PHet), a statistical methodology that employs iterative subsampling and differential analysis of interquartile range, in conjunction with Fisher's method, to identify a small set of features that enhance subtype clustering quality. Here, we show that this method can maintain sample heterogeneity while distinguishing known disease/cell states, with a tendency to outperform previous differential expression and outlier-based methods, indicating its potential to advance our understanding of disease mechanisms and cell differentiation.
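As a rough illustration of the ingredients named in the abstract (iterative subsampling, a difference in interquartile range between groups, and Fisher's method for combining p-values), the following Python sketch scores a single feature. It is not the PHet implementation. The function names, the normal-approximation z-test used to obtain per-subsample p-values, the subsample size, and the number of iterations are all assumptions made for this example. Because p-values from overlapping subsamples are not independent, the combined p-value here should be read as a heuristic score rather than a calibrated significance level.

```python
import math
import random
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def z_test_pvalue(a, b):
    """Two-sided p-value from a normal approximation (an assumed test)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 1.0 if statistics.mean(a) == statistics.mean(b) else 1e-300
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # Clamp away from zero so the logarithm in fisher_combine stays finite.
    return max(math.erfc(abs(z) / math.sqrt(2.0)), 1e-300)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k dof under H0."""
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for an even number of dof.
    term, total = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def score_feature(control, case, n_iter=30, size=20, seed=0):
    """Hypothetical per-feature score: mean |delta IQR| and a combined p-value."""
    rng = random.Random(seed)
    deltas, pvalues = [], []
    for _ in range(n_iter):
        # Draw a fresh subsample (without replacement) from each group.
        sub_control = rng.sample(control, size)
        sub_case = rng.sample(case, size)
        deltas.append(abs(iqr(sub_case) - iqr(sub_control)))
        pvalues.append(z_test_pvalue(sub_control, sub_case))
    return statistics.mean(deltas), fisher_combine(pvalues)


# Synthetic data: one feature where the case group hides two subtypes,
# and one where case and control follow the same distribution.
data_rng = random.Random(42)
control = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
case_same = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
case_hetero = ([data_rng.gauss(0.0, 1.0) for _ in range(25)]
               + [data_rng.gauss(4.0, 1.0) for _ in range(25)])

delta_same, p_same = score_feature(control, case_same)
delta_hetero, p_hetero = score_feature(control, case_hetero)
print("same distribution:", delta_same, p_same)
print("hidden subtypes:  ", delta_hetero, p_hetero)
```

In this toy setting, the feature whose case group mixes two subpopulations shows a larger average IQR difference than the feature with matching distributions, which is the kind of spread-based signal a mean-only differential test can miss.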

Last updated on 04/17/2025