Publications by Year: 2011


Paik, Hyojung, Eunjung Lee, Inho Park, Junho Kim, and Doheon Lee. 2011. “Prediction of Cancer Prognosis With the Genetic Basis of Transcriptional Variations.” Genomics 97 (6): 350-7. https://doi.org/10.1016/j.ygeno.2011.03.005.

Disease phenotypes, including prognosis, are likely to have complex etiologies arising from interacting mechanisms such as genetic and protein interactions. Many computational methods have been used to predict survival outcomes without explicitly modeling such interactive effects, for example the genetic basis of transcriptional variation. We therefore propose a classification method based on the interaction between genotype and transcriptional expression features (CORE-F). This method considers the overall "genetic architecture," meaning the genetically driven transcriptional alterations that influence prognosis. When we compared CORE-F with the ensemble tree, the best-performing existing method for predicting patient survival, CORE-F outperformed it (mean AUC 0.85 vs. 0.72). Moreover, the associations learned by CORE-F successfully identified the genetic mechanisms underlying survival outcomes at the level of the interaction network.

Lee, Sejoon, Eunjung Lee, Kwang H Lee, and Doheon Lee. 2011. “Predicting Disease Phenotypes Based on the Molecular Networks With Condition-Responsive Correlation.” International Journal of Data Mining and Bioinformatics 5 (2): 131-42.

Network-based methods that integrate molecular interaction networks with gene expression profiles have been proposed to address the problem of having far fewer samples than predictors. However, previous network-based methods have focused only on the expression levels of proteins (the nodes in the network) and have not identified condition-responsive interactions. We propose a novel network-based classification method that focuses both on nodes with discriminative expression levels and on edges with Condition-Responsive Correlations (CRCs) across two phenotypes. We found that modules with condition-responsive interactions provide candidate molecular models for diseases and show improved performance compared with conventional gene-centric classification methods.
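The abstract does not define how a CRC is scored. As a minimal sketch, assuming a CRC is measured as the absolute change in Pearson correlation of an interacting gene pair between the two phenotype groups, edges could be screened as follows. The function names, the dictionary layout, and the 0.5 cutoff are hypothetical illustrations, not the published method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def condition_responsive_edges(
    edges: List[Tuple[str, str]],
    expr_p1: Dict[str, Sequence[float]],  # gene -> expression across phenotype-1 samples
    expr_p2: Dict[str, Sequence[float]],  # gene -> expression across phenotype-2 samples
    threshold: float = 0.5,               # hypothetical cutoff, not from the paper
) -> List[Tuple[str, str, float]]:
    """Keep network edges whose co-expression correlation shifts between phenotypes."""
    selected = []
    for a, b in edges:
        # Assumed CRC score: absolute change in Pearson correlation across the two groups.
        score = abs(pearson(expr_p1[a], expr_p1[b]) - pearson(expr_p2[a], expr_p2[b]))
        if score >= threshold:
            selected.append((a, b, score))
    return selected
```

For example, with `p1 = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1]}` and `p2` identical except `"G2": [8, 6, 4, 2]`, the edge G1-G2 flips from a correlation of 1 to -1 and scores 2.0, so it is kept, while G1-G3 has a correlation of -1 in both groups, scores 0, and is dropped. Note that a node-only classifier would see no difference in G2's mean expression between the groups here; only the edge changes.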

Xi, Ruibin, Angela G Hadjipanayis, Lovelace J Luquette, Tae-Min Kim, Eunjung Lee, Jianhua Zhang, Mark D Johnson, et al. 2011. “Copy Number Variation Detection in Whole-Genome Sequencing Data Using the Bayesian Information Criterion.” Proceedings of the National Academy of Sciences of the United States of America 108 (46): E1128-36. https://doi.org/10.1073/pnas.1110574108.

DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer and confer susceptibility to a variety of human disorders. Array comparative genomic hybridization has been widely used to identify CNVs genome-wide, but next-generation sequencing technology provides an opportunity to characterize CNVs genome-wide at unprecedented resolution. In this study, we developed an algorithm to detect CNVs from whole-genome sequencing data and applied it to a newly sequenced glioblastoma genome with a matched control. This read-depth algorithm, called BIC-seq, can accurately and efficiently identify CNVs by minimizing the Bayesian information criterion. Using BIC-seq, we identified hundreds of CNVs as small as 40 bp in the cancer genome sequenced at 10× coverage, whereas only large CNVs (> 15 kb) could be detected in the array comparative genomic hybridization profiles of the same genome. Fourteen of the 16 small variants tested (110 bp to 14 kb), or 88%, were experimentally validated by quantitative PCR, demonstrating the algorithm's high sensitivity and true positive rate. We also extended the algorithm to detect recurrent CNVs across multiple samples and to derive error bars for breakpoints using a Gibbs sampling approach. We propose this statistical approach as a principled yet practical and efficient method for estimating CNVs in whole-genome sequencing data.
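The core idea of choosing a read-depth segmentation that minimizes the BIC can be illustrated with a small sketch. This is not the published BIC-seq implementation; it rests on several assumptions: reads are already counted in bins for tumor and matched normal, the tumor count in each bin is treated as binomial given the bin's total with one shared tumor fraction per segment, each segment costs a penalty of `lam * log(N)` with N taken as the total read count, and adjacent segments are merged greedily while any merge lowers the BIC. All names (`seg_loglik`, `merge_delta`, `bic_segment`, `lam`) are hypothetical.

```python
import math
from typing import List, Tuple

# A segment is (first_bin, last_bin, tumor_reads, normal_reads).
Segment = Tuple[int, int, int, int]


def seg_loglik(t: int, n: int) -> float:
    """Binomial log-likelihood (up to per-bin constants) of a segment whose bins
    share one tumor fraction p, evaluated at the MLE p = t / (t + n)."""
    total = t + n
    ll = 0.0
    if t > 0:
        ll += t * math.log(t / total)
    if n > 0:
        ll += n * math.log(n / total)
    return ll


def merge_delta(a: Segment, b: Segment, penalty: float) -> float:
    """Change in BIC if adjacent segments a and b are merged (negative = better)."""
    merged = seg_loglik(a[2] + b[2], a[3] + b[3])
    separate = seg_loglik(a[2], a[3]) + seg_loglik(b[2], b[3])
    # Merging can only lose likelihood, but it removes one parameter (one penalty term).
    return -2.0 * (merged - separate) - penalty


def bic_segment(tumor: List[int], normal: List[int], lam: float = 1.0) -> List[Segment]:
    """Greedily merge the adjacent pair with the largest BIC drop until none remains."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must have the same number of bins")
    penalty = lam * math.log(sum(tumor) + sum(normal))  # assumes at least one read
    segs: List[Segment] = [(i, i, t, n) for i, (t, n) in enumerate(zip(tumor, normal))]
    while len(segs) > 1:
        deltas = [merge_delta(segs[i], segs[i + 1], penalty) for i in range(len(segs) - 1)]
        best = min(range(len(deltas)), key=deltas.__getitem__)
        if deltas[best] >= 0:
            break  # no remaining merge reduces the BIC
        a, b = segs[best], segs[best + 1]
        segs[best:best + 2] = [(a[0], b[1], a[2] + b[2], a[3] + b[3])]
    return segs
```

With tumor counts `[10, 10, 10, 10, 30, 30, 30, 30]` against a flat normal of 10 reads per bin, this sketch returns two segments, bins 0-3 and bins 4-7, marking a candidate copy-number gain in the second half. Raising `lam` to 10 makes the penalty outweigh the likelihood loss and collapses everything into one segment, which shows how the penalty weight trades sensitivity against spurious breakpoints.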