Brief Bio

Dr. Guergana Savova is the Patricia F. Brennan Professor at Harvard Medical School and the Computational Health Informatics Program (CHIP; chip.org) at Boston Children’s Hospital. Her research interests are in natural language processing (NLP) and information extraction, especially as applied to the text generated by physicians (the clinical narrative). Dr. Savova has been creating gold standard annotated resources based on computable definitions and developing methods for computable solutions. The focus of Dr. Savova's research is higher-level semantic and discourse processing of the clinical narrative, which includes tasks such as named entity recognition, event recognition, and relation detection and classification, including coreference and temporal relations (thyme.healthnlp.org; share.healthnlp.org; cancer.healthnlp.org). The methods are mostly machine learning, spanning supervised, lightly supervised, and completely unsupervised approaches.

Dr. Savova's research with her collaborators has led to the creation of the clinical Text Analysis and Knowledge Extraction System (cTAKES; ctakes.apache.org). cTAKES is an information extraction system comprising a number of NLP components. As would be expected of any biomedical NLP tool, cTAKES can supply commonly extracted biomedical concepts such as symptoms, procedures, diagnoses, medications, and anatomy, with attributes and standard codes. However, unlike other available biomedical NLP systems, which focus on a specific NLP task and domain and are difficult to extend, cTAKES has been engineered in a modular fashion and employs the latest probabilistic machine learning methods. These leading-edge methods from research investigations have been implemented directly as components in cTAKES. These components can, for instance, identify complex relations between entities (e.g. the location of a tumor). cTAKES can also perform the extremely important task of identifying temporal events, dates, and times, which allows events to be placed in absolute and relative terms on a patient timeline. It is the only open source biomedical NLP system using components with rule-based and supervised methods trained on gold standards from the general as well as the biomedical domain, thus affording usability across different types of clinical narrative (e.g. pathology, radiology, clinical notes, etc.) from different institutions, as well as other health-related narrative (e.g. Twitter feeds).

cTAKES has been applied to mine the data within the clinical narrative in a number of biomedical initiatives, such as i2b2, SHARPn, PGRN, eMERGE, and PCORI. Within Informatics for Integrating Biology and the Bedside (i2b2), cTAKES has been used to extract patient characteristics for determining patients' status related to a specific phenotype (Multiple Sclerosis, Inflammatory Bowel Disease, Type 2 Diabetes). Within the Pharmacogenomics Research Network (PGRN), cTAKES has been applied to automatically determine patients' disease activity and detect responders versus non-responders to a specific treatment. Within the Electronic Medical Records and Genomics (eMERGE) network, cTAKES has been applied to automatically discover patients with Peripheral Arterial Disease, Autism Spectrum Disorder, Appendicitis, and Early Childhood Obesity. Within the Patient-Powered Research Network, cTAKES has been applied to create a comprehensive phenotype picture for patients with one very rare disease, Phelan-McDermid Syndrome. cTAKES-extracted data can be embedded in the i2b2 platform as well as in PheWAS/GWAS platforms such as tranSMART, thus combining it with genotypic data for even larger-scale data analysis.

Since 2014, Dr. Savova and her team have been developing DeepPhe, an NLP system for extracting cancer phenotypes from clinical records (https://deepphe.github.io/). DeepPhe tools combine advanced NLP methods, summarization, data models, and visual analytics tools to help researchers easily understand complex cancer cases and cohorts. An extension of DeepPhe, DeepPhe*CR (DeepPhe for Cancer Registries), addresses the need of cancer registries to abstract cancer cases from supporting documentation. DeepPhe and DeepPhe*CR use core modules from cTAKES.

Dr. Guergana Savova has been on the editorial board of the Journal of the American Medical Informatics Association (JAMIA) and a reviewer for several journals, including the Journal of Biomedical Informatics (JBI) and the Journal of Language Resources and Evaluation (LREC), as well as for many conferences and workshops. She has been a member of the National Library of Medicine's Biomedical Library and Informatics Review Committee and of many other panels at the NIH.

Dr. Guergana Savova holds a PhD in Linguistics with a minor in Cognitive Science and a Master of Science in Computer Science from the University of Minnesota. Before joining Boston Children’s Hospital and Harvard Medical School in 2010, Dr. Savova was on the faculty of the Biomedical Statistics and Informatics Department at Mayo Clinic (2002-2010).

Full publication list: https://scholar.google.com/citations?user=9538Cr4AAAAJ&hl=en&oi=ao