Publications

E

Reis B, Kohane I, Mandl K. An epidemiological network model for disease outbreak detection. PLoS Med. 2007;4(6):e210. doi:10.1371/journal.pmed.0040210
BACKGROUND: Advanced disease-surveillance systems have been deployed worldwide to provide early detection of infectious disease outbreaks and bioterrorist attacks. New methods that improve the overall detection capabilities of these systems can have a broad practical impact. Furthermore, most current generation surveillance systems are vulnerable to dramatic and unpredictable shifts in the health-care data that they monitor. These shifts can occur during major public events, such as the Olympics, as a result of population surges and public closures. Shifts can also occur during epidemics and pandemics as a result of quarantines, the worried-well flooding emergency departments or, conversely, the public staying away from hospitals for fear of nosocomial infection. Most surveillance systems are not robust to such shifts in health-care utilization, either because they do not adjust baselines and alert-thresholds to new utilization levels, or because the utilization shifts themselves may trigger an alarm. As a result, public-health crises and major public events threaten to undermine health-surveillance systems at the very times they are needed most. METHODS AND FINDINGS: To address this challenge, we introduce a class of epidemiological network models that monitor the relationships among different health-care data streams instead of monitoring the data streams themselves. By extracting the extra information present in the relationships between the data streams, these models have the potential to improve the detection capabilities of a system. Furthermore, the models' relational nature has the potential to increase a system's robustness to unpredictable baseline shifts. 
We implemented these models and evaluated their effectiveness using historical emergency department data from five hospitals in a single metropolitan area, recorded over a period of 4.5 y by the Automated Epidemiological Geotemporal Integrated Surveillance real-time public health-surveillance system, developed by the Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology on behalf of the Massachusetts Department of Public Health. We performed experiments with semi-synthetic outbreaks of different magnitudes and simulated baseline shifts of different types and magnitudes. The results show that the network models provide better detection of localized outbreaks, and greater robustness to unpredictable shifts than a reference time-series modeling approach. CONCLUSIONS: The integrated network models of epidemiological data streams and their interrelationships have the potential to improve current surveillance efforts, providing better localized outbreak detection under normal circumstances, as well as more robust performance in the face of shifts in health-care utilization during epidemics and major public events.

C

Cami A, Reis B. Concordance and predictive value of two adverse drug event data sets. BMC Med Inform Decis Mak. 2014;14:74. doi:10.1186/1472-6947-14-74
BACKGROUND: Accurate prediction of adverse drug events (ADEs) is an important means of controlling and reducing drug-related morbidity and mortality. Since no single "gold standard" ADE data set exists, a range of different drug safety data sets are currently used for developing ADE prediction models. There is a critical need to assess the degree of concordance between these various ADE data sets and to validate ADE prediction models against multiple reference standards. METHODS: We systematically evaluated the concordance of two widely used ADE data sets - Lexi-comp from 2010 and SIDER from 2012. The strength of the association between ADE (drug) counts in Lexi-comp and SIDER was assessed using Spearman rank correlation, while the differences between the two data sets were characterized in terms of drug categories, ADE categories and ADE frequencies. We also performed a comparative validation of the Predictive Pharmacosafety Networks (PPN) model using both ADE data sets. The predictive power of PPN using each of the two validation sets was assessed using the area under Receiver Operating Characteristic curve (AUROC). RESULTS: The correlations between the counts of ADEs and drugs in the two data sets were 0.84 (95% CI: 0.82-0.86) and 0.92 (95% CI: 0.91-0.93), respectively. Relative to an earlier snapshot of Lexi-comp from 2005, Lexi-comp 2010 and SIDER 2012 introduced a mean of 1,973 and 4,810 new drug-ADE associations per year, respectively. The difference between these two data sets was most pronounced for Nervous System and Anti-infective drugs, Gastrointestinal and Nervous System ADEs, and postmarketing ADEs. A minor difference of 1.1% was found in the AUROC of PPN when SIDER 2012 was used for validation instead of Lexi-comp 2010. CONCLUSIONS: In conclusion, the ADE and drug counts in Lexi-comp and SIDER data sets were highly correlated and the choice of validation set did not greatly affect the overall prediction performance of PPN. 
Our results also suggest that it is important to be aware of the differences that exist among ADE data sets, especially in modeling applications focused on specific drug and ADE categories.
Butte, Bao, Reis, Watkins, Kohane. Comparing the similarity of time-series gene expression using signal processing metrics. J Biomed Inform. 2001;34(6):396–405. doi:10.1006/jbin.2002.1037
Many algorithms have been used to cluster genes measured by microarray across a time series. Instead of clustering, our goal was to compare all pairs of genes to determine whether there was evidence of a phase shift between them. We describe a technique where gene expression is treated as a discrete time-invariant signal, allowing the use of digital signal-processing tools, including power spectral density, coherence, and transfer gain and phase shift. We used these on a public RNA expression set of 2467 genes measured every 7 min for 119 min and found 18 putative associations. Two of these were known in the biomedical literature and may have been missed using correlation coefficients. Digital signal processing tools can be embedded and enhance existing clustering algorithms.

A

According to a popular hypothesis, short-term memories are stored as persistent neural activity maintained by synaptic feedback loops. This hypothesis has been formulated mathematically in a number of recurrent network models. Here we study an abstraction of these models, a single neuron with a synapse onto itself, or autapse. This abstraction cannot simulate the way in which persistent activity patterns are distributed over neural populations in the brain. However, with proper tuning of parameters, it does reproduce the continuously graded, or analog, nature of many examples of persistent activity. The conditions for tuning are derived for the dynamics of a conductance-based model neuron with a slow excitatory autapse. The derivation uses the method of averaging to approximate the spiking model with a nonspiking, reduced model. Short-term analog memory storage is possible if the reduced model is approximately linear and if its feedforward bias and autapse strength are precisely tuned.
Wang J-F, Reis B, Hu M-G, Christakos G, Yang W-Z, Sun Q, Li Z-J, Li X-Z, Lai S-J, Chen H-Y, et al. Area disease estimation based on sentinel hospital records. PLoS One. 2011;6(8):e23428. doi:10.1371/journal.pone.0023428
BACKGROUND: Population health attributes (such as disease incidence and prevalence) are often estimated using sentinel hospital records, which are subject to multiple sources of uncertainty. When applied to these health attributes, commonly used biased estimation techniques can lead to false conclusions and ineffective disease intervention and control. Although some estimators can account for measurement error (in the form of white noise, usually after de-trending), most mainstream health statistics techniques cannot generate unbiased and minimum error variance estimates when the available data are biased. METHODS AND FINDINGS: A new technique, called the Biased Sample Hospital-based Area Disease Estimation (B-SHADE), is introduced that generates space-time population disease estimates using biased hospital records. The effectiveness of the technique is empirically evaluated in terms of hospital records of disease incidence (for hand-foot-mouth disease and fever syndrome cases) in Shanghai (China) during a two-year period. The B-SHADE technique uses a weighted summation of sentinel hospital records to derive unbiased and minimum error variance estimates of area incidence. The calculation of these weights is the outcome of a process that combines: the available space-time information; a rigorous assessment of both, the horizontal relationships between hospital records and the vertical links between each hospital's records and the overall disease situation in the region. In this way, the representativeness of the sentinel hospital records was improved, the possible biases of these records were corrected, and the generated area incidence estimates were best linear unbiased estimates (BLUE). Using the same hospital records, the performance of the B-SHADE technique was compared against two mainstream estimators. 
CONCLUSIONS: The B-SHADE technique involves a hospital network-based model that blends the optimal estimation features of the Block Kriging method and the sample bias correction efficiency of the ratio estimator method. In this way, B-SHADE can overcome the limitations of both methods: Block Kriging's inadequacy concerning the correction of sample bias and spatial clustering; and the ratio estimator's limitation as regards error minimization. The generality of the B-SHADE technique is further demonstrated by the fact that it reduces to Block Kriging in the case of unbiased samples; to ratio estimator if there is no correlation between hospitals; and to simple statistic if the hospital records are neither biased nor space-time correlated. In addition to the theoretical advantages of the B-SHADE technique over the two other methods above, two real world case studies (hand-foot-mouth disease and fever syndrome cases) demonstrated its empirical superiority, as well.
Reis B, Kirby C, Hadden L, Olson K, McMurry A, Daniel J, Mandl K. AEGIS: a robust and scalable real-time public health surveillance system. J Am Med Inform Assoc. 2007;14(5):581–8. doi:10.1197/jamia.M2342
In this report, we describe the Automated Epidemiological Geotemporal Integrated Surveillance system (AEGIS), developed for real-time population health monitoring in the state of Massachusetts. AEGIS provides public health personnel with automated near-real-time situational awareness of utilization patterns at participating healthcare institutions, supporting surveillance of bioterrorism and naturally occurring outbreaks. As real-time public health surveillance systems become integrated into regional and national surveillance initiatives, the challenges of scalability, robustness, and data security become increasingly prominent. A modular and fault tolerant design helps AEGIS achieve scalability and robustness, while a distributed storage model with local autonomy helps to minimize risk of unauthorized disclosure. The report includes a description of the evolution of the design over time in response to the challenges of a regional and national integration environment.