Publications

R

Lober, Karras, Wagner, Overhage, Davidson, Fraser, Trigg, Mandl, Espino, Tsui. Roundtable on bioterrorism detection: information system-based surveillance. J Am Med Inform Assoc. 2002;9:105–15.
During the 2001 AMIA Annual Symposium, the Anesthesia, Critical Care, and Emergency Medicine Working Group hosted the Roundtable on Bioterrorism Detection. Sixty-four people attended the roundtable discussion, during which several researchers discussed public health surveillance systems designed to enhance early detection of bioterrorism events. These systems make secondary use of existing clinical, laboratory, paramedical, and pharmacy data or facilitate electronic case reporting by clinicians. This paper combines case reports of six existing systems with discussion of some common techniques and approaches. The purpose of the roundtable discussion was to foster communication among researchers and promote progress by 1) sharing information about systems, including origins, current capabilities, stages of deployment, and architectures; 2) sharing lessons learned during the development and implementation of systems; and 3) exploring cooperation projects, including the sharing of software and data. A mailing list server for these ongoing efforts may be found at http://bt.cirg.washington.edu.
Brownstein, Cassa, Kohane, Mandl. Reverse geocoding: concerns about patient confidentiality in the display of geospatial health data. AMIA Annu Symp Proc. 2005:905.
Widespread availability of geographic information systems (GIS) software has facilitated the use of health mapping in both academia and government. Maps that display patients as points are often exchanged in public forums (journals, meetings, web). However, even these low resolution maps may reveal confidential patient location information. In this report, we describe a method to test whether privacy is being breached. We reverse geocode from maps with cases and describe the accuracy with which patient addresses can be extracted.
Wieland, Cassa, Mandl, Berger. Revealing the spatial distribution of a disease while preserving privacy. Proc Natl Acad Sci U S A. 2008;105:17608–13.
Datasets describing the health status of individuals are important for medical research but must be used cautiously to protect patient privacy. For patient data containing geographical identifiers, the conventional solution is to aggregate the data by large areas. This method often preserves privacy but suffers from substantial information loss, which degrades the quality of subsequent disease mapping or cluster detection studies. Other heuristic methods for de-identifying spatial patient information do not quantify the risk to individual privacy. We develop an optimal method based on linear programming to add noise to individual locations that preserves the distribution of a disease. The method ensures a small, quantitative risk of individual re-identification. Because the amount of noise added is minimal for the desired degree of privacy protection, the de-identified set is ideal for spatial epidemiological studies. We apply the method to patients in New York County, New York, showing that privacy is guaranteed while moving patients 25-150 times less than aggregation by zip code.
Bourgeois, Valim, McAdam, Mandl. Relative impact of influenza and respiratory syncytial virus in young children. Pediatrics. 2009;124:e1072–80.
OBJECTIVE: We measured the relative impact of influenza and respiratory syncytial virus (RSV) infections in young children in terms of emergency department (ED) visits, clinical care requirements, and overall resource use. METHODS: Patients who were aged
In the wake of fears over pandemic influenza, triggered by concern about avian influenza, a top national priority is to adapt surveillance systems, such as BioSense, for influenza monitoring. While real-time surveillance system architects have been largely focused on the problem of discrete outbreak detection, the data in these systems have been shown to have unique advantages for the timely detection of influenza. In this study, we evaluate the utility of influenza detection by real-time surveillance as an adjunct to the traditional CDC surveillance systems.
Olson, Bonetti, Pagano, Mandl. Real time spatial cluster detection using interpoint distances among precise patient locations. BMC Med Inform Decis Mak. 2005;5:19.
BACKGROUND: Public health departments in the United States are beginning to gain timely access to health data, often as soon as one day after a visit to a health care facility. Consequently, new approaches to outbreak surveillance are being developed. When cases cluster geographically, an analysis of their spatial distribution can facilitate outbreak detection. Our method focuses on detecting perturbations in the distribution of pair-wise distances among all patients in a geographical region. Barring outbreaks, this distribution can be quite stable over time. We sought to exemplify the method by measuring its cluster detection performance, and to determine factors affecting sensitivity to spatial clustering among patients presenting to hospital emergency departments with respiratory syndromes. METHODS: The approach was to (1) define a baseline spatial distribution of home addresses for a population of patients visiting an emergency department with respiratory syndromes using historical data; (2) develop a controlled feature set simulation by inserting simulated outbreak data with varied parameters into authentic background noise, thereby creating semisynthetic data; (3) compare the observed with the expected spatial distribution; (4) establish the relative value of different alarm strategies so as to maximize sensitivity for the detection of clustering; and (5) measure factors which have an impact on sensitivity. RESULTS: Overall sensitivity to detect spatial clustering was 62%. This contrasts with an overall alarm rate of less than 5% for the same number of extra visits when the extra visits were not characterized by geographic clustering. Clusters that produced the least number of alarms were those that were small in size (10 extra visits in a week, where visits per week ranged from 120 to 472), diffusely distributed over an area with a 3 km radius, and located close to the hospital (5 km) in a region most densely populated with patients to this hospital. Near perfect alarm rates were found for clusters that varied on the opposite extremes of these parameters (40 extra visits, within a 250 meter radius, 50 km from the hospital). CONCLUSION: Measuring perturbations in the interpoint distance distribution is a sensitive method for detecting spatial clustering. When cases are clustered geographically, there is clearly power to detect clustering when the spatial distribution is represented by the M statistic, even when clusters are small in size. By varying independent parameters of simulated outbreaks, we have demonstrated empirically the limits of detection of different types of outbreaks.
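The idea of comparing an observed interpoint distance distribution with a baseline can be illustrated with a minimal sketch. It is not the M statistic from the paper: the function names, the equal-frequency binning, and the sum of squared differences used as a score are all simplifying assumptions chosen for illustration, and coordinates are treated as planar positions in kilometres.

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distance between every unordered pair of (x, y) points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def bin_edges_from_baseline(distances, n_bins):
    """Interior bin edges chosen so baseline distances fall about evenly into n_bins."""
    ordered = sorted(distances)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def bin_proportions(distances, edges):
    """Fraction of distances falling in each bin defined by the sorted edges."""
    counts = [0] * (len(edges) + 1)
    for d in distances:
        # The bin index is the number of edges at or below this distance.
        counts[sum(d >= e for e in edges)] += 1
    return [c / len(distances) for c in counts]


def distribution_shift(baseline_points, observed_points, n_bins=10):
    """Illustrative score (an assumption, not the paper's M statistic):
    sum of squared differences between observed and baseline bin proportions."""
    baseline = pairwise_distances(baseline_points)
    edges = bin_edges_from_baseline(baseline, n_bins)
    expected = bin_proportions(baseline, edges)
    observed = bin_proportions(pairwise_distances(observed_points), edges)
    return sum((o - e) ** 2 for o, e in zip(observed, expected))
```

In this sketch, a tight cluster of extra cases adds many short pairwise distances, which inflates the lowest bin relative to the baseline and raises the score. An alarm threshold would have to be calibrated separately, for example from historical weeks without known outbreaks.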
BACKGROUND: Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach which enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location. RESULTS: We produce multiple anonymized data sets using a single set of addresses and then progressively average the anonymized results related to each address, characterizing the steep decline in distance from the re-identified point to the original location (and the reduction in privacy). With ten anonymized copies of an original data set, we find a substantial decrease in average distance from 0.7 km to 0.2 km between the estimated, re-identified address and the original address. With fifty anonymized copies of an original data set, we find a decrease in average distance from 0.7 km to 0.1 km. CONCLUSION: We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem that include infrastructure to support the safe disclosure of anonymized medical data to prevent inference or re-identification of original address data, and the use of a Markov-process based algorithm to mitigate this risk.
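The averaging vulnerability described in this abstract can be reproduced in a small simulation. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, locations are planar coordinates in kilometres, and the blur is independent Gaussian noise on each axis with an arbitrary standard deviation `sigma_km`.

```python
import math
import random


def blur(point, sigma_km, rng):
    """Return one anonymized copy of a location: independent Gaussian noise per axis."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))


def averaged_estimate(point, copies, sigma_km, rng):
    """Adversary's estimate: the mean of several independently blurred copies."""
    blurred = [blur(point, sigma_km, rng) for _ in range(copies)]
    mean_x = sum(p[0] for p in blurred) / copies
    mean_y = sum(p[1] for p in blurred) / copies
    return (mean_x, mean_y)


def mean_error_km(copies, sigma_km=0.5, trials=2000, seed=0):
    """Average distance between the adversary's estimate and the true location.

    sigma_km is an assumed value for illustration, not a parameter from the paper.
    """
    rng = random.Random(seed)
    origin = (0.0, 0.0)
    total = 0.0
    for _ in range(trials):
        estimate = averaged_estimate(origin, copies, sigma_km, rng)
        total += math.dist(estimate, origin)
    return total / trials
```

Because the noise in each copy is independent, the error of the averaged estimate shrinks roughly in proportion to the square root of the number of copies. Comparing `mean_error_km(1)`, `mean_error_km(10)`, and `mean_error_km(50)` shows the same qualitative decline the abstract reports.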

P

Simons, Mandl, Kohane. The PING personally controlled electronic medical record system: technical architecture. J Am Med Inform Assoc. 2005;12:47–54.
Despite progress in creating standardized clinical data models and interapplication protocols, the goal of creating a lifelong health care record remains mired in the pragmatics of interinstitutional competition, concerns about privacy and unnecessary disclosure, and the lack of a nationwide system for authenticating and authorizing access to medical information. The authors describe the architecture of a personally controlled health care record system, PING, that is not institutionally bound, is free and open source, and meets the policy requirements that the authors have previously identified for health care delivery and population-wide research.
Riva, Mandl, Oh, Nigrin, Butte, Szolovits, Kohane. The personal internetworked notary and guardian. Int J Med Inform. 2001;62:27–40.
In this paper, we propose a secure, distributed and scalable infrastructure for a lifelong personal medical record system. We leverage existing and widely available technologies, like the Web and public-key cryptography, to define an architecture that allows patients to exercise full control over their medical data. This is done without compromising patients' privacy and the ability of other interested parties (e.g. physicians, health-care institutions, public-health researchers) to access the data when appropriately authorized. The system organizes the information as a tree of encrypted plain-text XML files, in order to ensure platform independence and durability, and uses a role-based authorization scheme to assign access privileges. In addition to the basic architecture, we describe tools to populate the patient's record with data from hospital databases and the first testbed applications we are deploying.