Publications

2026

Wang, Yiyu, Selen Bozkurt, Nathan Le, Aishwarya Alagappan, Cho-Yi Huang, Swati Rajwal, Ashley Lewis, Jiyeong Kim, and Titilola Falasinnu. 2026. “Extracting Patient Reported Cannabis Use and Reasons for Use from Electronic Health Records: A Benchmarking Study of Large Language Models.” MedRxiv: The Preprint Server for Health Sciences. https://doi.org/10.64898/2026.03.06.26347824.

KEY MESSAGES: What is already known on this topic: Cannabis use is increasingly documented in EHR narrative text, but structured fields do not capture use status or symptom-related motivations, limiting research on pain self-management strategies in autoimmune rheumatic diseases. What this study adds: We developed a natural language processing pipeline using large language models to identify cannabis use status (4-class) and reasons for use (6-class), with best performance from fine-tuned GatorTron for status and an LLM for reasons. How this study might affect research, practice or policy: This scalable approach can support real-world evidence studies on symptom management, medication use, and outcomes among ARD populations, and it provides a methodological template for extracting under-documented patient behaviors from narrative notes.

OBJECTIVE: The primary objective is to develop and evaluate a scalable and reproducible natural language processing (NLP) approach using large language models (LLMs) to identify cannabis use status and reasons for cannabis use among patients with autoimmune rheumatic diseases (ARDs) from unstructured electronic health record (EHR) clinical notes.

METHODS AND ANALYSIS: We conducted a retrospective study using EHR clinical notes from patients with ARDs (2015-2024). Notes were screened for cannabis-related mentions using fuzzy string matching against a curated keyword lexicon with a similarity threshold of 90, extracting 50-word context windows (±25 words). Two domain experts annotated 886 randomly sampled snippets across four classes: (1) not a true cannabis mention/uncertain, (2) denial of use, (3) positive past use, and (4) positive current use. Using these annotations, we compared multiple LLM prompting strategies (zero-shot to few-shot; temperature tuning) and a fine-tuned clinical model (GatorTron 345M). For "reason for use," 1,027 snippets were annotated into six categories: pain, nausea, sleep, anxiety/stress/mood, appetite, and not mentioned/unknown. Models were evaluated on a held-out validation set using accuracy, F1, recall, and precision. We then aggregated snippet-level predictions to patient level to describe temporal trends and subgroup differences.
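The screening step described above (fuzzy keyword matching at a similarity threshold of 90, then a context window of 25 words on each side) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the matching library or list the lexicon, so the sketch uses Python's standard-library `difflib`, a hypothetical four-term `LEXICON`, and a hypothetical helper `find_snippets`. The authors' actual scoring function may differ.

```python
from difflib import SequenceMatcher

# Hypothetical lexicon: the study's curated keyword list is not given in the abstract.
LEXICON = ["cannabis", "marijuana", "thc", "cbd"]
THRESHOLD = 90    # similarity threshold on a 0-100 scale, as stated in the abstract
HALF_WINDOW = 25  # words kept on each side of the matched token


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (difflib ratio, an assumed stand-in metric)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def find_snippets(note: str) -> list[str]:
    """Return one context window per token that fuzzily matches a lexicon term."""
    words = note.split()
    snippets = []
    for i, word in enumerate(words):
        # Normalize the token: lowercase and drop surrounding punctuation.
        token = word.lower().strip(".,;:!?()\"'")
        if any(similarity(token, kw) >= THRESHOLD for kw in LEXICON):
            start = max(0, i - HALF_WINDOW)
            end = min(len(words), i + HALF_WINDOW + 1)
            # The window holds up to 25 words before and after, plus the match itself.
            snippets.append(" ".join(words[start:end]))
    return snippets


if __name__ == "__main__":
    note = "Patient reports daily marijuanna use for joint pain."
    for snippet in find_snippets(note):
        print(snippet)
```

Matching token by token keeps the sketch short; it would miss multi-word lexicon entries, which a production screen would need to handle separately.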

RESULTS: For cannabis use status classification, the fine-tuned GatorTron model achieved the highest performance (accuracy 0.90; F1 0.91; recall 0.90; precision 0.90). For reason-for-use classification, GPT-OSS-20B achieved the highest performance (accuracy 0.90; F1 0.90; recall 0.90; precision 0.92). Patient-level analyses characterized trends in documented cannabis use from 2015-2024 and compared clinical characteristics between current users and patients denying use.

CONCLUSION: High-precision extraction of cannabis use status and reasons for use from EHR notes is feasible using a combination of fine-tuned clinical language models and LLM-based classifiers. This approach enables scalable measurement of patient-reported symptom self-management strategies in ARDs, supporting observational research and potential clinical decision support.

Rajwal, Swati, Avinash Kumar Pandey, Ziyuan Zhang, Yankai Chen, Michael X Liu, Sudeshna Das, Hannah Rogers, Abeed Sarker, and Yunyu Xiao. 2026. “Applications of Natural Language Processing and Large Language Models for Social Determinants of Health: Systematic Review.” Journal of Medical Internet Research 28: e83793. https://doi.org/10.2196/83793.

BACKGROUND: Social determinants of health (SDOH) are the social, economic, and environmental conditions that influence health outcomes. SDOH information is often embedded in unstructured text, such as notes in electronic health records and social media posts. Advances in natural language processing (NLP), including emergent large language models (LLMs), offer opportunities to extract, analyze, and interpret SDOH expressions from free text for inclusion in downstream analyses. Existing literature on NLP applications for SDOH is dispersed across disciplines and characterized by methodological heterogeneity and variability in study quality and scope, complicating synthesis and cross-study comparison.

OBJECTIVE: This study aimed to examine the use of NLP, including LLMs, in SDOH research, and highlight gaps and future research directions.

METHODS: We conducted a systematic review following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, searching 7 major databases for publications between 2014 and November 2025. We included journal and conference proceedings papers that applied NLP methods to identify, classify, extract, or predict SDOH from text. Three reviewers independently screened studies and extracted data; conflicts were resolved by two senior reviewers. We abstracted study metadata, dataset characteristics, NLP approaches, SDOH domains addressed, and NLP performance metrics. We also conducted risk-of-bias analyses and identified influential studies based on relative citation counts.

RESULTS: A total of 142 studies met the inclusion criteria. Nearly two-thirds (89/142, 62.7%) were published between 2023 and 2025, reflecting rapid recent growth. Most studies relied on electronic health records (93/142, 65.5%) and private datasets (81/142, 57.0%), while only 20.4% (29/142) used publicly available data. Commonly studied SDOH domains were housing instability (72/142, 50.7%), employment (65/142, 45.8%), and financial conditions (63/142, 44.4%); structural factors, such as immigration status (5/142, 3.5%), were rarely examined. Of studies that reported evaluation metrics, most focused on classification (26/83, 31.3%) or extraction (38/83, 45.8%), and used cross-sectional designs. Reported model performances were typically strong, with median F1-scores ranging roughly from 0.75 to 0.85 across model categories. Only 49 studies shared code, and fewer than half clearly described model interpretability or reproducibility practices. LLMs (including encoder-decoder models) appeared in 19.7% (28/142) of studies, highlighting emerging interest but also raising new concerns around transparency and governance.

CONCLUSIONS: This review provides a timely synthesis of NLP and LLM applications across the SDOH research spectrum, addressing an important gap in a topic receiving increasing research attention. By comparing task formulations, data sources, and performance patterns, the review clarifies the research readiness of current approaches and reveals critical gaps. Our findings advance the field by highlighting the absence of a unified SDOH framework, uneven availability of public benchmarks, and limited evaluation of real-world deployment. Addressing these gaps through transparent, inclusive dataset development and implementation-focused evaluation is essential for translating NLP advances into equitable, real-world health impact.

2025

Rajwal, Swati, and Avinash Kumar Pandey. 2025. “Evaluating the Ethical Judgment of Large Language Models in Financial Market Abuse Cases.”

Klein, AZ, T Dasgupta, Flores Amaro, S Jana, S Khademi, G Lopez-Garcia, T Onishi, et al. 2025. “Overview of the 10th Social Media Mining for Health (#SMM4H) and Health Real-World Data (HeaRD) Shared Tasks at ICWSM 2025”. 19th International AAAI Conference on Web and Social Media.

Walker, Andrew, Jerik Leung, Aishwarya Alagappan, Swati Rajwal, Sahithi Lakamana, Tricia Park, Nathan Le, et al. 2025. “Centering Patient Voices in Lupus Pain: A Biopsychosocial Analysis of Reddit Narratives Using Large Language Models.” Arthritis Care & Research. https://doi.org/10.1002/acr.25687.

OBJECTIVE: Patients with chronic illness share their experiences in online communities, generating rich data on pain management. This study applied natural language processing methods, including large language models, to Reddit discussions from lupus communities to characterize multidimensional pain experiences framed in the biopsychosocial model.

METHODS: We extracted Reddit posts from the r/Lupus and r/LupusSupport subreddits posted from June 9, 2010 through December 31, 2023. Pain-related posts were identified using a clinically informed pain lexicon. Topic modeling was used to identify thematic patterns, which were then compared to structured summaries generated by an LLM instruction fine-tuned using the biopsychosocial model of pain. Two reviewers conducted content analysis of the LLM-generated summaries, evaluating thematic accuracy and coverage.

RESULTS: Data from Reddit included 31,785 posts, from 10,857 authors. We identified common pain complaints, management strategies, and sociocultural, affective, and nociplastic dimensions of pain. Instruction fine-tuned LLMs produced structured summaries with an average thematic accuracy score of 3.1 out of 4 (kappa = .09) and content coverage score of 2.9 out of 4 (kappa = .38). Sociocultural features presented in 123 posts (33.8%), including peer support and validation (n=106) and provider interactions or access issues (n=35). Nociplastic pain presented in 205 posts (56.3%).
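The kappa values reported above summarize how far two reviewers agreed beyond chance. As a rough illustration, the sketch below computes unweighted Cohen's kappa for two raters scoring the same items on a 1-4 scale. The abstract does not say which kappa variant was used (unweighted, weighted, or another statistic), so the choice here is an assumption, and the function name `cohen_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up ratings for illustration only, not data from the study.
    reviewer_1 = [4, 3, 4, 2]
    reviewer_2 = [4, 3, 3, 2]
    print(round(cohen_kappa(reviewer_1, reviewer_2), 3))
```

Because kappa corrects for chance agreement, a high mean score can coexist with a low kappa when both reviewers cluster their ratings in a narrow range.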

CONCLUSION: NLP methods can be used to extract rich, multidimensional insights about pain experiences from online communities focused on lupus. These approaches highlight the psychological, social, and cultural facets of pain that may be underrepresented in clinical settings, supporting more patient-centered approaches to care in rheumatology.

Walker, Drew, Swati Rajwal, Sudeshna Das, Snigdha Peddireddy, and Abeed Sarker. 2025. “Identifying Social Isolation Themes in NVDRS Text Narratives Using Topic Modeling and Text-Classification Methods”. ArXiv Preprint ArXiv:2506.15030.

Rajwal, Swati, and Avinash Kumar Pandey. 2025. “Connecting With Your Future Professor: A Practical Guide”. XRDS: Crossroads, The ACM Magazine for Students 31 (3): 10-11.

Salazar, Israfel, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, et al. 2025. “Kaleidoscope: In-Language Exams for Massively Multilingual Vision Evaluation”. ArXiv Preprint ArXiv:2504.07072.

Ge, Yao, Yuting Guo, Sudeshna Das, Swati Rajwal, Selen Bozkurt, and Abeed Sarker. 2025. “HILGEN: Hierarchically-Informed Data Generation for Biomedical NER Using Knowledgebases and Large Language Models”. ArXiv Preprint ArXiv:2503.04930.

Rajwal, Swati, Ziyuan Zhang, Yankai Chen, Hannah Rogers, Abeed Sarker, and Yunyu Xiao. 2025. “Applications of Natural Language Processing and Large Language Models for Social Determinants of Health: Protocol for a Systematic Review”. JMIR Research Protocols 14 (1): e66094.