Abstract
Introduction: Timely thrombolytic therapy improves outcomes in acute ischemic stroke. Manual chart review to screen for thrombolysis contraindications may be time-consuming and prone to errors. We developed and tested a large language model (LLM)-based tool to identify thrombolysis contraindications from clinical notes using synthetic data in a proof-of-concept study. Methods: We generated 150 synthetic clinical notes containing randomly assigned thrombolysis contraindications using LLMs. We then used Llama 3.1 405B with a custom prompt to generate a list of thrombolysis contraindications from each note. Performance was evaluated using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. Results: A total of 150 synthetic notes were generated using five different models: ChatGPT-4o, Llama 3.1 405B, Llama 3.1 70B, ChatGPT-4o mini, and Gemini 1.5 Flash. On average, each note contained 241.6 words (SD 110.7; range 80–549) and included 1.5 contraindications (SD 1.1; range 0–5). Our tool achieved a sensitivity of 90.9% (95% CI: 86.3%–94.3%), specificity of 99.2% (95% CI: 98.8%–99.5%), PPV of 87.7% (95% CI: 82.7%–91.7%), NPV of 99.4% (95% CI: 99.1%–99.6%), accuracy of 98.7% (95% CI: 98.2%–99.0%), and an F1 score of 0.892. Among the false positives, 24 (86%) were due to the inclusion of irrelevant contraindications, and 4 (14%) resulted from repetitive information. No hallucinations were observed. Conclusion: Our LLM-based tool may identify stroke thrombolysis contraindications from synthetic clinical notes with high sensitivity and PPV. Future studies will validate its performance using real EMR data and integrate it into acute stroke workflows to facilitate faster and safer thrombolysis decision-making.
Introduction
Timely stroke thrombolysis is associated with better outcomes in eligible patients [1, 2]. A key component of eligibility determination is the manual review of patients’ previous electronic medical records (EMRs) to screen for thrombolysis contraindications. This task may be time-consuming and prone to errors, potentially leading to treatment delays, inappropriate thrombolysis due to missed contraindications, or withholding of thrombolysis due to wrongfully identified contraindications. Previous studies showed that clinicians spent 2–7 min per patient on EMR review and missed 20–43% of thrombolysis contraindications [3, 4]. Recent advancements in large language models (LLM) represent a significant breakthrough in artificial intelligence and machine learning [5]. Trained on extensive textual datasets, LLMs exhibit strong capabilities in natural language processing and general reasoning, with medical applications such as diagnostic reasoning, treatment recommendations, knowledge augmentation, and clinical data summarization [6‒11]. LLMs may offer a novel opportunity to address the gap in acute stroke care related to the timely and accurate identification of thrombolysis contraindications.
This study aimed to develop a proof-of-concept LLM-based tool for the identification of stroke thrombolysis contraindications within clinical notes and test its preliminary performances using synthetic clinical notes. This tool could be used for real-time processing of a patient’s previous EMR clinical notes and to quickly generate a list of thrombolysis contraindications to enhance the accuracy and timeliness of stroke thrombolysis decision-making.
Methods
Synthetic clinical notes were generated to evaluate the LLM-based tool’s performance in identifying stroke thrombolysis contraindications (online suppl. Fig. S1; for all online suppl. material, see https://doi.org/10.1159/000545317). As no real patient data were used, IRB approval was unnecessary. Data and prompts are available upon request. This study complies with the Standards for Reporting Diagnostic Accuracy Studies (STARD) guidelines [12].
Synthetic Clinical Notes
In order to mimic typical clinical notes within patients’ EMR prior to acute stroke evaluations (e.g., outpatient visit notes, operative reports, admission notes, progress notes, discharge summaries), we generated 150 synthetic clinical notes using five LLMs under default settings: ChatGPT-4o (OpenAI), Llama 3.1 70B and 405B (Meta), ChatGPT-4o mini (OpenAI), and Gemini 1.5 Flash (Google). The use of five different LLMs served to provide stylistic variety to the synthetic notes. These LLMs were stratified into three tiers based on parameter size: tier 1 (ChatGPT-4o, Llama 3.1 405B), tier 2 (Llama 3.1 70B), and tier 3 (ChatGPT-4o mini, Gemini 1.5 Flash). To emphasize higher quality notes from larger models, tier 1 generated 60 notes, tier 2 produced 50, and tier 3 contributed 40, with generation quotas predetermined prior to study initiation. A standardized prompt and a random number generator were employed to guide each LLM in generating random synthetic notes (online suppl. Table S1). The prompt specified predetermined variables, including patient age (51–100 years), sex, number of contraindications (0–3), and their nature, if applicable, to which the number generator assigned a random value. The encounter date was uniformly set one day before contraindication identification, ensuring consistency across generated notes. Each synthetic note was reviewed by a vascular neurologist to identify potential contraindications from a predefined list of 24 thrombolysis contraindications based on the American Heart Association stroke guidelines [13] (online suppl. Table S1). A contraindication was considered present if the documentation necessitated clarification or confirmation with the patient before thrombolysis administration. Identified contraindications were compared with the predefined set specified in the prompt, accounting for potential discrepancies. A final curated list of contraindications for each note was established as the ground truth for performance evaluation.
Identification of Contraindications
We developed an LLM-based tool, named NeuroGlimpse, for identifying stroke thrombolysis contraindications from clinical notes. This tool was powered by Llama 3.1 405B, an open-source LLM with performance comparable to ChatGPT-4o [14], in its default settings (temperature 0.2, top P 0.7, frequency and presence penalties at 0, token limit of 1,024). This LLM was accessed via an Nvidia cloud platform. A custom prompt, refined iteratively and exceeding 3,000 words, provided detailed instructions, output formats, and thrombolysis-specific clinical knowledge (online suppl. Table S2). The input consisted of this prompt and a synthetic note, and the output was a single list of thrombolysis contraindications, including the reason for contraindication, its classification (absolute or relative), pertinent clinical information, and the date and author of the source note (online suppl. Table S3). No modification to the original LLM or preprocessing of inputs or outputs was needed.
Performance Measures
We evaluated the performance of NeuroGlimpse using the previously established ground truth. Performance evaluation included (1) number of correctly identified contraindications (true positives); (2) number of missed contraindications (false negatives); and (3) number of false positives (FPs). FPs consisted of irrelevant contraindications (items not present in the ground truth), repetitive clinical information (multiple items describing the same contraindication), and hallucinations [5] (factually incorrect or fabricated clinical information). True negatives (TNs) were contraindications not present in a clinical note and not reported by NeuroGlimpse. Given that each clinical note could contain up to 24 contraindications, a note without any contraindications was counted as 24 TNs. The primary performance measures were sensitivity and positive predictive value (PPV). Secondary measures included specificity, negative predictive value (NPV), accuracy, and F1 score.
Statistical Analyses
Sensitivity, specificity, PPV, NPV, accuracy, and F1 score were calculated using standard definitions based on true positive, FP, false negative, and TN. Clopper-Pearson exact methods were employed to calculate 95% confidence intervals (CIs) for sensitivity, specificity, PPV, NPV, and accuracy. Two-sample tests for comparison of binomial proportions were utilized to compare primary performance measures across subgroups defined by sex and the number of contraindications per note. Chi-square tests were applied to evaluate differences in primary performance measures among the 24 contraindications, under the null hypothesis of equal sensitivity and PPV across all contraindications. All statistical tests were two-tailed, and a significance level of α = 0.05 was used throughout. Statistical analyses were performed using Microsoft Excel version 16.90.
Results
Baseline Characteristics
A total of 150 synthetic clinical notes were generated using various LLMs (Table 1). Specifically, 40 notes (26.7%) were produced by ChatGPT-4o, 20 (13.3%) by Llama 3.1 405B, 50 (33.3%) by Llama 3.1 70B, 20 (13.3%) by ChatGPT-4o mini, and 20 (13.3%) by Gemini 1.5 Flash. Each note averaged 241.6 words (SD 110.7; range 80–549) and included an average of 1.5 contraindications (SD 1.1; range 0–5). Among these notes, 89 (59.3%) contained 0–1 contraindications, while 61 (40.7%) included 2–5 contraindications. The synthetic patient cohort had a mean age of 73.3 years (SD 14.7; range 51–100), with 72 patients (48.0%) being female.
Baseline characteristics from 150 synthetic notes
Characteristics . | . |
---|---|
Age, mean (SD), range, years | 73.3 (14.7), 51–100 |
Female, n (%) | 72 (48.0) |
LLM used for note generation, n (%) | |
ChatGPT-4o | 40 (26.7) |
Llama 3.1 405B | 20 (13.3) |
Llama 3.1 70B | 50 (33.3) |
ChatGPT-4o mini | 20 (13.3) |
Gemini 1.5 Flash | 20 (13.3) |
Words per note, mean (SD), range, n | 241.6 (110.7), 80–549 |
Contraindications per note, mean (SD), range, n | 1.5 (1.1), 0–5 |
Notes with 0 contraindication, n (%) | 26 (17.3) |
Notes with 1 contraindication, n (%) | 63 (42.0) |
Notes with 2 contraindications, n (%) | 35 (23.3) |
Notes with 3 contraindications, n (%) | 20 (13.3) |
Notes with 4 contraindications, n (%) | 4 (2.7) |
Notes with 5 contraindications, n (%) | 2 (1.3) |
Characteristics . | . |
---|---|
Age, mean (SD), range, years | 73.3 (14.7), 51–100 |
Female, n (%) | 72 (48.0) |
LLM used for note generation, n (%) | |
ChatGPT-4o | 40 (26.7) |
Llama 3.1 405B | 20 (13.3) |
Llama 3.1 70B | 50 (33.3) |
ChatGPT-4o mini | 20 (13.3) |
Gemini 1.5 Flash | 20 (13.3) |
Words per note, mean (SD), range, n | 241.6 (110.7), 80–549 |
Contraindications per note, mean (SD), range, n | 1.5 (1.1), 0–5 |
Notes with 0 contraindication, n (%) | 26 (17.3) |
Notes with 1 contraindication, n (%) | 63 (42.0) |
Notes with 2 contraindications, n (%) | 35 (23.3) |
Notes with 3 contraindications, n (%) | 20 (13.3) |
Notes with 4 contraindications, n (%) | 4 (2.7) |
Notes with 5 contraindications, n (%) | 2 (1.3) |
Thrombolysis Contraindications
Contraindications were randomly assigned during the synthetic note generation process. However, the predetermined set of contraindications specified in the prompt may have differed from manually established ground truth, resulting in an uneven distribution across the notes (Table 2). The most frequently appearing contraindications were major surgery within the prior 14 days (n = 31), gastrointestinal bleeding within the prior 21 days (n = 19), and recent or current use of warfarin (n = 17). The least frequent contraindications included recent or ongoing internal bleeding (n = 2), serious trauma within the prior 14 days (n = 2), gastrointestinal malignancy (n = 3), and intra-axial intracranial neoplasm (n = 3). In total, 219 contraindications were incorporated within these notes.
Thrombolysis contraindications and performance measures of NeuroGlimpse
Contraindication ID . | Description . | n (%) . | TP . | FP . | FN . | Sensitivity (%)a . | PPV (%)a . |
---|---|---|---|---|---|---|---|
1 | Recent or ongoing internal bleeding | 2 (0.9) | 1 | 0 | 1 | 50.0 | 100.0 |
2 | Intracranial or intraspinal surgery within the prior 3 months | 12 (5.5) | 11 | 1 | 1 | 91.7 | 91.7 |
3 | Ischemic stroke within the prior 3 months | 9 (4.1) | 5 | 0 | 4 | 55.6 | 100.0 |
4 | Significant head trauma within the prior 3 months | 8 (3.7) | 8 | 0 | 0 | 100.0 | 100.0 |
5 | History of previous unprovoked intracranial hemorrhage | 8 (3.7) | 8 | 0 | 0 | 100.0 | 100.0 |
6 | History of thrombocytopenia | 8 (3.7) | 8 | 1 | 0 | 100.0 | 88.9 |
7 | Recent or current use of warfarin | 17 (7.8) | 17 | 0 | 0 | 100.0 | 100.0 |
8 | Recent or current use of IV heparin | 5 (2.3) | 5 | 0 | 0 | 100.0 | 100.0 |
9 | Use of direct oral anticoagulants within last 48 h | 13 (5.9) | 12 | 0 | 1 | 92.3 | 100.0 |
10 | Use of LMWH within last 48 h | 5 (2.3) | 5 | 2 | 0 | 100.0 | 71.4 |
11 | GI malignancy | 3 (1.4) | 2 | 0 | 1 | 66.7 | 100.0 |
12 | GI bleed within the prior 21 days | 19 (8.7) | 15 | 0 | 4 | 78.9 | 100.0 |
13 | Intra-axial intracranial neoplasm | 3 (1.4) | 1 | 0 | 2 | 33.3 | 100.0 |
14 | Infective endocarditis | 4 (1.8) | 3 | 0 | 1 | 75.0 | 100.0 |
15 | Aortic arch dissection | 5 (2.3) | 4 | 0 | 1 | 80.0 | 100.0 |
16 | Suspicion of SAH | 11 (5.0) | 11 | 0 | 0 | 100.0 | 100.0 |
17 | Major surgery within the prior 14 days | 31 (14.2) | 28 | 8 | 3 | 90.3 | 77.8 |
18 | Serious trauma within the prior 14 days | 2 (0.9) | 2 | 2 | 0 | 100.0 | 50.0 |
19 | Arterial puncture at noncompressible site within the prior 7 days | 7 (3.2) | 6 | 0 | 1 | 85.7 | 100.0 |
20 | Intracranial vascular malformation | 5 (2.3) | 5 | 2 | 0 | 100.0 | 71.4 |
21 | Large unruptured aneurysm | 10 (4.6) | 10 | 0 | 0 | 100.0 | 100.0 |
22 | Intracranial arterial dissection | 6 (2.7) | 6 | 0 | 0 | 100.0 | 100.0 |
23 | Prior history of greater than 10 cerebral microbleeds | 11 (5.0) | 11 | 1 | 0 | 100.0 | 91.7 |
24 | Systemic malignancy with a life expectancy of less than 6 months | 15 (6.8) | 15 | 11 | 0 | 100.0 | 57.7 |
Total | - | 219 (100.0) | 199 | 28 | 20 | 90.9 | 87.7 |
Contraindication ID . | Description . | n (%) . | TP . | FP . | FN . | Sensitivity (%)a . | PPV (%)a . |
---|---|---|---|---|---|---|---|
1 | Recent or ongoing internal bleeding | 2 (0.9) | 1 | 0 | 1 | 50.0 | 100.0 |
2 | Intracranial or intraspinal surgery within the prior 3 months | 12 (5.5) | 11 | 1 | 1 | 91.7 | 91.7 |
3 | Ischemic stroke within the prior 3 months | 9 (4.1) | 5 | 0 | 4 | 55.6 | 100.0 |
4 | Significant head trauma within the prior 3 months | 8 (3.7) | 8 | 0 | 0 | 100.0 | 100.0 |
5 | History of previous unprovoked intracranial hemorrhage | 8 (3.7) | 8 | 0 | 0 | 100.0 | 100.0 |
6 | History of thrombocytopenia | 8 (3.7) | 8 | 1 | 0 | 100.0 | 88.9 |
7 | Recent or current use of warfarin | 17 (7.8) | 17 | 0 | 0 | 100.0 | 100.0 |
8 | Recent or current use of IV heparin | 5 (2.3) | 5 | 0 | 0 | 100.0 | 100.0 |
9 | Use of direct oral anticoagulants within last 48 h | 13 (5.9) | 12 | 0 | 1 | 92.3 | 100.0 |
10 | Use of LMWH within last 48 h | 5 (2.3) | 5 | 2 | 0 | 100.0 | 71.4 |
11 | GI malignancy | 3 (1.4) | 2 | 0 | 1 | 66.7 | 100.0 |
12 | GI bleed within the prior 21 days | 19 (8.7) | 15 | 0 | 4 | 78.9 | 100.0 |
13 | Intra-axial intracranial neoplasm | 3 (1.4) | 1 | 0 | 2 | 33.3 | 100.0 |
14 | Infective endocarditis | 4 (1.8) | 3 | 0 | 1 | 75.0 | 100.0 |
15 | Aortic arch dissection | 5 (2.3) | 4 | 0 | 1 | 80.0 | 100.0 |
16 | Suspicion of SAH | 11 (5.0) | 11 | 0 | 0 | 100.0 | 100.0 |
17 | Major surgery within the prior 14 days | 31 (14.2) | 28 | 8 | 3 | 90.3 | 77.8 |
18 | Serious trauma within the prior 14 days | 2 (0.9) | 2 | 2 | 0 | 100.0 | 50.0 |
19 | Arterial puncture at noncompressible site within the prior 7 days | 7 (3.2) | 6 | 0 | 1 | 85.7 | 100.0 |
20 | Intracranial vascular malformation | 5 (2.3) | 5 | 2 | 0 | 100.0 | 71.4 |
21 | Large unruptured aneurysm | 10 (4.6) | 10 | 0 | 0 | 100.0 | 100.0 |
22 | Intracranial arterial dissection | 6 (2.7) | 6 | 0 | 0 | 100.0 | 100.0 |
23 | Prior history of greater than 10 cerebral microbleeds | 11 (5.0) | 11 | 1 | 0 | 100.0 | 91.7 |
24 | Systemic malignancy with a life expectancy of less than 6 months | 15 (6.8) | 15 | 11 | 0 | 100.0 | 57.7 |
Total | - | 219 (100.0) | 199 | 28 | 20 | 90.9 | 87.7 |
GI, gastrointestinal; IV, intravenous; LMWH, low-molecular-weight heparin; SAH, subarachnoid hemorrhage; TP, true positive; FN, false negative.
aChi-squared tests were carried out to test whether sensitivity and PPV varied by contraindication, assuming an expectation of equal sensitivity and PPV for all contraindications: sensitivity, p = 1.00; PPV, p < 0.0001.
Overall Performance
When evaluating all 219 contraindications collectively, NeuroGlimpse demonstrated strong performance metrics (Table 3): sensitivity of 90.9% (95% CI, 86.3–94.3%), specificity of 99.2% (95% CI, 98.8–99.5%), PPV of 87.7% (95% CI, 82.7–91.7%), NPV of 99.4% (95% CI, 99.1–99.6%), accuracy of 98.7% (95% CI, 98.2–99.0%), and an F1 score of 0.892. The primary reason for FPs was the identification of irrelevant contraindications (n = 24, 86%), such as interpreting documentation like “no clear evidence of malignancy but advanced age and multiple comorbidities may suggest limited life expectancy” as contraindications. The remaining FPs were due to repetitive clinical information (n = 4, 14%). No instances of hallucinations were observed in any of the NeuroGlimpse outputs (Table 4).
Overall performance of NeuroGlimpse
Performance measures . | % (95% CI) . |
---|---|
Sensitivity | 90.9 (86.3–94.3) |
Specificity | 99.2 (98.8–99.5) |
PPV | 87.7 (82.7–91.7) |
NPV | 99.4 (99.1–99.6) |
Accuracy | 98.7 (98.2–99.0) |
F1 score | 0.892 (−) |
Performance measures . | % (95% CI) . |
---|---|
Sensitivity | 90.9 (86.3–94.3) |
Specificity | 99.2 (98.8–99.5) |
PPV | 87.7 (82.7–91.7) |
NPV | 99.4 (99.1–99.6) |
Accuracy | 98.7 (98.2–99.0) |
F1 score | 0.892 (−) |
Reasons for FP results
Reason . | n (%) . |
---|---|
Irrelevant contraindication | 24 (85.7) |
Repetitive clinical information | 4 (14.3) |
Hallucination or factually false information | 0 (0.0) |
Total | 28 (100.0) |
Reason . | n (%) . |
---|---|
Irrelevant contraindication | 24 (85.7) |
Repetitive clinical information | 4 (14.3) |
Hallucination or factually false information | 0 (0.0) |
Total | 28 (100.0) |
Subgroup Comparisons of Performance
Sensitivity and PPV varied numerically across different contraindications (Table 2). Sensitivity ranged from 33.3% for intra-axial intracranial neoplasm to 100.0% for multiple contraindications, while PPV ranged from 50.0% for serious trauma within the prior 14 days to 100.0% for multiple contraindications. Chi-squared tests, assuming equal sensitivity and PPV for all contraindications, revealed no significant variation in sensitivity (p = 1.00) but a significant variation in PPV (p < 0.0001), with systematic malignancy (n = 11) representing 39.3% of all FPs.
Comparing performance between male and female patients showed no significant differences (online suppl. Table S4). For male patients, the sensitivity was 87.3% (95% CI, 79.6–92.9%) and PPV was 87.3% (95% CI, 79.6–92.9%). For female patients, the sensitivity was 94.5% (95% CI, 88.4–98.0%) and PPV was 88.0% (95% CI, 80.7–93.3%) (p = 0.06 for sensitivity; p = 0.86 for PPV).
Finally, comparing notes with 0–1 contraindication per note to those with 2–5 contraindications revealed no significant differences in performance (online suppl. Table S5). Notes with 0–1 contraindication had a sensitivity of 96.8% (95% CI, 89.0–99.6%) and PPV of 85.9% (95% CI, 75.6–93.0%), while notes with 2–5 contraindications had a sensitivity of 88.5% (95% CI, 82.4–93.0%) and PPV of 88.5% (95% CI, 82.4–93.0%) (p = 0.052 for sensitivity; p = 0.59 for PPV).
Discussion
In this proof-of-concept study, we generated 150 synthetic notes mimicking real clinical notes within patients’ EMR prior to acute stroke evaluations. We also developed NeuroGlimpse, an LLM-based tool equipped with a custom prompt, for identifying stroke thrombolysis contraindications within these synthetic notes. NeuroGlimpse showed a sensitivity of 91% and PPV of 88%, with no evidence of sex bias. These results highlight the potential of this tool for identifying thrombolysis contraindications within real EMR clinical notes. This could be integrated into acute stroke workflows with real-time processing of patients’ most recent EMR clinical notes prior to acute stroke evaluations in order to quickly and accurately identify potential contraindications.
Safe and timely thrombolysis in ischemic stroke is critical, as expedited treatment improves functional outcomes [1, 2], while accurate identification of contraindications minimizes hemorrhagic risk [15‒17]. Clinicians reportedly spend 2 to 7 min reviewing EMR for contraindications, with miss rates ranging from 20% to 43%, though these estimates are based on simulated scenarios rather than real-world data [3, 4]. To enhance safety and reduce door-to-needle times, clinical decision support tools have emerged. The first, developed in 2015, relied on structured EMR data (e.g., diagnoses, medications) but was limited by data inaccuracies compared to unstructured clinical notes [3, 18]. Subsequent tools introduced in 2018 and 2023 incorporated keyword searches within clinical notes to improve information capture [4, 19]. However, these methods required manual review and were hindered by issues such as abbreviation, misspelling, and the inability to interpret contextual meanings in notes.
This proof-of-concept study showed the potential of an LLM to identify thrombolysis contraindications within EMR clinical notes. Future iterations of NeuroGlimpse (online suppl. Fig. S2) will integrate real-time EMR access. Upon activation of a stroke alert, this tool will extract the patient’s unique identifiers from paging information, retrieve the most recent EMR clinical notes prior to acute stroke evaluation for the corresponding patient, and analyze them using Llama 3.1 to generate a list of contraindications within 30 s. This list will be delivered to the clinician through mobile and desktop interfaces. By facilitating rapid and accurate identification of thrombolysis contraindications, this tool could enable faster and safer thrombolysis decision-making, reduce door-to-needle times, lower the risk of post-thrombolysis hemorrhagic complications, and increase the rate of stroke thrombolysis. These potential benefits may be even greater in rural hospitals, where lack of stroke expertise may have contributed to rural hospitals’ slower thrombolysis and lower rates of thrombolysis compared to urban hospitals [20].
NeuroGlimpse’s initial development utilized synthetic clinical notes to enable rapid generation and diverse content coverage, facilitating comprehensive prototype evaluation across varied clinical scenarios. This approach expedited preliminary validation, aligning with the rapid evolution of LLMs. Subsequent phases will incorporate retrospective analyses of real EMR data, followed by prospective randomized trials to assess NeuroGlimpse’s real-time utility in acute stroke care.
This study has several limitations. First, NeuroGlimpse was not evaluated using real patient clinical notes, potentially affecting performance metrics and limiting generalizability. For example, most of the contents within synthetic notes pertained to the past medical history and history of present illness, which may be overrepresented compared to real clinical notes. Future validation will involve a retrospective study with real-world data. Second, the model analyzed single clinical notes rather than sets, which may reduce sensitivity due to overlapping information but could also lower precision due to increased FPs. Third, the study did not demonstrate automated real-time EMR integration, a critical feature for clinical utility. Fourth, the comprehensiveness of outputs was not assessed, potentially impacting usability. Fifth, all synthetic notes were uniformly dated one day prior to contraindication identification, limiting assessment of the model’s ability to process older, less relevant information. Sixth, our current LLM-based tool was designed to only analyze past medical records for a given patient. It would not have the ability to capture live data from the acute stroke evaluation, such as blood pressure and blood glucose obtained at the time of stroke alert, due to technical difficulties in accessing very recent data (less than 24 h) from EMR. Lastly, only one vascular neurologist reviewed synthetic notes to establish the ground truth, introducing potential bias due to the lack of consensus-based validation.
Conclusion
We developed an LLM-based tool that showed high sensitivity and PPV in identifying stroke thrombolysis contraindications within synthetic clinical notes in this proof-of-concept study. This tool could eventually process a patient’s most recent EMR clinical notes in real time in order to inform the clinician of the presence of any thrombolysis contraindication rapidly and accurately, enabling faster and safer thrombolysis decision-making.
Statement of Ethics
This study did not involve real patient data; instead, synthetic clinical notes were utilized to evaluate the performance of the developed tool. As no human participants were involved, the study was exempt from requiring Institutional Review Board (IRB) approval. The generated synthetic data were reviewed internally to ensure alignment with ethical research practices. No identifiable information was included, and the study adhered to the STARD guidelines for reporting diagnostic accuracy studies.
Conflict of Interest Statement
The authors declare no conflicts of interest regarding the publication of this manuscript. The study did not receive any specific funding, and no financial or nonfinancial relationships exist that could be perceived as a potential conflict of interest in the writing of this manuscript.
Funding Sources
No funding was received for this study. The study was conducted independently without any financial support from external sponsors. The authors affirm that the study design, data collection, analysis, manuscript writing, and decision to publish were carried out without any involvement or influence from external funders.
Author Contributions
Conceptualization, methodology, and formal analysis: Bing Yu Chen and Fares Antaki. Investigation, data curation, writing – original draft preparation, visualization, and project administration: Bing Yu Chen. Writing – review and editing: Fares Antaki, Marco Gonzalez Castellon, Ken Uchino, Samer Albahra, Scott Robertson, Sidonie Ibrikji, Eric Aube, Andrew Russman, and M. Shazam Hussain. Supervision: M. Shazam Hussain and Andrew N. Russman. All authors have read and approved the final version of the manuscript.
Data Availability Statement
The data supporting this study consist of synthetic clinical notes generated for research purposes. These synthetic data are not based on real patient information and are available upon reasonable request. Due to their synthetic nature, there are no legal or ethical restrictions on sharing the data. Researchers interested in accessing the data and prompts used in this study may contact the corresponding author, Dr. Bing Yu Chen, at [email protected]. The data that support the findings of this study are not publicly available due to privacy reasons but are available from the corresponding author upon reasonable request.