Introduction: Timely thrombolytic therapy improves outcomes in acute ischemic stroke. Manual chart review to screen for thrombolysis contraindications may be time-consuming and prone to errors. We developed and tested a large language model (LLM)-based tool to identify thrombolysis contraindications from clinical notes using synthetic data in a proof-of-concept study. Methods: We generated 150 synthetic clinical notes containing randomly assigned thrombolysis contraindications using LLMs. We then used Llama 3.1 405B with a custom prompt to generate a list of thrombolysis contraindications from each note. Performance was evaluated using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. Results: A total of 150 synthetic notes were generated using five different models: ChatGPT-4o, Llama 3.1 405B, Llama 3.1 70B, ChatGPT-4o mini, and Gemini 1.5 Flash. On average, each note contained 241.6 words (SD 110.7; range 80–549) and included 1.5 contraindications (SD 1.1; range 0–5). Our tool achieved a sensitivity of 90.9% (95% CI: 86.3%–94.3%), specificity of 99.2% (95% CI: 98.8%–99.5%), PPV of 87.7% (95% CI: 82.7%–91.7%), NPV of 99.4% (95% CI: 99.1%–99.6%), accuracy of 98.7% (95% CI: 98.2%–99.0%), and an F1 score of 0.892. Among the false positives, 24 (86%) were due to the inclusion of irrelevant contraindications, and 4 (14%) resulted from repetitive information. No hallucinations were observed. Conclusion: Our LLM-based tool may identify stroke thrombolysis contraindications from synthetic clinical notes with high sensitivity and PPV. Future studies will validate its performance using real EMR data and integrate it into acute stroke workflows to facilitate faster and safer thrombolysis decision-making.

Timely stroke thrombolysis is associated with better outcomes in eligible patients [1, 2]. A key component of eligibility determination is the manual review of patients’ previous electronic medical records (EMRs) to screen for thrombolysis contraindications. This task may be time-consuming and prone to errors, potentially leading to treatment delays, inappropriate thrombolysis due to missed contraindications, or withholding of thrombolysis due to wrongfully identified contraindications. Previous studies showed that clinicians spent 2–7 min per patient on EMR review and missed 20–43% of thrombolysis contraindications [3, 4]. Recent advancements in large language models (LLM) represent a significant breakthrough in artificial intelligence and machine learning [5]. Trained on extensive textual datasets, LLMs exhibit strong capabilities in natural language processing and general reasoning, with medical applications such as diagnostic reasoning, treatment recommendations, knowledge augmentation, and clinical data summarization [6‒11]. LLMs may offer a novel opportunity to address the gap in acute stroke care related to the timely and accurate identification of thrombolysis contraindications.

This study aimed to develop a proof-of-concept LLM-based tool for the identification of stroke thrombolysis contraindications within clinical notes and test its preliminary performances using synthetic clinical notes. This tool could be used for real-time processing of a patient’s previous EMR clinical notes and to quickly generate a list of thrombolysis contraindications to enhance the accuracy and timeliness of stroke thrombolysis decision-making.

Synthetic clinical notes were generated to evaluate the LLM-based tool’s performance in identifying stroke thrombolysis contraindications (online suppl. Fig. S1; for all online suppl. material, see https://doi.org/10.1159/000545317). As no real patient data were used, IRB approval was unnecessary. Data and prompts are available upon request. This study complies with the Standards for Reporting Diagnostic Accuracy Studies (STARD) guidelines [12].

Synthetic Clinical Notes

In order to mimic typical clinical notes within patients’ EMR prior to acute stroke evaluations (e.g., outpatient visit notes, operative reports, admission notes, progress notes, discharge summaries), we generated 150 synthetic clinical notes using five LLMs under default settings: ChatGPT-4o (OpenAI), Llama 3.1 70B and 405B (Meta), ChatGPT-4o mini (OpenAI), and Gemini 1.5 Flash (Google). The use of five different LLMs served to provide stylistic variety to the synthetic notes. These LLMs were stratified into three tiers based on parameter size: tier 1 (ChatGPT-4o, Llama 3.1 405B), tier 2 (Llama 3.1 70B), and tier 3 (ChatGPT-4o mini, Gemini 1.5 Flash). To emphasize higher quality notes from larger models, tier 1 generated 60 notes, tier 2 produced 50, and tier 3 contributed 40, with generation quotas predetermined prior to study initiation. A standardized prompt and a random number generator were employed to guide each LLM in generating random synthetic notes (online suppl. Table S1). The prompt specified predetermined variables, including patient age (51–100 years), sex, number of contraindications (0–3), and their nature, if applicable, to which the number generator assigned a random value. The encounter date was uniformly set one day before contraindication identification, ensuring consistency across generated notes. Each synthetic note was reviewed by a vascular neurologist to identify potential contraindications from a predefined list of 24 thrombolysis contraindications based on the American Heart Association stroke guidelines [13] (online suppl. Table S1). A contraindication was considered present if the documentation necessitated clarification or confirmation with the patient before thrombolysis administration. Identified contraindications were compared with the predefined set specified in the prompt, accounting for potential discrepancies. A final curated list of contraindications for each note was established as the ground truth for performance evaluation.

Identification of Contraindications

We developed an LLM-based tool, named NeuroGlimpse, for identifying stroke thrombolysis contraindications from clinical notes. This tool was powered by Llama 3.1 405B, an open-source LLM with performance comparable to ChatGPT-4o [14], in its default settings (temperature 0.2, top P 0.7, frequency and presence penalties at 0, token limit of 1,024). This LLM was accessed via an Nvidia cloud platform. A custom prompt, refined iteratively and exceeding 3,000 words, provided detailed instructions, output formats, and thrombolysis-specific clinical knowledge (online suppl. Table S2). The input consisted of this prompt and a synthetic note, and the output was a single list of thrombolysis contraindications, including the reason for contraindication, its classification (absolute or relative), pertinent clinical information, and the date and author of the source note (online suppl. Table S3). No modification to the original LLM or preprocessing of inputs or outputs was needed.

Performance Measures

We evaluated the performance of NeuroGlimpse using the previously established ground truth. Performance evaluation included (1) number of correctly identified contraindications (true positives); (2) number of missed contraindications (false negatives); and (3) number of false positives (FPs). FPs consisted of irrelevant contraindications (items not present in the ground truth), repetitive clinical information (multiple items describing the same contraindication), and hallucinations [5] (factually incorrect or fabricated clinical information). True negatives (TNs) were contraindications not present in a clinical note and not reported by NeuroGlimpse. Given that each clinical note could contain up to 24 contraindications, a note without any contraindications was counted as 24 TNs. The primary performance measures were sensitivity and positive predictive value (PPV). Secondary measures included specificity, negative predictive value (NPV), accuracy, and F1 score.

Statistical Analyses

Sensitivity, specificity, PPV, NPV, accuracy, and F1 score were calculated using standard definitions based on true positive, FP, false negative, and TN. Clopper-Pearson exact methods were employed to calculate 95% confidence intervals (CIs) for sensitivity, specificity, PPV, NPV, and accuracy. Two-sample tests for comparison of binomial proportions were utilized to compare primary performance measures across subgroups defined by sex and the number of contraindications per note. Chi-square tests were applied to evaluate differences in primary performance measures among the 24 contraindications, under the null hypothesis of equal sensitivity and PPV across all contraindications. All statistical tests were two-tailed, and a significance level of α = 0.05 was used throughout. Statistical analyses were performed using Microsoft Excel version 16.90.

Baseline Characteristics

A total of 150 synthetic clinical notes were generated using various LLMs (Table 1). Specifically, 40 notes (26.7%) were produced by ChatGPT-4o, 20 (13.3%) by Llama 3.1 405B, 50 (33.3%) by Llama 3.1 70B, 20 (13.3%) by ChatGPT-4o mini, and 20 (13.3%) by Gemini 1.5 Flash. Each note averaged 241.6 words (SD 110.7; range 80–549) and included an average of 1.5 contraindications (SD 1.1; range 0–5). Among these notes, 89 (59.3%) contained 0–1 contraindications, while 61 (40.7%) included 2–5 contraindications. The synthetic patient cohort had a mean age of 73.3 years (SD 14.7; range 51–100), with 72 patients (48.0%) being female.

Table 1.

Baseline characteristics from 150 synthetic notes

Characteristics
Age, mean (SD), range, years 73.3 (14.7), 51–100 
Female, n (%) 72 (48.0) 
LLM used for note generation, n (%) 
 ChatGPT-4o 40 (26.7) 
 Llama 3.1 405B 20 (13.3) 
 Llama 3.1 70B 50 (33.3) 
 ChatGPT-4o mini 20 (13.3) 
 Gemini 1.5 Flash 20 (13.3) 
Words per note, mean (SD), range, n 241.6 (110.7), 80–549 
Contraindications per note, mean (SD), range, n 1.5 (1.1), 0–5 
 Notes with 0 contraindication, n (%) 26 (17.3) 
 Notes with 1 contraindication, n (%) 63 (42.0) 
 Notes with 2 contraindications, n (%) 35 (23.3) 
 Notes with 3 contraindications, n (%) 20 (13.3) 
 Notes with 4 contraindications, n (%) 4 (2.7) 
 Notes with 5 contraindications, n (%) 2 (1.3) 
Characteristics
Age, mean (SD), range, years 73.3 (14.7), 51–100 
Female, n (%) 72 (48.0) 
LLM used for note generation, n (%) 
 ChatGPT-4o 40 (26.7) 
 Llama 3.1 405B 20 (13.3) 
 Llama 3.1 70B 50 (33.3) 
 ChatGPT-4o mini 20 (13.3) 
 Gemini 1.5 Flash 20 (13.3) 
Words per note, mean (SD), range, n 241.6 (110.7), 80–549 
Contraindications per note, mean (SD), range, n 1.5 (1.1), 0–5 
 Notes with 0 contraindication, n (%) 26 (17.3) 
 Notes with 1 contraindication, n (%) 63 (42.0) 
 Notes with 2 contraindications, n (%) 35 (23.3) 
 Notes with 3 contraindications, n (%) 20 (13.3) 
 Notes with 4 contraindications, n (%) 4 (2.7) 
 Notes with 5 contraindications, n (%) 2 (1.3) 

Thrombolysis Contraindications

Contraindications were randomly assigned during the synthetic note generation process. However, the predetermined set of contraindications specified in the prompt may have differed from manually established ground truth, resulting in an uneven distribution across the notes (Table 2). The most frequently appearing contraindications were major surgery within the prior 14 days (n = 31), gastrointestinal bleeding within the prior 21 days (n = 19), and recent or current use of warfarin (n = 17). The least frequent contraindications included recent or ongoing internal bleeding (n = 2), serious trauma within the prior 14 days (n = 2), gastrointestinal malignancy (n = 3), and intra-axial intracranial neoplasm (n = 3). In total, 219 contraindications were incorporated within these notes.

Table 2.

Thrombolysis contraindications and performance measures of NeuroGlimpse

Contraindication IDDescriptionn (%)TPFPFNSensitivity (%)aPPV (%)a
Recent or ongoing internal bleeding 2 (0.9) 50.0 100.0 
Intracranial or intraspinal surgery within the prior 3 months 12 (5.5) 11 91.7 91.7 
Ischemic stroke within the prior 3 months 9 (4.1) 55.6 100.0 
Significant head trauma within the prior 3 months 8 (3.7) 100.0 100.0 
History of previous unprovoked intracranial hemorrhage 8 (3.7) 100.0 100.0 
History of thrombocytopenia 8 (3.7) 100.0 88.9 
Recent or current use of warfarin 17 (7.8) 17 100.0 100.0 
Recent or current use of IV heparin 5 (2.3) 100.0 100.0 
Use of direct oral anticoagulants within last 48 h 13 (5.9) 12 92.3 100.0 
10 Use of LMWH within last 48 h 5 (2.3) 100.0 71.4 
11 GI malignancy 3 (1.4) 66.7 100.0 
12 GI bleed within the prior 21 days 19 (8.7) 15 78.9 100.0 
13 Intra-axial intracranial neoplasm 3 (1.4) 33.3 100.0 
14 Infective endocarditis 4 (1.8) 75.0 100.0 
15 Aortic arch dissection 5 (2.3) 80.0 100.0 
16 Suspicion of SAH 11 (5.0) 11 100.0 100.0 
17 Major surgery within the prior 14 days 31 (14.2) 28 90.3 77.8 
18 Serious trauma within the prior 14 days 2 (0.9) 100.0 50.0 
19 Arterial puncture at noncompressible site within the prior 7 days 7 (3.2) 85.7 100.0 
20 Intracranial vascular malformation 5 (2.3) 100.0 71.4 
21 Large unruptured aneurysm 10 (4.6) 10 100.0 100.0 
22 Intracranial arterial dissection 6 (2.7) 100.0 100.0 
23 Prior history of greater than 10 cerebral microbleeds 11 (5.0) 11 100.0 91.7 
24 Systemic malignancy with a life expectancy of less than 6 months 15 (6.8) 15 11 100.0 57.7 
Total 219 (100.0) 199 28 20 90.9 87.7 
Contraindication IDDescriptionn (%)TPFPFNSensitivity (%)aPPV (%)a
Recent or ongoing internal bleeding 2 (0.9) 50.0 100.0 
Intracranial or intraspinal surgery within the prior 3 months 12 (5.5) 11 91.7 91.7 
Ischemic stroke within the prior 3 months 9 (4.1) 55.6 100.0 
Significant head trauma within the prior 3 months 8 (3.7) 100.0 100.0 
History of previous unprovoked intracranial hemorrhage 8 (3.7) 100.0 100.0 
History of thrombocytopenia 8 (3.7) 100.0 88.9 
Recent or current use of warfarin 17 (7.8) 17 100.0 100.0 
Recent or current use of IV heparin 5 (2.3) 100.0 100.0 
Use of direct oral anticoagulants within last 48 h 13 (5.9) 12 92.3 100.0 
10 Use of LMWH within last 48 h 5 (2.3) 100.0 71.4 
11 GI malignancy 3 (1.4) 66.7 100.0 
12 GI bleed within the prior 21 days 19 (8.7) 15 78.9 100.0 
13 Intra-axial intracranial neoplasm 3 (1.4) 33.3 100.0 
14 Infective endocarditis 4 (1.8) 75.0 100.0 
15 Aortic arch dissection 5 (2.3) 80.0 100.0 
16 Suspicion of SAH 11 (5.0) 11 100.0 100.0 
17 Major surgery within the prior 14 days 31 (14.2) 28 90.3 77.8 
18 Serious trauma within the prior 14 days 2 (0.9) 100.0 50.0 
19 Arterial puncture at noncompressible site within the prior 7 days 7 (3.2) 85.7 100.0 
20 Intracranial vascular malformation 5 (2.3) 100.0 71.4 
21 Large unruptured aneurysm 10 (4.6) 10 100.0 100.0 
22 Intracranial arterial dissection 6 (2.7) 100.0 100.0 
23 Prior history of greater than 10 cerebral microbleeds 11 (5.0) 11 100.0 91.7 
24 Systemic malignancy with a life expectancy of less than 6 months 15 (6.8) 15 11 100.0 57.7 
Total 219 (100.0) 199 28 20 90.9 87.7 

GI, gastrointestinal; IV, intravenous; LMWH, low-molecular-weight heparin; SAH, subarachnoid hemorrhage; TP, true positive; FN, false negative.

aChi-squared tests were carried out to test whether sensitivity and PPV varied by contraindication, assuming an expectation of equal sensitivity and PPV for all contraindications: sensitivity, p = 1.00; PPV, p < 0.0001.

Overall Performance

When evaluating all 219 contraindications collectively, NeuroGlimpse demonstrated strong performance metrics (Table 3): sensitivity of 90.9% (95% CI, 86.3–94.3%), specificity of 99.2% (95% CI, 98.8–99.5%), PPV of 87.7% (95% CI, 82.7–91.7%), NPV of 99.4% (95% CI, 99.1–99.6%), accuracy of 98.7% (95% CI, 98.2–99.0%), and an F1 score of 0.892. The primary reason for FPs was the identification of irrelevant contraindications (n = 24, 86%), such as interpreting documentation like “no clear evidence of malignancy but advanced age and multiple comorbidities may suggest limited life expectancy” as contraindications. The remaining FPs were due to repetitive clinical information (n = 4, 14%). No instances of hallucinations were observed in any of the NeuroGlimpse outputs (Table 4).

Table 3.

Overall performance of NeuroGlimpse

Performance measures% (95% CI)
Sensitivity 90.9 (86.3–94.3) 
Specificity 99.2 (98.8–99.5) 
PPV 87.7 (82.7–91.7) 
NPV 99.4 (99.1–99.6) 
Accuracy 98.7 (98.2–99.0) 
F1 score 0.892 (−) 
Performance measures% (95% CI)
Sensitivity 90.9 (86.3–94.3) 
Specificity 99.2 (98.8–99.5) 
PPV 87.7 (82.7–91.7) 
NPV 99.4 (99.1–99.6) 
Accuracy 98.7 (98.2–99.0) 
F1 score 0.892 (−) 
Table 4.

Reasons for FP results

Reasonn (%)
Irrelevant contraindication 24 (85.7) 
Repetitive clinical information 4 (14.3) 
Hallucination or factually false information 0 (0.0) 
Total 28 (100.0) 
Reasonn (%)
Irrelevant contraindication 24 (85.7) 
Repetitive clinical information 4 (14.3) 
Hallucination or factually false information 0 (0.0) 
Total 28 (100.0) 

Subgroup Comparisons of Performance

Sensitivity and PPV varied numerically across different contraindications (Table 2). Sensitivity ranged from 33.3% for intra-axial intracranial neoplasm to 100.0% for multiple contraindications, while PPV ranged from 50.0% for serious trauma within the prior 14 days to 100.0% for multiple contraindications. Chi-squared tests, assuming equal sensitivity and PPV for all contraindications, revealed no significant variation in sensitivity (p = 1.00) but a significant variation in PPV (p < 0.0001), with systematic malignancy (n = 11) representing 39.3% of all FPs.

Comparing performance between male and female patients showed no significant differences (online suppl. Table S4). For male patients, the sensitivity was 87.3% (95% CI, 79.6–92.9%) and PPV was 87.3% (95% CI, 79.6–92.9%). For female patients, the sensitivity was 94.5% (95% CI, 88.4–98.0%) and PPV was 88.0% (95% CI, 80.7–93.3%) (p = 0.06 for sensitivity; p = 0.86 for PPV).

Finally, comparing notes with 0–1 contraindication per note to those with 2–5 contraindications revealed no significant differences in performance (online suppl. Table S5). Notes with 0–1 contraindication had a sensitivity of 96.8% (95% CI, 89.0–99.6%) and PPV of 85.9% (95% CI, 75.6–93.0%), while notes with 2–5 contraindications had a sensitivity of 88.5% (95% CI, 82.4–93.0%) and PPV of 88.5% (95% CI, 82.4–93.0%) (p = 0.052 for sensitivity; p = 0.59 for PPV).

In this proof-of-concept study, we generated 150 synthetic notes mimicking real clinical notes within patients’ EMR prior to acute stroke evaluations. We also developed NeuroGlimpse, an LLM-based tool equipped with a custom prompt, for identifying stroke thrombolysis contraindications within these synthetic notes. NeuroGlimpse showed a sensitivity of 91% and PPV of 88%, with no evidence of sex bias. These results highlight the potential of this tool for identifying thrombolysis contraindications within real EMR clinical notes. This could be integrated into acute stroke workflows with real-time processing of patients’ most recent EMR clinical notes prior to acute stroke evaluations in order to quickly and accurately identify potential contraindications.

Safe and timely thrombolysis in ischemic stroke is critical, as expedited treatment improves functional outcomes [1, 2], while accurate identification of contraindications minimizes hemorrhagic risk [15‒17]. Clinicians reportedly spend 2 to 7 min reviewing EMR for contraindications, with miss rates ranging from 20% to 43%, though these estimates are based on simulated scenarios rather than real-world data [3, 4]. To enhance safety and reduce door-to-needle times, clinical decision support tools have emerged. The first, developed in 2015, relied on structured EMR data (e.g., diagnoses, medications) but was limited by data inaccuracies compared to unstructured clinical notes [3, 18]. Subsequent tools introduced in 2018 and 2023 incorporated keyword searches within clinical notes to improve information capture [4, 19]. However, these methods required manual review and were hindered by issues such as abbreviation, misspelling, and the inability to interpret contextual meanings in notes.

This proof-of-concept study showed the potential of an LLM to identify thrombolysis contraindications within EMR clinical notes. Future iterations of NeuroGlimpse (online suppl. Fig. S2) will integrate real-time EMR access. Upon activation of a stroke alert, this tool will extract the patient’s unique identifiers from paging information, retrieve the most recent EMR clinical notes prior to acute stroke evaluation for the corresponding patient, and analyze them using Llama 3.1 to generate a list of contraindications within 30 s. This list will be delivered to the clinician through mobile and desktop interfaces. By facilitating rapid and accurate identification of thrombolysis contraindications, this tool could enable faster and safer thrombolysis decision-making, reduce door-to-needle times, lower the risk of post-thrombolysis hemorrhagic complications, and increase the rate of stroke thrombolysis. These potential benefits may be even greater in rural hospitals, where lack of stroke expertise may have contributed to rural hospitals’ slower thrombolysis and lower rates of thrombolysis compared to urban hospitals [20].

NeuroGlimpse’s initial development utilized synthetic clinical notes to enable rapid generation and diverse content coverage, facilitating comprehensive prototype evaluation across varied clinical scenarios. This approach expedited preliminary validation, aligning with the rapid evolution of LLMs. Subsequent phases will incorporate retrospective analyses of real EMR data, followed by prospective randomized trials to assess NeuroGlimpse’s real-time utility in acute stroke care.

This study has several limitations. First, NeuroGlimpse was not evaluated using real patient clinical notes, potentially affecting performance metrics and limiting generalizability. For example, most of the contents within synthetic notes pertained to the past medical history and history of present illness, which may be overrepresented compared to real clinical notes. Future validation will involve a retrospective study with real-world data. Second, the model analyzed single clinical notes rather than sets, which may reduce sensitivity due to overlapping information but could also lower precision due to increased FPs. Third, the study did not demonstrate automated real-time EMR integration, a critical feature for clinical utility. Fourth, the comprehensiveness of outputs was not assessed, potentially impacting usability. Fifth, all synthetic notes were uniformly dated one day prior to contraindication identification, limiting assessment of the model’s ability to process older, less relevant information. Sixth, our current LLM-based tool was designed to only analyze past medical records for a given patient. It would not have the ability to capture live data from the acute stroke evaluation, such as blood pressure and blood glucose obtained at the time of stroke alert, due to technical difficulties in accessing very recent data (less than 24 h) from EMR. Lastly, only one vascular neurologist reviewed synthetic notes to establish the ground truth, introducing potential bias due to the lack of consensus-based validation.

We developed an LLM-based tool that showed high sensitivity and PPV in identifying stroke thrombolysis contraindications within synthetic clinical notes in this proof-of-concept study. This tool could eventually process a patient’s most recent EMR clinical notes in real time in order to inform the clinician of the presence of any thrombolysis contraindication rapidly and accurately, enabling faster and safer thrombolysis decision-making.

This study did not involve real patient data; instead, synthetic clinical notes were utilized to evaluate the performance of the developed tool. As no human participants were involved, the study was exempt from requiring Institutional Review Board (IRB) approval. The generated synthetic data were reviewed internally to ensure alignment with ethical research practices. No identifiable information was included, and the study adhered to the STARD guidelines for reporting diagnostic accuracy studies.

The authors declare no conflicts of interest regarding the publication of this manuscript. The study did not receive any specific funding, and no financial or nonfinancial relationships exist that could be perceived as a potential conflict of interest in the writing of this manuscript.

No funding was received for this study. The study was conducted independently without any financial support from external sponsors. The authors affirm that the study design, data collection, analysis, manuscript writing, and decision to publish were carried out without any involvement or influence from external funders.

Conceptualization, methodology, and formal analysis: Bing Yu Chen and Fares Antaki. Investigation, data curation, writing – original draft preparation, visualization, and project administration: Bing Yu Chen. Writing – review and editing: Fares Antaki, Marco Gonzalez Castellon, Ken Uchino, Samer Albahra, Scott Robertson, Sidonie Ibrikji, Eric Aube, Andrew Russman, and M. Shazam Hussain. Supervision: M. Shazam Hussain and Andrew N. Russman. All authors have read and approved the final version of the manuscript.

The data supporting this study consist of synthetic clinical notes generated for research purposes. These synthetic data are not based on real patient information and are available upon reasonable request. Due to their synthetic nature, there are no legal or ethical restrictions on sharing the data. Researchers interested in accessing the data and prompts used in this study may contact the corresponding author, Dr. Bing Yu Chen, at [email protected]. The data that support the findings of this study are not publicly available due to privacy reasons but are available from the corresponding author upon reasonable request.

1.
Emberson
J
,
Lees
KR
,
Lyden
P
,
Blackwell
L
,
Albers
G
,
Bluhmki
E
, et al
.
Effect of treatment delay, age, and stroke severity on the effects of intravenous thrombolysis with alteplase for acute ischaemic stroke: a meta-analysis of individual patient data from randomised trials
.
Lancet
.
2014
;
384
(
9958
):
1929
35
.
2.
Man
S
,
Xian
Y
,
Holmes
DN
,
Matsouaka
RA
,
Saver
JL
,
Smith
EE
, et al
.
Association between thrombolytic door-to-needle time and 1-year mortality and readmission in patients with acute ischemic stroke
.
JAMA
.
2020
;
323
(
21
):
2170
84
.
3.
Sun
M-C
,
Chan
J-A
.
A clinical decision support tool to screen health records for contraindications to stroke thrombolysis–a pilot study
.
BMC Med Inform Decis Mak
.
2015
;
15
(
1
):
105
.
4.
Sung
S-F
,
Chen
K
,
Wu
DP
,
Hung
L-C
,
Su
Y-H
,
Hu
Y-H
.
Applying natural language processing techniques to develop a task-specific EMR interface for timely stroke thrombolysis: a feasibility study
.
Int J Med Inform
.
2018
;
112
:
149
57
.
5.
Thirunavukarasu
AJ
,
Ting
DSJ
,
Elangovan
K
,
Gutierrez
L
,
Tan
TF
,
Ting
DSW
.
Large language models in medicine
.
Nat Med
.
2023
;
29
(
8
):
1930
40
.
6.
Chia
MA
,
Antaki
F
,
Zhou
Y
,
Turner
AW
,
Lee
AY
,
Keane
PA
.
Foundation models in ophthalmology
.
Br J Ophthalmol
.
2024
;
108
(
10
):
1341
8
.
7.
Milad
D
,
Antaki
F
,
Milad
J
,
Farah
A
,
Khairy
T
,
Mikhail
D
, et al
.
Assessing the medical reasoning skills of GPT-4 in complex ophthalmology cases
.
Br J Ophthalmol
.
2024
;
108
(
10
):
1398
405
.
8.
Benary
M
,
Wang
XD
,
Schmidt
M
,
Soll
D
,
Hilfenhaus
G
,
Nassir
M
, et al
.
Leveraging large language models for decision support in personalized oncology
.
JAMA Netw Open
.
2023
;
6
(
11
):
e2343689
.
9.
Bedi
S
,
Liu
Y
,
Orr-Ewing
L
,
Dash
D
,
Koyejo
S
,
Callahan
A
, et al
.
Testing and evaluation of health care applications of large language models: a systematic review
.
JAMA
.
2025
;
333
(
4
):
319
28
.
10.
Chen
TC
,
Couldwell
MW
,
Singer
J
,
Singer
A
,
Koduri
L
,
Kaminski
E
, et al
.
Assessing the clinical reasoning of ChatGPT for mechanical thrombectomy in patients with stroke
.
J Neurointerv Surg
.
2024
;
16
(
3
):
253
60
.
11.
Goh
R
,
Cook
B
,
Stretton
B
,
Booth
AE
,
Satheakeerthy
S
,
Howson
S
, et al
.
Large language models can effectively extract stroke and reperfusion audit data from medical free-text discharge summaries
.
J Clin Neurosci
.
2024
;
129
:
110847
.
12.
Bossuyt
PM
,
Reitsma
JB
,
Bruns
DE
,
Gatsonis
CA
,
Glasziou
PP
,
Irwig
L
, et al
.
Stard 2015: an updated list of essential items for reporting diagnostic accuracy studies
.
BMJ
.
2015
;
351
:
h5527
.
13.
Powers
WJ
,
Rabinstein
AA
,
Ackerson
T
,
Adeoye
OM
,
Bambakidis
NC
,
Becker
K
, et al
.
Guidelines for the early management of patients with acute ischemic stroke: 2019 update to the 2018 guidelines for the early management of acute ischemic stroke: a guideline for healthcare professionals from the American Heart Association/American Stroke Association
.
Stroke
.
2019
;
50
(
12
):
e344
418
.
14.
Dubey
A
,
Jauhri
A
,
Pandey
A
,
Kadian
A
,
Al-Dahle
A
,
Letman
A
, et al
.
The Llama 3 herd of models
.
2024
. [cited 2024 Oct 21]. Available from: http://arxiv.org/abs/2407.21783
15.
Tsivgoulis
G
,
Kargiotis
O
,
De Marchis
G
,
Kohrmann
M
,
Sandset
EC
,
Karapanayiotides
T
, et al
.
Off-label use of intravenous thrombolysis for acute ischemic stroke: a critical appraisal of randomized and real-world evidence
.
Ther Adv Neurol Disord
.
2021
;
14
:
1756286421997368
.
16.
Demaerschalk
BM
,
Kleindorfer
DO
,
Adeoye
OM
,
Demchuk
AM
,
Fugate
JE
,
Grotta
JC
, et al
.
Scientific rationale for the inclusion and exclusion criteria for intravenous alteplase in acute ischemic stroke: a statement for healthcare professionals from the American Heart Association/American Stroke Association
.
Stroke
.
2016
;
47
(
2
):
581
641
.
17.
Lopez-Yunez
AM
,
Bruno
A
,
Williams
LS
,
Yilmaz
E
,
Zurrú
C
,
Biller
J
.
Protocol violations in community-based rTPA stroke treatment are associated with symptomatic intracerebral hemorrhage
.
Stroke
.
2001
;
32
(
1
):
12
6
.
18.
Singer
A
,
Kroeker
AL
,
Yakubovich
S
,
Duarte
R
,
Dufault
B
,
Katz
A
.
Data quality in electronic medical records in Manitoba: do problem lists reflect chronic disease as defined by prescriptions?
.
19.
Cutforth
M
,
Watson
H
,
Brown
C
,
Wang
C
,
Thomson
S
,
Fell
D
, et al
.
Acute stroke CDS: automatic retrieval of thrombolysis contraindications from unstructured clinical letters
.
Front Digit Health
.
2023
;
5
:
1186516
.
20.
Man
S
,
Bruckman
D
,
Uchino
K
,
Chen
BY
,
Dalton
JE
,
Fonarow
GC
.
Rural hospital performance in guideline-recommended ischemic stroke thrombolysis, secondary prevention, and outcomes
.
Stroke
.
2024
;
55
(
10
):
2472
81
.