Abstract
Introduction: Generative artificial intelligence (AI) technologies such as GPT-4 can provide health information to patients instantaneously; however, the readability of these outputs relative to ophthalmologist-written responses is unknown. This study aimed to evaluate the readability of GPT-4-generated and ophthalmologist-written responses to patient queries about ophthalmic surgery.

Methods: This retrospective cross-sectional study used 200 randomly selected patient questions about ophthalmic surgery extracted from the American Academy of Ophthalmology’s EyeSmart platform. The questions were entered into GPT-4, and the generated responses were recorded. Ophthalmologist-written replies to the same questions were compiled for comparison. Readability of the GPT-4 and ophthalmologist responses was assessed using six validated metrics: Flesch-Kincaid Reading Ease (FK-RE), Flesch-Kincaid Grade Level (FK-GL), Gunning Fog Score (GFS), SMOG Index (SI), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). Descriptive statistics, one-way ANOVA, Shapiro-Wilk, and Levene’s tests (α = 0.05) were used to compare readability between the two groups.

Results: GPT-4 used a higher percentage of complex words (24.42%) than ophthalmologists (17.76%), although mean (standard deviation) word count per sentence was similar (18.43 [2.95] vs. 18.01 [6.09]). Across all metrics, GPT-4 responses were more difficult to read (FK-RE 34.39 [8.51]; FK-GL 13.19 [2.63]; GFS 16.37 [2.04]; SI 12.18 [1.43]; CLI 15.72 [1.40]; ARI 12.99 [1.86]) than ophthalmologists’ responses (50.61 [15.53]; 10.71 [2.99]; 14.13 [3.55]; 10.07 [2.46]; 12.64 [2.93]; 10.40 [3.61]), with both sources necessitating at least a 12th-grade education for comprehension. ANOVA showed statistically significant differences (p < 0.05) for all comparisons except word count per sentence (p = 0.438).

Conclusion: The National Institutes of Health advises that health information be written at a 6th- to 7th-grade level. Both GPT-4- and ophthalmologist-written answers exceeded this recommendation, with GPT-4 showing the greater gap. Information accessibility is vital when designing patient resources, particularly with the rise of AI as an educational tool.
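For reference, the grade-level interpretation above follows the standard published definitions of these indices; the formulas below are the commonly cited forms of three of them, not reproduced from the study’s materials. Lower FK-RE scores indicate harder text, higher values of the grade-level indices correspond to more years of education, and “complex words” in the GFS are conventionally words of three or more syllables.

\begin{align*}
\text{FK-RE} &= 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}\\
\text{FK-GL} &= 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59\\
\text{GFS} &= 0.4\left(\frac{\text{total words}}{\text{total sentences}} + 100\,\frac{\text{complex words}}{\text{total words}}\right)
\end{align*}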
Plain Language Summary
As technology continues to evolve, it is important to understand how it can help, but also harm, certain groups of people. The presence of artificial intelligence (AI) in medicine is rapidly growing. Within ophthalmology specifically, recent studies have shown that AI tools like ChatGPT-4 can answer patient questions at a quality similar to that of human ophthalmologists. How readable this AI-generated information is to the average patient seeking it is important for understanding its accessibility. Therefore, in this study we aimed to compare the readability of responses from AI and from human ophthalmologists to inform future discussions on information accessibility. Our results showed that a higher education level is needed to understand responses from ChatGPT-4 (14th grade) than from ophthalmologists (12th grade). Both are much higher than the average reading level of adults in the USA (8th grade). Additionally, both required an education level well above the readability recommendation for medical information from national health organizations (6th to 7th grade). Our findings emphasize the importance of ensuring information accessibility in the development and use of technology in the medical space. Failing to prioritize these factors could worsen healthcare disparities for patient populations already at a disadvantage in society.