Abstract
Introduction: Studying what older adults say can provide important insights into cognitive, affective, and social aspects of aging. Available language analysis tools generally require audio-recorded speech to be transcribed into verbatim text, a task historically performed by humans. However, recent advances in AI-based language processing open up the possibility of replacing this time- and resource-intensive step with fully automatic speech-to-text transcription.

Methods: This study evaluates the accuracy of two common automatic speech-to-text tools, OpenAI's Whisper and otter.ai, relative to human-corrected transcripts. Based on two speech tasks completed by 238 older adults, we used Linguistic Inquiry and Word Count (LIWC) to compare the language features of text generated by each transcription method. The study further assessed the degree to which manual tagging of filler words (e.g., "like," "well") common in spoken language affects the validity of the analysis.

Results: The AI-based LIWC features showed very high convergence with the LIWC features derived from the human-corrected transcripts (average r = 0.98). Further, manual tagging of filler words did not affect the validity of any LIWC features except the filler and netspeak categories.

Conclusion: These findings support the use of Whisper and otter.ai as valuable tools for language analysis in aging research and provide further evidence that automatic speech-to-text transcription with state-of-the-art AI tools is ready for psychological language research.
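The following is a minimal sketch of the comparison pipeline summarized above, not the authors' actual analysis code. It assumes the open-source openai-whisper Python package and scipy; the file paths and LIWC scores are illustrative placeholders, since LIWC features are computed externally with the proprietary LIWC software (otter.ai transcription, which runs as a web service, is omitted here).

```python
# Sketch: transcribe recordings with Whisper, then correlate LIWC
# features from automatic vs. human-corrected transcripts.
# Assumptions (not from the paper): `openai-whisper` package,
# placeholder audio paths, and LIWC scores produced separately
# by the LIWC program.
import whisper
from scipy.stats import pearsonr

# Transcribe each recording with a locally run Whisper model.
model = whisper.load_model("base")
transcripts = {
    path: model.transcribe(path)["text"]
    for path in ["participant_001.wav", "participant_002.wav"]
}

# Placeholder per-participant scores for one hypothetical LIWC
# category; in practice these come from running LIWC on each
# set of transcripts.
liwc_auto = [2.1, 3.4, 1.8, 4.0, 2.7]    # from Whisper transcripts
liwc_human = [2.0, 3.5, 1.9, 3.8, 2.6]   # from human-corrected transcripts

# Convergence between transcription methods for this category;
# the paper reports an average r = 0.98 across LIWC features.
r, p = pearsonr(liwc_auto, liwc_human)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```

Repeating the correlation across all LIWC categories and averaging the resulting r values yields the convergence statistic reported in the Results.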