Abstract
Objective: Speech sound errors are common in people with a variety of communication disorders and can impair message transmission to listeners. Valid and reliable metrics exist to quantify this problem, but they are rarely used in clinical settings because transcription by human listeners is time-intensive. Automated speech recognition (ASR) technologies have advanced substantially in recent years, enabling them to serve as realistic proxies for human listeners. This study aimed to determine how closely transcription scores from human listeners correspond to scores from an ASR system.

Patients and Methods: Sentence recordings from 10 stroke survivors with aphasia and apraxia of speech were transcribed orthographically by 3 listeners and by a web-based ASR service. Adjusted transcription scores were calculated for all samples based on the accuracy of transcribed content words.

Results: As expected, transcription scores were significantly higher for the human listeners than for ASR. However, intraclass correlations revealed excellent agreement among the human listeners and the ASR system, and the systematically lower ASR scores were effectively equalized simply by adding the regression intercept.

Conclusions: The results suggest the clinical feasibility of supplementing or substituting human transcriptions with computer-generated scores, though extension to other speech disorders requires further research.
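The two computations summarized above, a content-word transcription score and the intercept-based equalization of ASR scores, can be sketched as follows. This is an illustrative sketch only, not the study's actual scoring code: the function names, the simple set-based word matching, and the least-squares fit are all assumptions.

```python
# Hypothetical sketch of content-word scoring and intercept adjustment.
# Word matching and regression details are assumptions, not the study's method.

def content_word_score(target, transcript):
    """Proportion of target content words recovered in a transcript.

    Assumes `target` already contains only content words; matching is
    a simple case-insensitive set intersection.
    """
    target_words = set(target.lower().split())
    transcript_words = set(transcript.lower().split())
    if not target_words:
        return 0.0
    return len(target_words & transcript_words) / len(target_words)


def intercept_adjust(asr_scores, human_scores):
    """Shift ASR scores upward by the intercept of a least-squares fit
    human = intercept + slope * asr, mimicking the equalization step."""
    n = len(asr_scores)
    mean_x = sum(asr_scores) / n
    mean_y = sum(human_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in asr_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(asr_scores, human_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [x + intercept for x in asr_scores]
```

When ASR scores track human scores closely but sit uniformly lower, the fitted slope is near 1 and adding the intercept alone closes most of the gap, which is the pattern the abstract reports.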