Abstract
Introduction: This study examined the utility of multiple second formant (F2) slope metrics to capture differences in speech production between individuals with dysarthria and healthy controls as a function of speaking rate. In addition, the utility of F2 slope metrics for predicting severity of intelligibility impairment in dysarthria was examined. Methods: Twenty-three speakers with Parkinson’s disease and mild to moderate hypokinetic dysarthria (HD), 9 speakers with various neurological diseases and mild to severe ataxic or ataxic-spastic dysarthria (AD), and 26 age-matched healthy control speakers (CON) participated in a sentence repetition task. Sentences were produced at habitual, fast, and slow speaking rates. A variety of metrics were derived from the rising F2 transition portion of the diphthong /aɪ/. To obtain measures of intelligibility for the two clinical speaker groups, 15 undergraduate SLP students participated in a transcription experiment. Results: Significantly shallower F2 slopes were found for the speakers with HD compared to control speakers. Steeper F2 slopes were associated with increased speaking rate for all groups. Higher variability in F2 slope metrics was found for the speakers with AD compared to the two other speaker groups. Pooled across the two clinical speaker groups, there was a negative association between intelligibility and F2 slope variability metrics, indicating that lower variability in speech production was associated with higher intelligibility. Discussion: F2 slope metrics were sensitive to dysarthria presence, dysarthria type, and speaking rate. The current study provided evidence that F2 slope variability measures add value beyond averaged F2 slope measures for predicting severity of intelligibility impairment in dysarthria.
Introduction
Dysarthria is a communication disorder involving difficulty in the execution of the motor movements underlying speech, resulting from lesions of the nervous system. Prominent acoustic features of dysarthria include reduced vocal intensity and fundamental frequency variation, reduced vowel segmental contrast, and deviant speaking rate [1‒5]. These and other deviant acoustic characteristics combine in complex ways to reduce speech intelligibility. Impairment in segmental articulation has been shown to be a predominant contributor to reduced intelligibility in dysarthria [6, 7]. An acoustic measure directly related to articulatory behaviour is the slope of second formant (F2) transitions in diphthongs, which reflects the rate of change of vocal tract shape [8].
A variety of studies have examined F2 transition outcome measures in dysarthria. Compared to healthy controls, speakers with Parkinson’s disease (PD; [8‒11]), multiple sclerosis (MS; [12, 13]), amyotrophic lateral sclerosis [5, 14‒16], and cerebral palsy [7, 17] all show reduced global F2 slopes, with global slope quantified as slope transition extent divided by slope transition duration. Shallower or reduced F2 slopes reflect an overall slowness in change between articulatory configurations. In addition, a number of studies have reported shallower F2 slopes to be associated with decreased intelligibility, indicating that F2 slope measures may serve as an acoustic representation of intelligibility or overall dysarthria severity [7, 9, 18, 19]. Beyond global F2 slope metrics, an extreme or maximum instantaneous F2 slope measure (the highest rate of change across the transition) has also been investigated for characterizing articulatory behaviour in dysarthria [9]. Previous research indicated that maximum F2 slope measures were more sensitive to mild dysarthria than global slope measures in speakers with MS [20]. In contrast, Tjaden et al. (2013) [9] reported that, unlike global slope measures, maximum F2 slope measures did not distinguish speakers with PD from neurotypical speakers. Such contradictory findings call for additional studies before conclusions can be drawn regarding the suitability of maximum F2 slopes for characterizing production deficits in dysarthrias associated with different underlying etiologies.
Acoustic measures of F2 transitions usually include transition duration, transition extent, and slope metrics averaged across three to five tokens or repetitions [8, 18]. However, thus far no studies have considered the variability of F2 transition characteristics over a series of repeated productions. A growing body of research on speech variability seeks to assess how dysarthria affects repetitive motor speech behaviour and how manipulating speech parameters, such as rate and loudness, may affect motor speech performance across repeated utterances [21, 22]. Findings from this line of research have the potential to inform clinical management of the affected speech production system by identifying and fostering conditions that optimize stability of the speech motor system [23, 24]. By investigating the consistency (and, inversely, the variability) of kinematic or acoustic outcome measures from repeated speech tokens (syllables, words, sentences), deviant speech motor control patterns may be revealed. During repetitions of a target utterance, neurotypical adults generally produce consistent speech movement patterns when speaking at a habitual rate but become more variable when instructed to change speaking rate or loudness [25, 26]. Higher utterance-to-utterance variation in kinematics (e.g., lower lip movements) or acoustics (e.g., intensity or fundamental frequency) in speakers with dysarthria is generally thought to reflect larger movement variability as a feature of disordered speech motor control, with speakers using multiple solutions to reach task goals. In other words, they may be using more of the available movement space in a less consistent manner [21, 23, 27]. Increased utterance-to-utterance variability is thus thought to reflect a compromised motor speech system [23, 28].
Indices that quantify utterance-to-utterance variability, such as the spatiotemporal index [28], functional data analysis [27], and dynamic time warping (DTW) [21], derived from kinematic and acoustic measures have been found to be valid indicators of speech motor integrity. They are sensitive to the presence, type, and severity of dysarthria, as well as to speaking conditions in which rate and loudness have been manipulated [21, 22, 27, 29, 30]. Likewise, measures of utterance-to-utterance variability in F2 slope might provide further insight into the nature of speech motor involvement in dysarthria. As such, these variability measures deserve attention beyond the averaged F2 slope metrics reported in prior studies.
Previous research has shown that altering speaking rate affects speech motor control and might induce changes in variability across productions. Research on the effects of rate changes on speech motor control has found that slowing speech rate might decrease the stability of speech movements. For example, Kleinow et al. [28] and McHenry [23] analysed sentence-length variability by means of the spatiotemporal index (STI) of lower lip movements under different rate conditions in speakers with hypokinetic dysarthria (HD). Deviations from habitual rate were associated with increased STI values, indicating higher variability, even in speakers with mild dysarthria [23]. Findings of higher speech production variability have been extended to ataxic and ataxic-spastic dysarthria, where utterance-to-utterance spectro-temporal variation during sentence repetitions was found to be higher than in control speakers [21]. These findings indicate that voluntary rate changes might introduce additional instability in speech production beyond that typically associated with habitually produced dysarthric speech.
Against the background of existing literature, the overall aim of the current study was to assess whether metrics capturing utterance-to-utterance variability in F2 slopes show similar or better sensitivity than traditional averaged global and maximum F2 slope metrics in differentiating dysarthric from neurotypical speech, as well as in differentiating among perceptually defined subtypes of dysarthria (cf. [11, 31, 32]). We also used rate reduction and rate increase as mechanisms to elicit and compare F2 slope variability and averaged measures across a range of habitually and non-habitually produced speech rates. Modifying speech rate has been documented as an important treatment option for improving intelligibility in dysarthria [33]. Experimentally, rate manipulations may serve as a means to evaluate an individual’s ability to employ adaptive sensorimotor techniques to achieve speech output that is both intelligible and stable in motor control [28]. As such, we were particularly interested in whether measures of F2 slope variability would reveal aspects of rate-related speech motor control deficits in dysarthria beyond those captured by traditional averaged F2 measures.
Given that faster articulation rates are associated with steeper slopes [15], DTW distance was employed as an additional measure to assess token-to-token variability of the overall shape of the F2 slopes while discounting the effects of intrinsic rate-dependent slope steepness. DTW is a class of algorithms for comparing series of numerical values. In speech research, DTW is used to measure the similarity between two utterances, expressed as time series, which may vary in time and/or space. The two utterances are locally stretched or compressed to make one resemble the other as closely as possible, i.e., so that the distance between them is minimized. After this alignment, the distance between the two utterances is computed by summing the distances between individual aligned elements [34]. DTW has previously been applied successfully in a small number of studies to quantify variability of repeated utterances in dysarthric speech [21, 35, 36]. It was also of interest to investigate the strength of the association between averaged and variability measures of F2 slope and intelligibility in speakers with dysarthria. If such measures prove sensitive to dysarthria severity, they might inform future clinical research.
Methods
Speakers
Audio recordings were obtained for three speaker groups. The first group comprised 23 speakers with PD and mild to moderate HD. This group consisted of eighteen males and five females ranging in age from 40 to 81 years (mean = 66.8 years, SD = 10.6 years). The second group comprised nine speakers with various neurological diseases and mild to severe ataxic or ataxic-spastic dysarthria (AD). This group consisted of six males and three females ranging in age from 37 to 70 years (mean = 49.0 years, SD = 11.8 years). The third group comprised 26 healthy speakers who served as age-matched control speakers (CON) and included fifteen males and eleven females (mean = 57.1 years, SD = 14.1 years). All participants were speakers of Scottish English. The presence of dysarthria was established by each participant’s speech and language pathologist using the Darley, Aronson, and Brown (DAB) perceptual criteria [32]. The presence and type of dysarthria were confirmed by the second author and by a speech pathologist in the Division of Speech and Language Therapy who was independent of the study. Extended biographical information for speakers with dysarthria is displayed in Table 1. The ability to fulfil the demands of the study was informally assessed on the basis of the speakers’ hearing, vision, and reading abilities and a short medical history to exclude any previous speech and language problems or other health issues that could have affected task performance. All participants were capable of communicating independently with the experimenter in order to discuss the purpose of the study, give consent, follow instructions, repeat a series of sentences, and engage in a battery of reading and conversation tasks.
| Group | Gender | Age, years (SD; range) | Aetiology |
|---|---|---|---|
| Speakers with HD | 18 M, 5 F | 66.8 (10.6; 40–81) | IPD |
| Speakers with AD | 6 M, 3 F | 49.0 (11.8; 37–70) | |
| AD01 | M | 40 | CA |
| AD02 | F | 38 | FA |
| AD03 | F | 63 | MS |
| AD04 | M | 58 | SCA-8 |
| AD05 | F | 44 | MS |
| AD06 | M | 70 | MS |
| AD07 | M | 37 | MS |
| AD08 | M | 46 | MS |
| AD09 | M | 45 | CA |
| Control speakers | 15 M, 11 F | 57.1 (14.1; 35–80) | |
HD, hypokinetic dysarthria; AD, ataxic or ataxic-spastic dysarthria; M, male; F, female; CA, cerebellar ataxia; FA, Friedreich’s ataxia; MS, multiple sclerosis; SCA-8, spinocerebellar ataxia type 8; IPD, idiopathic Parkinson’s disease.
Recording Procedure
Speech data were collected during a single recording session. Participants were recorded in a quiet environment in their home, on the university campus, or at the local speech clinic or the building where they attended support group meetings. Audio recordings were obtained using a wave recorder (Edirol R-09HR) connected to a head-mounted condenser microphone (AKG C420), at a sampling rate of 44.1 kHz with 16-bit resolution. The head-mounted microphone maintained a constant distance of 4 cm between the speaker’s mouth and the microphone.
Speech Production Tasks
Participants were asked to produce approximately 20 repetitions of the sentence “Tony knew you were lying in bed.” This sentence was specifically developed to contain only sonorants between the /t/ of “Tony” and the /b/ of “bed” in order to facilitate sentence-length acoustic measures of intensity, fundamental frequency, and formants. The rising F2 transition portion of the diphthong /aɪ/ in “lying” was used to compute the various F2 slope metrics [37]. Sentences were repeated at habitual, fast, and slow speaking rates. For sentences produced at habitual rate, speakers were instructed to repeat the sentence at their self-chosen comfortable speaking rate and normal loudness. For the fast rate condition, participants were instructed to repeat the sentence twice as fast as their habitual speaking rate. For the slow condition, participants were instructed to repeat the sentence half as fast as their habitual rate. Participants were also instructed to try to avoid pauses, but instead to “stretch the words” as much as possible and produce each sentence on a single breath (e.g., [22, 23]). The sentence repetition task at different rates was practised before recording started, using a different practice sentence. For each participant, data collection started with the habitual rate condition, followed by the two other speaking conditions administered in random order. Only fluently produced sentences were considered, i.e., sentences without long breaks or pauses, mid-sentence inhalations, or other major disruptions. For nine sentence repetition tasks spread across different participants, the target of 20 repetitions was not met because of major disfluencies within sentences or because speakers were not able to complete 20 repetitions without taking a long break. The minimum number of repetitions produced by each speaker in each condition was 15 (mean 21.3, SD 1.45).
Each speaker was also audio recorded reading a series of 10 unpredictable sentences adopted from the study of McHenry and Parle [38] for use in intelligibility testing. The use of sentences with unpredictable content ensured that contextual information that facilitates comprehensibility (i.e., syntactic, semantic, and pragmatic cues) was eliminated. Each speaker read a random selection of 10 sentences, drawn from the pool of 50 sentences in McHenry and Parle [38], at a comfortable reading rate and loudness. Each sentence was seven words long [38].
Sentence Durations
To document speaking rate change, sentence durations were obtained for each sentence repetition using Speech Filing System [39]. Sentence duration was operationally defined as the time interval between the voicing onset of /o/ in “Tony” and the voicing onset of /b/ in “bed”. To examine intrarater reliability of sentence marking, a random selection of approximately 10% of the sentence repetition series was reanalysed. The absolute differences in sentence durations were used as the outcome measures of interest. Intrarater reliability was quantified with an intraclass correlation coefficient (ICC) computed with the R package irr v0.84.1 [40], using a single-measure, absolute-agreement, two-way random-effects model. Reliability was high: ICC(2,1) = 0.99, 95% CI (0.99–0.99). The mean absolute difference in sentence duration was 21.7 ms (SD 9.7 ms).
F2 Slopes
F2 data were obtained using Praat [41]. Measurement points were obtained at 5-ms intervals, and tracking errors were manually corrected. Instantaneous slope values were used to operationally define the onset and offset of the F2 slope. Instantaneous slope was calculated as the change in frequency (Hz) divided by the change in time (ms) between consecutive data points. The onset of the F2 slope was identified as the first point at which the slope exceeded a rise of 20 Hz over a time frame of 20 ms. The offset of the F2 slope was identified as the first point at which the slope failed to rise 20 Hz over a time frame of 20 ms (cf. [9, 37]). Only slopes with a duration of at least 40 ms were included in the analysis. Slope transition duration was calculated as the time between F2 slope onset and offset. Slope transition extent was calculated as the frequency change between F2 slope transition onset and offset.
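The onset and offset rule above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes an F2 track already sampled every 5 ms with tracking errors corrected, and it reads the "20 Hz over 20 ms" criterion as an instantaneous-slope threshold of 1 Hz/ms between consecutive points. The constant and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

STEP_MS = 5.0       # F2 measurement interval reported in the text
RISE_HZ = 20.0      # rise criterion (Hz) ...
WINDOW_MS = 20.0    # ... over this time frame (ms)
MIN_DUR_MS = 40.0   # shortest transition retained for analysis

# Assumption: the 20 Hz / 20 ms criterion is applied as an instantaneous-slope
# threshold of RISE_HZ / WINDOW_MS = 1 Hz/ms between consecutive points.
THRESHOLD_HZ_PER_MS = RISE_HZ / WINDOW_MS


def instantaneous_slopes(f2: Sequence[float]) -> List[float]:
    """Frame-to-frame F2 change in Hz/ms; element k spans points k and k + 1."""
    return [(f2[k + 1] - f2[k]) / STEP_MS for k in range(len(f2) - 1)]


def find_f2_transition(f2: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (onset, offset) point indices of the rising F2 transition, or None."""
    slopes = instantaneous_slopes(f2)
    onset = next((k for k, s in enumerate(slopes) if s > THRESHOLD_HZ_PER_MS), None)
    if onset is None:
        return None  # the track never rises fast enough
    # Offset: first point after onset where the rise criterion is no longer met;
    # if the track keeps rising to the end, the final point is used.
    offset = next(
        (k for k in range(onset + 1, len(slopes)) if slopes[k] <= THRESHOLD_HZ_PER_MS),
        len(f2) - 1,
    )
    if (offset - onset) * STEP_MS < MIN_DUR_MS:
        return None  # too short to be analysed
    return onset, offset


def transition_duration_and_extent(
    f2: Sequence[float], onset: int, offset: int
) -> Tuple[float, float]:
    """Transition duration (ms) and extent (Hz) between onset and offset."""
    return (offset - onset) * STEP_MS, f2[offset] - f2[onset]
```

With a 5-ms step, the 40-ms minimum corresponds to at least eight inter-point intervals between onset and offset.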
Five F2 slope outcome measures were of interest. Metrics capturing averaged behaviour in the production of F2 slopes included mean global slope and mean maximum instantaneous slope (cf. [9]). Measures capturing token-to-token variability of F2 slope included the coefficient of variation of global slope, the coefficient of variation of maximum instantaneous slope, and DTW metrics. The definition and calculation of each measure are specified in greater detail below.
Global slope was quantified as slope transition extent divided by slope transition duration and was the first outcome measure of interest (cf. [8, 15]). Maximum instantaneous slope was identified as the point of steepest increase across the F2 slope and was the second outcome measure of interest (cf. [9]). To capture averaged speech behaviour, the means of the global slope and maximum instantaneous slope values (both in Hz/ms) were obtained. Averaged metrics were obtained for each speaker by calculating the mean across repetitions within each speaking rate condition.
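Given onset and offset indices, the two per-repetition slope measures and their per-condition summaries might be computed as below. This is a sketch under stated assumptions: the maximum instantaneous slope is taken as the largest frame-to-frame rise between onset and offset, the 5-ms step is hard-coded, and the sample standard deviation is used for the coefficient of variation described in the next paragraph (the text does not say which standard deviation was used). Function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

STEP_MS = 5.0  # F2 sampling interval (ms)


def global_slope(f2: Sequence[float], onset: int, offset: int) -> float:
    """Transition extent (Hz) divided by transition duration (ms), in Hz/ms."""
    return (f2[offset] - f2[onset]) / ((offset - onset) * STEP_MS)


def max_instantaneous_slope(f2: Sequence[float], onset: int, offset: int) -> float:
    """Steepest frame-to-frame rise between onset and offset, in Hz/ms."""
    return max((f2[k + 1] - f2[k]) / STEP_MS for k in range(onset, offset))


def summarize(values: List[float]) -> Tuple[float, float]:
    """Mean and coefficient of variation (%) across repetitions of one condition.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(values)
    cov = statistics.stdev(values) / mean * 100.0
    return mean, cov
```

In use, `summarize` would be applied once per speaker and rate condition to the list of per-repetition global slopes, and again to the list of maximum instantaneous slopes, giving GSmean/GScov and MISmean/MIScov.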
Variability of speech behaviour was assessed by obtaining coefficients of variation across repeated productions. For both global slope and maximum instantaneous slope, the coefficient of variation (in %) was calculated by dividing the standard deviation across repetitions by the mean across repetitions. A DTW algorithm was applied to capture repetition-to-repetition variability of the overall shape of the F2 transitions, using the R package dtw v1.23 [34]. DTW was used to align the F2 trajectories and to calculate the average repetition-to-repetition distance over the F2 slopes. The repetition-to-repetition distance reflects minimum global dissimilarity, a value that can be regarded as an optimal, stretch-insensitive measure of the inherent difference between two consecutive F2 slopes. Thus, the outcome measure reflects the distance between two F2 slopes in a two-dimensional space of time and amplitude (here, F2 frequency). An example of DTW of two consecutive F2 trajectories is displayed in Figure 1. Panel a displays the F2 slope trajectory of the reference repetition, with time on the x axis (total duration 130 ms) and F2 frequency on the y axis (from 1,429 Hz to 2,321 Hz). Panel b displays the (rotated) F2 slope trajectory of the consecutive query repetition, with a total duration of 190 ms and formant frequencies ranging from 1,493 Hz to 2,355 Hz. Panel c displays the warping curve, which shows the optimal point-by-point matching between the reference and consecutive F2 slopes in discrete steps of 5 ms. The DTW distance outcome measure is the summed alignment distance between matched points, as indicated by dotted lines. A larger summed distance indicates a larger discrepancy between two consecutive repetitions and thus larger repetition-to-repetition variability in F2 slopes. A time-normalized distance measure was also of interest to account for inherent differences in the duration of F2 time histories [34].
The time-normalized distance is calculated by dividing the sum of distances along the warping curve by the total number of elements in the path, i.e., the summed length of both F2 slope trajectories. This normalization avoids biases arising from differences in F2 time history duration. When applied to the current data, it diminishes the effects of large variations in duration between consecutive slopes. We refer to this measure as DTW time-normalized distance (DTWtnd).
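The summed and time-normalized DTW distances can be illustrated with a textbook DTW recursion. The study used the R package dtw v1.23; the Python below is an independent sketch, not that package. It assumes an absolute-difference local distance on F2 values and a symmetric step pattern in which diagonal moves count the local distance twice, so that every complete warping path has a total weight equal to the summed length of the two trajectories, matching the normalization described above. The paper does not state which step pattern was used, so values from this sketch need not equal the package's output. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (summed distance, time-normalized distance) between two F2 tracks.

    Symmetric step pattern: horizontal and vertical moves add the local
    distance once, diagonal moves add it twice, so every complete path has
    total weight len(x) + len(y), which is used as the normalizing constant.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]  # cumulative cost matrix
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local distance between F2 values (Hz)
            g[i][j] = min(
                g[i - 1][j] + d,          # vertical move
                g[i - 1][j - 1] + 2 * d,  # diagonal move, weighted twice
                g[i][j - 1] + d,          # horizontal move
            )
    total = g[n][m]
    return total, total / (n + m)


def mean_consecutive_dtw(tracks: List[Sequence[float]]) -> Tuple[float, float]:
    """Average summed and normalized DTW distance over consecutive repetitions."""
    pairs = [dtw_distance(a, b) for a, b in zip(tracks, tracks[1:])]
    return (
        sum(p[0] for p in pairs) / len(pairs),
        sum(p[1] for p in pairs) / len(pairs),
    )
```

For instance, `dtw_distance([0, 1, 2], [0, 0, 1, 1, 2, 2])` returns `(0.0, 0.0)`, because the slower repetition is the faster one with each value held twice, whereas `dtw_distance([0, 0, 0], [1, 1, 1])` returns `(6.0, 1.0)`: a constant 1-unit offset yields a normalized distance of exactly 1.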
To examine interrater reliability of F2 slope assignment, a random selection of approximately 10% of the sentence repetition series was reanalysed by a measurer who did not perform the original acoustic analysis. The absolute differences in slope transition duration and slope transition extent were used as outcome measures of interest. Interrater reliability was assessed with an average-score, absolute-agreement, two-way random-effects model. For slope transition duration, reliability was high: ICC(2,k) = 0.93, 95% CI (0.83–0.97). For slope transition extent, reliability was also high: ICC(2,k) = 0.94, 95% CI (0.85–0.98). The average absolute difference between raters was 11.7 ms (SD 11.9 ms) for slope transition duration and 67.6 Hz (SD 54.2 Hz) for slope transition extent.
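The two ICC variants reported here (ICC(2,1) for intrarater and ICC(2,k) for interrater reliability) follow from two-way ANOVA mean squares. The study computed them with the R package irr, where the corresponding settings are presumably `model = "twoway"` and `type = "agreement"` with `unit = "single"` or `"average"`. The sketch below re-derives only the point estimates in plain Python from the Shrout and Fleiss formulas (no confidence intervals). Rows are the re-measured items and columns are raters or measurement occasions; the function name is hypothetical.

```python
from typing import List, Tuple


def icc_two_way_absolute(data: List[List[float]]) -> Tuple[float, float]:
    """ICC(2,1) and ICC(2,k) for an n-items x k-raters (or occasions) table.

    Two-way random-effects, absolute-agreement model, computed from the
    row (item), column (rater) and residual mean squares.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(r) / k for r in data]
    col_means = [sum(r[j] for r in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in data for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-items mean square
    msc = ss_cols / (k - 1)             # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square

    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_average = (msr - mse) / (msr + (msc - mse) / n)
    return icc_single, icc_average
```

Applied to the classic Shrout and Fleiss example of six targets rated by four judges, this gives ICC(2,1) ≈ 0.29 and ICC(2,k) ≈ 0.62.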
Listeners and Perceptual Methodology
Fifteen undergraduate SLP student listeners (13 females, two males) orthographically transcribed sentences read by the speakers with dysarthria. Listeners ranged in age from 22 to 29 years (M = 22.7 years, SD = 3.0 years). All listeners were native speakers of Scottish English and denied a history of hearing, speech, or language problems. Listeners had completed coursework in motor speech disorders and reported no more than minimal exposure to dysarthria in everyday life. Listeners were seated in a quiet room and used a laptop and over-ear headphones. The transcription task was designed and executed in Praat. All stimuli were scaled in Praat to an average intensity of 75 dB SPL to ensure a uniform and comfortable loudness level. Each of the 32 speakers with dysarthria produced ten unpredictable sentences, yielding a total of 320 sentences. Each of the fifteen listeners was presented with 70 randomly selected sentences from the pool of 320, ensuring that each stimulus was transcribed by at least three listeners. The experiment was self-paced. Listeners heard each sentence only once and were instructed to transcribe it orthographically. A word was scored as correct when it was an exact phonemic match to the corresponding word in the target utterance. Where orthographic errors resulted in a lexical item distinct from the target word, the response was scored as incorrect. For each speaker, the total number of correctly identified words was divided by the total number of possible words and multiplied by 100 to yield the percentage of words identified correctly [42, 43].
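The percentage-words-correct score reduces to simple counting. The sketch below is a hypothetical simplification: it approximates the "exact phonemic match" rule by exact orthographic word identity after lower-casing and stripping punctuation, ignores word order, and credits each target word at most as often as it occurs in the target. Homophones (e.g., "there" for "their"), which the study would score as correct, would need an additional lookup. Function names and example sentences are illustrative and not from the stimulus set.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def _words(text: str) -> Counter:
    """Lower-cased word counts; punctuation other than apostrophes is ignored."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def score_transcript(target: str, response: str) -> Tuple[int, int]:
    """(words correct, words possible) for one transcribed sentence.

    Assumption: exact orthographic word identity, order-insensitive; each
    target word is credited at most as often as it occurs in the target.
    """
    t, r = _words(target), _words(response)
    return sum((t & r).values()), sum(t.values())


def percent_words_correct(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool all (target, response) pairs for one speaker: correct / possible * 100."""
    correct = possible = 0
    for target, response in pairs:
        c, p = score_transcript(target, response)
        correct += c
        possible += p
    return 100.0 * correct / possible
```

For example, transcribing "The cat sat on the mat." as "the cat sat on a mat" scores 5 of 6 words; pooling that with a fully correct two-word sentence gives 7/8 = 87.5%.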
Statistical Analysis
Statistical analyses were carried out separately for each acoustic parameter using R software [44]. To analyse effects of group and rate for each slope outcome measure, linear mixed model analyses were performed with rate (habitual, fast, slow) and group (CON, HD, AD) as fixed factors and speaker as a random factor. Sentence duration was included as a covariate in all models to account for intrinsic differences in speaking rate. Significant differences between speaker groups were further explored with Tukey-corrected post hoc comparisons of estimated marginal means, using Satterthwaite’s degrees-of-freedom method [45], with a specific view to establishing which rate conditions drove group differences. Standardized effect sizes, expressed as Cohen’s d, were derived from the estimated marginal means and population standard deviations. The relationship between slope outcome measures produced at habitual rate and intelligibility scores was examined with Pearson product-moment correlations. A significance level of 0.05 was used for all hypothesis testing. To facilitate interpretation, results are reported by outcome measure: first sentence duration, followed by the individual measures capturing F2 slope behaviour.
Results
Sentence Durations
Averaged sentence durations across groups and rate conditions are plotted in Figure 2. The analysis of sentence durations indicated a main effect of group, F(2, 55) = 4.03, p = 0.002. Sentence durations were longer in the AD group than in the HD group (mean difference 0.63 s, t[55] = 2.82, p = 0.018). A main effect of rate was also present, F(2, 110) = 44.1, p < 0.001. Across groups, sentence durations were longer in the slow rate condition than in the fast (mean difference 1.16 s, t[110] = 9.09, p < 0.001) and habitual rate conditions (mean difference 0.84 s, t[110] = 6.59, p < 0.001). In addition, sentence durations were longer in the habitual than in the fast rate condition (mean difference 0.32 s, t[110] = 2.50, p = 0.037). The group × rate interaction was also significant, F(4, 110) = 3.58, p = 0.009. Post hoc pairwise comparisons indicated that sentence durations in the AD group were longer under the slow rate condition than under the habitual (mean difference 0.89 s, t[110] = 3.07, p = 0.008) and fast rate conditions (mean difference 1.28 s, t[110] = 4.43, p < 0.001). For the HD group, sentence durations were longer under the slow rate condition than under the habitual (mean difference 0.43 s, t[110] = 2.38, p = 0.049) and fast rate conditions (mean difference 0.67 s, t[110] = 3.71, p = 0.001). Similarly, sentence durations for the CON group were longer under the slow rate condition than under the habitual (mean difference 1.19 s, t[110] = 7.02, p < 0.001) and fast rate conditions (mean difference 1.51 s, t[110] = 8.89, p < 0.001). For none of the individual groups was there a significant difference in average sentence duration between the habitual and fast rate conditions.
Global Slope
Averaged and variability measures of global slope across groups and rate conditions are plotted in Figure 3. The statistical analysis of mean global slope (GSmean) indicated a main effect of group, F(2, 55.2) = 5.49, p = 0.007. GSmean was higher in the CON group than in the HD group (mean difference 1.36 Hz/ms, t[54.7] = 3.30, p = 0.005, d = 1.60). Post hoc analyses indicated that this effect was largely carried by the habitual and fast rate conditions. The AD group did not differ significantly from the HD and CON groups. A main effect of rate was also present, F(2, 114.4) = 23.4, p < 0.001. GSmean was higher in the fast rate condition than in the habitual (mean difference 0.70 Hz/ms, t[110] = 3.87, p < 0.001, d = 0.82) and slow rate conditions (mean difference 1.54 Hz/ms, t[120] = 6.83, p < 0.001, d = 1.82). GSmean was also higher in the habitual than in the slow rate condition (mean difference 0.85 Hz/ms, t[116] = 4.16, p = 0.001, d = 1.00). The group × rate interaction was not significant.
Global slope coefficient of variation (GScov) showed a significant effect of group, F(2, 56.7) = 10.8, p < 0.001. GScov was higher in the AD group than in the CON group (mean difference 4.33%, t[56.4] = 4.61, p < 0.001, d = 1.41) and the HD group (mean difference 3.70%, t[58.6] = 3.82, p < 0.001, d = 1.20). Post hoc analyses showed that the difference between speakers with AD and control speakers held in all three rate conditions, whereas the difference between speakers with AD and HD was evident under the slow rate condition. No significant difference was found between the CON and HD groups. Neither the main effect of rate nor the group × rate interaction was significant.
Maximum Instantaneous Slope
Averaged and variability measures of maximum instantaneous slope across groups and rate conditions are plotted in Figure 4. Mean maximum instantaneous slope (MISmean) showed a significant effect of group, F(2, 55.4) = 4.22, p = 0.020. MISmean was higher in the CON group than in the HD group (mean difference 1.83 Hz/ms, t[55.0] = 2.90, p = 0.015, d = 1.42). Post hoc analysis indicated that this group difference was largely driven by trends under the habitual and fast rate conditions. The other between-group comparisons were not significant. The main effect of rate was also significant, F(2, 114.5) = 16.1, p < 0.001. MISmean was higher during fast rate than during habitual (mean difference 0.78 Hz/ms, t[110] = 2.86, p = 0.014, d = 0.61) and slow rate (mean difference 1.95 Hz/ms, t[120] = 5.67, p < 0.001, d = 1.51). MISmean was also higher during habitual than during slow rate (mean difference 1.17 Hz/ms, t[116] = 3.77, p < 0.001, d = 0.91). The group × rate interaction was not significant.
For the maximum instantaneous slope coefficient of variation (MIScov), the effect of group was significant, F(2, 56.2) = 5.23, p = 0.008. MIScov was higher in the AD group than in the CON group (mean difference 3.9%, t[56.0] = 3.22, p = 0.006, d = 1.17). The other between-group comparisons were not significant. Neither the main effect of rate nor the group × rate interaction was significant.
Repetition-to-Repetition DTW Distance
Averaged DTW measures across groups and rate conditions are plotted in Figure 5. The results of the statistical analysis of the repetition-to-repetition DTW distance measures (DTWd) indicated a significant effect of group F(2, 54.4) = 3.45, p = 0.039. DTWd was higher in the AD group compared to the CON group (mean difference 162 AU, t[53.8] = 2.59, p = 0.032, d = 1.02). Post hoc analysis indicated this was driven by differences under the slow rate condition only. The other between-group comparisons were not significant. The effect of rate was also significant F(2, 118.3) = 6.06, p = 0.003. Compared to fast rate, DTWd was higher during habitual rate (mean difference 90.8 AU, t[110] = 2.60, p = 0.028, d = 0.57) and slow rate (mean difference 137.3 AU, t[130] = 3.33, p = 0.003, d = 0.86). The group by rate interaction effect was not significant.
For DTWtnd, there was a significant effect of group, F(2, 54.7) = 3.65, p = 0.032. DTWtnd was higher in the AD group than in the HD group (mean difference 4.00 AU, t[55.4] = 2.63, p = 0.030, d = 1.30). Post hoc analysis indicated that no individual rate condition carried this effect. The other between-group comparisons were not significant. The effect of rate was also significant, F(2, 116.5) = 5.04, p = 0.008. DTWtnd was higher in the fast than in the slow rate condition (mean difference 2.58 AU, t[126] = 3.17, p = 0.005, d = 0.84). The group × rate interaction was not significant.
In summary, for all averaged slope measures, a shallower slope was found for the speakers of the HD group compared to the control speakers. Generally, average slopes increased in steepness with increased speaking rate. With respect to the metrics capturing variability, results indicated overall higher variability in the group of speakers with AD, compared to the speakers with HD and/or control speakers. Variability metrics for the speakers with HD and the control speakers were not significantly different. The absence of group by rate interaction effects for all outcome measures indicated that rate-related changes in F2 slope metrics were similar across groups.
Correlations between Slope Measures and Transcription Intelligibility
Table 2 displays the Pearson correlations between the slope measures produced at habitual rate and transcription intelligibility. Pooled over the two clinical speaker groups, significant correlations were found for the slope measures capturing variability: the coefficients of variation of global slope (GScov) and maximum instantaneous slope (MIScov), and both DTW measures (DTWd and DTWtnd). No significant correlations were found for the speakers of the AD group. For the speakers of the HD group, there was a significant correlation between transcription intelligibility and the coefficient of variation of global slope. All significant correlations were negative, indicating that increased speech acoustic variability was associated with decreased intelligibility.
Group | Statistic | GSmean | GScov | MISmean | MIScov | DTWd | DTWtnd |
---|---|---|---|---|---|---|---|
Pooled groups | Pearson r | 0.139 | −0.422* | 0.088 | −0.344* | −0.360* | −0.377* |
Pooled groups | Sig. | 0.225 | 0.008 | 0.317 | 0.027 | 0.022 | 0.017 |
Pooled groups | N | 32 | 32 | 32 | 32 | 32 | 32 |
AD | Pearson r | 0.340 | −0.155 | 0.095 | −0.273 | −0.492 | −0.209 |
AD | Sig. | 0.185 | 0.345 | 0.404 | 0.239 | 0.089 | 0.294 |
AD | N | 9 | 9 | 9 | 9 | 9 | 9 |
HD | Pearson r | 0.128 | −0.459* | 0.169 | −0.260 | −0.163 | −0.325 |
HD | Sig. | 0.280 | 0.014 | 0.221 | 0.115 | 0.228 | 0.065 |
HD | N | 23 | 23 | 23 | 23 | 23 | 23 |
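The correlation statistics in Table 2 can be related to raw scores with a short calculation. The sketch below is illustrative and is not the authors' analysis code: it computes Pearson r and converts it to a p-value through the t statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, integrating the t density numerically (Simpson's rule) so that only the Python standard library is needed. The function names `pearson_r`, `t_pdf`, and `one_tailed_p` are hypothetical. Plugging the tabled r and N values into this sketch suggests that the reported Sig. values correspond to one-tailed tests; that is an inference from the numbers, not something this section states.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def one_tailed_p(r, n, steps=2000):
    """One-tailed p-value for r, tested in the direction of its own sign.

    Converts r to t with n - 2 df, integrates the t density from 0 to |t|
    with Simpson's rule, and returns the remaining upper-tail area.
    Assumes |r| < 1 and an even number of steps.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# r and N taken from Table 2 (GScov for the pooled groups and the HD group).
print(round(one_tailed_p(-0.422, 32), 3))
print(round(one_tailed_p(-0.459, 23), 3))
```

Numerical integration keeps the example dependency-free; in practice a statistics library's t distribution (or a correlation routine with a one-sided alternative) would normally be used instead.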
Discussion
This study aimed to evaluate whether variability measures of F2 slope productions have added value to averaged F2 slope measures in discriminating speech produced by speakers with dysarthria from neurotypical speech. The utility of these variability measures in predicting intelligibility of dysarthric speech also was assessed.
Sentence Durations
To evaluate the effect of rate changes on variability outcomes, sentences produced at fast, habitual, and slow rates were analysed. Consistent with previous research [13, 46], speakers pooled across groups were able to implement relative changes in rate. Implementation of the slow rate varied across groups: control speakers decreased rate by 88%, while speakers with HD and AD decreased rate by 32% and 47%, respectively. In contrast, the increase in articulation rate for the fast condition was largely similar across groups: control speakers increased rate by 24%, while speakers with HD and AD increased rate by 18% and 21%, respectively. Given these successful rate changes, the current study was able to assess variability measures of F2 slope productions across a range of speaking rates.
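This section reports rate changes as percentages without restating how they were derived. As a hedged illustration only, assuming that rate is taken as the reciprocal of sentence duration, the sketch below shows that the percentage change in duration and the percentage change in rate are different quantities for the same pair of productions. The helper names and the durations are hypothetical and are not study data.

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def rate_change_from_durations(habitual_s, condition_s):
    """Percentage change in rate, assuming rate = 1 / sentence duration."""
    return pct_change(1.0 / condition_s, 1.0 / habitual_s)


# Hypothetical sentence durations in seconds (not study data).
habitual, slow = 2.0, 3.0
print(f"duration change: {pct_change(slow, habitual):+.1f}%")                 # duration change: +50.0%
print(f"rate change: {rate_change_from_durations(habitual, slow):+.1f}%")    # rate change: -33.3%
```

Because rate and duration are reciprocal under this assumption, a given lengthening of the sentence always corresponds to a smaller percentage reduction in rate.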
Slope Measures
Previous research has suggested that averaged global slope measures may not be ideal for distinguishing among neurological conditions. For example, Rowe et al. [11] found no differences in global slope measures between speakers with PD and speakers with amyotrophic lateral sclerosis. Likewise, we found no difference in averaged global slope between the HD and AD groups. On the other hand, mean global slope values were higher for the control speakers than for the HD group, an effect largely carried by the habitual and fast rate conditions. This finding is in line with other studies that have found shallower slopes in speakers with HD compared to control speakers [8]. Predictably, mean F2 global slope increased with increasing speaking rate, as this measure is closely associated with the rate of change in vocal tract shape and lingual movement velocities [19, 47].
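Global slope and maximum instantaneous slope are both derived from the rising F2 transition of /ai/. A minimal sketch, assuming the transition has already been segmented into an F2 track sampled at known times (with onset and offset taken as the first and last samples), computes global slope as the overall F2 change divided by transition duration, and maximum instantaneous slope as the largest sample-to-sample rate of change. The function names, the simple first-difference derivative, and the F2 values are illustrative assumptions; an actual analysis may smooth the track or use a longer analysis window before taking the maximum.

```python
def global_slope(times_ms, f2_hz):
    """Overall F2 change across the transition, in Hz/ms."""
    return (f2_hz[-1] - f2_hz[0]) / (times_ms[-1] - times_ms[0])


def max_instantaneous_slope(times_ms, f2_hz):
    """Largest slope between adjacent F2 samples, in Hz/ms."""
    slopes = [
        (f2_hz[i + 1] - f2_hz[i]) / (times_ms[i + 1] - times_ms[i])
        for i in range(len(f2_hz) - 1)
    ]
    return max(slopes)


# Hypothetical rising F2 transition sampled every 10 ms.
times = [0, 10, 20, 30, 40, 50]
f2 = [1200, 1300, 1450, 1600, 1700, 1750]
print(global_slope(times, f2))             # 11.0
print(max_instantaneous_slope(times, f2))  # 15.0
```

A larger value for either measure reflects a faster change in vocal tract shape, which is why both would be expected to increase at faster speaking rates.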
In contrast to the findings for mean global slope, our variability measure of global slope (its coefficient of variation) did differentiate between the HD and AD groups, suggesting that assessing variability adds value in dysarthria research. This measure also differentiated speakers with AD from control speakers. Speaking rate mattered: the difference between the AD and HD groups emerged under the slow rate condition only, whereas the difference between speakers with AD and control speakers was present in all three rate conditions.
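The variability measure of global slope is a coefficient of variation (GScov in Table 2) computed over repeated productions. A minimal sketch, assuming one global slope value per repetition and using the sample standard deviation expressed as a percentage of the mean; whether the study used the sample or population SD, or a percentage scaling, is not stated in this section, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample SD divided by the mean, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical global slopes (Hz/ms) for three repetitions of one sentence.
print(round(coefficient_of_variation([10.0, 12.0, 14.0]), 2))  # 16.67
```

Dividing by the mean keeps the measure comparable across speakers and rate conditions whose slopes differ in overall steepness, which matters here because mean slope itself changes with rate.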
Findings for mean maximum instantaneous slope were similar to those for mean global slope. Mean maximum instantaneous slope values did not differ between the two clinical groups, but were higher for the control speakers than for the HD group, largely driven by trends under the habitual and fast rate conditions. In previous research, comparisons between averaged and maximum F2 slope measures have yielded mixed results. Maximum F2 slope measures have been found to be more sensitive than average slope measures to mild dysarthria associated with MS [20], whereas Tjaden et al. [9] found that global slope measures, but not maximum slope measures, distinguished speakers with HD from control speakers. In contrast to the findings for global slope variability, maximum slope variability did not differentiate between the HD and AD groups, although it was higher for the AD group than for the control speakers. Taken together, the findings for the various slope measures suggest that the discriminative power of slope outcome measures may be specific to the type of dysarthria. That is, averaged F2 slope metrics may be better at differentiating speakers with HD from control speakers, while F2 slope variability metrics may be better at differentiating speakers with AD from control speakers. These findings might tentatively be explained by the speech characteristics of the dysarthria types involved. For HD, prominent characteristics involving hypokinesia and rigidity are associated with reduced articulatory output and articulatory undershoot, which tend to present in a relatively stable fashion [48]. This likely affects overall, averaged slope productions but does not necessarily lead to increased variability in F2 slope productions.
For ataxic or ataxic-spastic dysarthria, prominent speech characteristics include irregular articulatory breakdowns, distorted vowels, excess and equal stress, in addition to an overall slow articulation rate [2]. This inconsistent articulatory behaviour, generally characterized as a pattern of instability [49], might explain the differential power of F2 slope variability metrics for this particular dysarthria type.
The current study also adds to the small body of research using DTW to assess speech production in dysarthria (e.g., [50]). DTWd was higher for the speakers with AD than for the control speakers, indicating increased production variability for the speakers with AD. This effect was largely driven by the slow rate condition. These findings are consistent with our previous work [21], in which utterance-to-utterance spectro-temporal variation of sentence-length productions was higher for speakers with AD, but not for speakers with HD, compared to neurotypical speakers. Those group differences were also largely driven by the slow condition. One implication is that DTW measures of F2 slope trajectories and sentence-length spectro-temporal variability measures capture similar constructs with respect to speech production variability in neurologically affected speech.
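DTWd compares F2 slope trajectories across repetitions. As a sketch only, the code below implements textbook dynamic time warping with an absolute-difference local cost and averages the resulting distances over all pairs of repetitions. The local cost, the absence of a warping window, the pairwise averaging, and the function names are assumptions, since this section does not specify how DTWd was computed.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Cumulative cost of the optimal DTW alignment of sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def mean_pairwise_dtw(repetitions):
    """Average DTW distance over every pair of repeated productions."""
    pairs = list(combinations(repetitions, 2))
    return sum(dtw_distance(x, y) for x, y in pairs) / len(pairs)


# Hypothetical F2 tracks (Hz) for three repetitions of the same transition.
reps = [
    [1200, 1350, 1550, 1700],
    [1200, 1250, 1400, 1600, 1700],
    [1250, 1400, 1650, 1750],
]
print(mean_pairwise_dtw(reps))
```

Higher values indicate less consistent trajectories from one repetition to the next, which is the sense in which DTWd indexes production variability.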
Measures reflecting a stability index based on repeated productions, such as the spatiotemporal index (STI), are most comparable to the current DTW results. The current study did not find increased DTWd for speakers with HD compared to control speakers. Relatedly, McHenry reported no significant differences in STI between a group with mild dysarthria and neurotypical controls. Previous studies investigating the effects of rate on the stability of speech production have reported mixed results, but the majority of findings indicated stability to be highest under habitual rate conditions compared to non-habitual conditions [23, 28]. In the current study, by contrast, stability was highest (i.e., DTWd was lowest) under the fast rate condition. This may reflect a more open-loop or ballistic type of motor control, leading to increased stability (cf. [51, 52]).
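For comparison with the DTW approach, the STI is conventionally computed by linearly time-normalizing each repetition, amplitude-normalizing it (z-scoring), and summing the standard deviations across repetitions at 2% intervals of normalized time (50 points). The sketch below follows that general recipe in plain Python; the 1000-point resampling grid and the function names are assumptions for illustration, and the STI was not part of the current study's analysis.

```python
import statistics


def resample(x, n_points):
    """Linearly interpolate x (at least 2 samples) onto n_points samples."""
    last = len(x) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into x
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(x[i] + frac * (x[i + 1] - x[i]))
    return out


def zscore(x):
    """Subtract the mean and divide by the SD (amplitude normalization)."""
    mean, sd = statistics.fmean(x), statistics.pstdev(x)
    return [(v - mean) / sd for v in x]


def spatiotemporal_index(records, n_points=1000, n_samples=50):
    """Sum of across-record SDs at evenly spaced points in normalized time."""
    normed = [zscore(resample(r, n_points)) for r in records]
    step = n_points // n_samples  # 2% intervals when n_samples == 50
    return sum(
        statistics.stdev([rec[k * step] for rec in normed])
        for k in range(n_samples)
    )


# Hypothetical F2 tracks (Hz) for three repetitions.
reps = [[1200, 1400, 1650, 1750],
        [1200, 1300, 1500, 1700, 1760],
        [1220, 1450, 1700, 1740]]
print(round(spatiotemporal_index(reps), 2))
```

Because every record is normalized in both time and amplitude, the STI reflects only differences in trajectory shape; a flat (constant) record would make the z-scoring step divide by zero.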
The time-normalized DTW measure (DTWtnd) indicated higher variability for the AD group than for the speakers with HD, although post hoc analysis indicated that no individual rate condition carried this effect. Notably, no differences were found between the control group and either clinical group for this measure. These results tentatively indicate that, for speakers with HD, time normalizing the slope trajectories removed a relatively large part of the variability, suggesting that variability in this group is mostly expressed in the temporal dimension. Following this interpretation, speech production of speakers with AD might, in contrast, be characterized by predominantly higher spatial variability, as this type of variability is relatively robust to time normalization. These findings are corroborated by the larger spread in global slope coefficient of variation outcomes for the speakers with AD than for the speakers with HD (see also Figure 3).
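The interpretation above rests on the idea that time normalization removes variability that lies in the temporal dimension. As a hypothetical illustration (this section does not specify how DTWtnd was computed), the sketch resamples each trajectory to a fixed number of points before applying DTW. Two tracks that trace the same F2 path at different durations then yield a near-zero distance, whereas raw DTW still accumulates cost from the unequal sampling; a difference in the F2 values themselves (spatial variability) would survive the normalization. The function names and the 50-point grid are assumptions.

```python
def resample(x, n_points):
    """Linearly interpolate x (at least 2 samples) onto n_points samples."""
    last = len(x) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(x[i] + frac * (x[i + 1] - x[i]))
    return out


def dtw_distance(a, b):
    """Cumulative absolute-difference cost of the optimal DTW alignment."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def time_normalized_dtw(a, b, n_points=50):
    """DTW after resampling both trajectories to the same length (assumed)."""
    return dtw_distance(resample(a, n_points), resample(b, n_points))


# The same hypothetical F2 path (Hz) produced at two durations.
short = [1200, 1500, 1800]
long_ = [1200, 1350, 1500, 1650, 1800]
print(dtw_distance(short, long_))  # 300.0
print(time_normalized_dtw(short, long_))
```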
Correlational Analyses
Neither averaged global slope nor averaged maximum instantaneous slope was significantly associated with intelligibility. In contrast, a number of studies have found significant correlations between averaged F2 slope metrics and intelligibility in dysarthria [9, 13, 15, 53], although others found such correlations only for selected conditions or individual speakers [8, 17]. The discrepancy between the current findings and previous research might be due to differences in tasks (F2 slopes extracted from individually produced words vs. sentences), as well as differences in diphthong identities. Furthermore, apart from deviant vowel production, imprecise consonants and consonant-vowel transitions contribute substantially to reduced intelligibility in dysarthria [8, 54, 55], indicating a multi-faceted role of speech segment production in intelligibility decline.
Pooled over both clinical groups, significant correlations between transcription intelligibility and F2 slope metrics were found exclusively for measures assessing variability of speech productions. Within the individual clinical groups, the negative trends and the significant negative correlation for global slope variability in the HD group are consistent with the association between decreased intelligibility and increased production variability reported in previous literature. In particular, correlational analyses between variability measures of speech intensity over repeated sentence-length productions and intelligibility in our previous work yielded a largely similar pattern of findings [27], although it should be noted that those correlations were evaluated for a subset of the participants of the current study. Overall, the current findings demonstrate the value of variability measures of F2 slope productions as predictors of intelligibility in dysarthria.
Limitations and Future Directions
The current study has a number of limitations, which we propose might be addressed in future research. One limitation is the relatively small number of participants in the AD group, although our participant numbers are comparable to other studies [56, 57]. In addition, the AD group varied with respect to underlying aetiology. Overall severity of dysarthria also was not equated for the two clinical groups, with the AD group exhibiting more severe dysarthria than the HD group. This difference may also indicate that F2 slope variability measures are more sensitive to more severe dysarthria, since these measures were generally not effective in differentiating control speakers from the less impaired HD group. Furthermore, participants were asked to self-pace the sentence repetitions, potentially resulting in nonuniform, self-chosen trade-offs between speed and accuracy and introducing an extra source of variation at the level of individual speakers. In addition, the constraints of the sentence repetition task mean that the productions might not be representative of natural speech. In terms of cognitive load, the tasks used in the present study are less taxing, as they involve reading a single sentence, and they are less demanding in terms of speech motor control, as they involve planning and executing the motor program of a memorized phrase. Future studies should employ speech tasks other than sentence productions when applying F2 slope variability measures, to determine whether our findings generalize beyond this task [58].
Conclusions
This study suggests that assessing the variability of F2 slope productions adds value to averaged measures in distinguishing among speakers with AD, speakers with HD, and neurotypical speakers. Averaged values were more successful in distinguishing speakers with HD from control speakers, while variability measures were more successful in distinguishing speakers with AD from control speakers. Eliciting speech at rates other than the habitual rate proved instrumental in uncovering these differences. In contrast to the measures capturing averaged F2 slope behaviour, measures capturing variability were significant predictors of speech intelligibility for the two clinical groups combined, although these results were mostly driven by the speakers with HD. Overall, the current findings demonstrate the added value and clinical potential of F2 slope variability measures in the assessment of AD and HD.
Acknowledgments
The authors gratefully thank all speaker and listener participants as well as Nicole Feeley, Grace Galvin, Rebecca Jaffe, Hannah Koellner, Laura Saitta, Sara Silverman, and Noora Somersalo for their help with stimuli preparation and data reduction.
Statement of Ethics
This study protocol was reviewed and approved by the Research Ethics Committee (REC) of NHS West of Scotland (REC 10/S0709/17). Written informed consent was obtained from all participants prior to participating in the study. This research was conducted ethically in accordance with the principles of the Declaration of Helsinki.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This study was funded by a PhD studentship awarded by the Scottish Funding Council to Frits van Brenk (data collection) and the National Institutes of Health (Grant No. NIH-NIDCD R01DC004689) awarded to Kris Tjaden (data analysis and reporting).
Author Contributions
Dr. Frits van Brenk designed and developed the current study, processed and analysed the data, and drafted the manuscript. Dr. Anja Lowit was involved in the overall design and planning of data collection, interpretation of the results, and editing the manuscript. Dr. Kris Tjaden was involved in the design and planning of data reduction and analysis, interpretation of the results, and editing the manuscript.
Data Availability Statement
All data generated or analysed during this study are included in this article. Further enquiries can be directed to the corresponding author.