Objective: To investigate whether syllables produced in an oral diadochokinetic (DDK) task may be quantified so that persons with Parkinson’s disease (PD) perceived to have reduced articulatory precision when reading may be correctly identified using that quantification. Patients and Methods: Syllable sequences from 38 speakers with PD and 38 gender- and age-matched control speakers (normal controls [NC]) were quantified acoustically and evaluated in terms of (1) their ability to accurately predict speaker group membership (PD or NC) and (2) their ability to predict reduced/non-reduced articulatory precision. Results: A balanced accuracy of 80–93% in predicting speaker group membership was achieved. The best measures were related to the proportion of a syllable made up of a vowel, amplitude slope and syllable-to-syllable variation in duration and amplitude. The best material was that based on /ka/. Reduced articulatory precision was accurately predicted from DDK measures in 89% of the samples. Release-transient prominence and voicing during the onset of plosives were particularly strong predictors. Conclusions: DDK sequences can predict articulatory imprecision as observed in another speech task. The linking of performance across speech tasks probably requires measures of stability in syllable durations and amplitudes, as well as measures of subsyllabic acoustic features.

Abnormal articulation is a frequent symptom in neurogenic speech disorders. The diagnosis of dysarthria involves an auditory perceptual assessment of respiratory, phonatory, articulatory and prosodic changes caused by the neurological condition, whether it be a neurological disease or stroke. The types of speech change identified are often indicative of the site of the lesion within the central or peripheral nervous system, meaning that a careful and correct description of those changes can assist the diagnostic process. In persons with Parkinson’s disease (PD), the associated dysarthria is characterised by hypokinesia and bradykinesia; common perceptual features include a weak, monotonous voice, a variable speech rate and imprecise consonants [1].

Acoustic analysis can be a valuable complement to the perceptual evaluation of dysarthric speech. There are several aspects that can be measured, related to both the time domain (e.g., duration of syllables and sound segments) and the frequency domain (e.g., voice quality and stability, as well as vowel and consonant articulation) [2]. Attempts have been made to construct an acoustic typology of motor speech disorders [3]; dysarthria in PD has been shown to exhibit low mean intensity levels, a narrow maximum phonational frequency range, tremor and intensity decay [4].

Rapid syllable repetition is a commonly used task for the auditory perceptual assessment of speech changes and consists of the speaker repeating a syllable sequence, such as /papapapapa/, /tatatatata/ or /kakakakaka/, as fast and evenly as possible. In cases where, as here, only one syllable is repeated, the task is referred to as an alternating motion rate (AMR). The AMR task is a maximum performance task which particularly taxes the speaker’s articulatory system and is sensitive to temporal and energy regularities in speech production [2]. The measure most often obtained from the task is referred to as a diadochokinetic (DDK) rate, which assesses the ability to perform the cycles of articulatory movements necessary to form syllables consisting of a consonant and a vowel. The DDK task has been used to differentiate traumatically induced dysarthria from normal speech and has also been suggested as a biomarker reflective of disease progression in multiple sclerosis [5] and of pre-manifest Huntington’s disease [6, 7]. In PD, the AMR may be either fast, normal or slow, and in some cases there may be freezing with imprecise consonants and dyscoordination of voicing [5, 8].

Earlier attempts to differentiate speakers with a neurogenic condition from healthy controls focused on the rate of syllable production and on changes in that rate across the syllable sequence. Hence, several acoustic domains may have been unduly disregarded. We would like to put forward a few suggestions in this respect. First, time domain measures of DDK sequences should be complemented with measures of short-term variability from other fields of research [9, 10] to capture variability from one syllable to the next. Variability in the amplitude domain should also be considered [11; Fig. 1] beyond simply being assumed to take the form of linear change across a sequence. We therefore suggest that shimmer measures applied to the DDK sequence could provide valuable insights into the nature of production.

Second, the spectral properties of the syllable also need to be considered, as they are related to the quality of production of plosives and vowels within the syllable and are therefore important for the ability to transfer the DDK assessment to more general speech tasks. However, the issue of measurement feasibility needs to be kept in mind when extending the analysis with segmental acoustic measures, as the DDK task typically yields a large number of syllables. For example, the F2 slope has been shown to be an important correlate of hypokinetic dysarthria [12, 13] and would probably add valuable information also in the context of a DDK sequence. However, in the absence of automated quantification procedures, it does not seem feasible to apply that measure to DDK syllables. Similarly, the voice onset time (VOT) of plosives has been used to assess the articulatory proficiency of persons with dysarthria, but there is currently no publicly available automatic procedure for reliable detection of the components required to compute the VOT. Indeed, it has been suggested that when the VOT cannot be measured, even manually, this in itself may be an index of the presence of dysarthria [14].

Third, there are also more global approaches to consider. Godino-Llorente et al. [15] achieved 81–85% accuracy in distinguishing PD speakers (n = 50) from normal controls (NC; n = 50) based on whole-spectrum quantification of the dynamics of the envelope of the DDK sequence. However, those authors did not attempt to link their findings to descriptions of the patients’ actual speech motor deficiencies or to the perception of their speech. Hence, this approach, while promising, at present has many of the same limitations as other work [16] which has focused on prolonged vowels as a way to identify PD speakers, attaining high accuracy (>99% in samples containing upwards of 50 individuals) but offering, at present, limited information that may be of use in a clinical context.

Given the above, we believe that the best way forward is to use acoustic measures that (a) reflect our understanding of dysarthria and (b) readily lend themselves to automatic extraction based only on the acoustic signal and boundaries between units. Several measures meeting the latter criterion have been described in the literature. For example, Dromey and Bjarnason [17] applied a measure of stop closure portion to vowel amplitude ratio as a measure of reduced motor speech proficiency, which is easy to extract reliably once the boundaries of syllable nuclei and onsets have been established. Further, the prominence of the release peak of plosives [18] has been shown to be informative of stop closure deterioration [see illustrations in 18] and may be applied across a large set of consonants. Finally, some reports have suggested that voicing spread and devoicing, reflecting reduced articulatory precision, may be additional indicators of PD [18-20]. We suggest that those segmental measures could be used to provide a detailed picture of the quality of speakers’ articulatory movements.

To summarise, several attempts have been made to identify and quantify acoustic aspects of dysarthria in PD. The goal of this type of research is not to replace neurological diagnostics based on the clinical assessment of dysarthria, but to find non-invasive, valid and reliable ways to quantify articulatory impairment which may assist in the differential diagnosis and/or subgrouping of persons with neurological disease and also be used to monitor disease progression and evaluate different types of treatment. The fact that the articulatory target is predictable also makes it more likely that the procedure can be automated, opening up the possibility of using it in screening. The aim of this study was to assess the extent to which information from an expanded quantification of syllable repetitions may be used to identify persons with dysarthria or perceived reduced articulatory precision in words produced when reading. The research questions asked were: (1) How well can objective quantifications of DDK sequences be used to distinguish speakers with PD from gender- and age-matched controls? and (2) How well can reduced articulatory precision in speakers with PD be predicted using quantifications of DDK sequences performed in the same recording session? These questions were addressed in two substudies (part 1 and part 2).

Participants

In part 1 of this study, 38 speakers with idiopathic PD and 38 age- and gender-matched controls (NC) were investigated. The average score of the speakers with PD on part III of the Unified Parkinson’s Disease Rating Scale (UPDRS) was 38 ± 12 (range: 19–58). The average rating on item 18 (speech) was 1.14 ± 0.77 (range 0–2); 13% of the patients had been rated as “normal” (score 0), 52% of the patients were rated as having “slight loss of expression, diction and/or volume” (score 1) and 35% were rated as “monotone, slurred but understandable; moderately impaired” (score 2). They were all off levodopa medication at the time of recording. The 38 control speakers were selected to match each PD speaker closely in terms of age and gender; the age and gender distributions of the two groups are presented in Table 1.

Table 1.

Number and mean age (with standard deviation) of male and female participants in each group


Part 2 of the study included those 19 (15 males and 4 females; mean age: 61.0 years) of the 38 persons with PD who had been assessed for articulatory quality based on words extracted from readings of a standard text [20].

Speech Material

The speech material consisted of audio recordings of AMR DDK sequences of /pa/, /ta/ and /ka/ syllables. The participants were asked to produce the syllables as fast as possible for as long as they could, and they were instructed to practice the task at a self-selected pace before each fast sequence. Participants frequently made several attempts at producing faster and longer sequences. A DDK sequence was defined as the syllables produced by a speaker in one breath. All produced syllable sequences were included in case of repeated attempts. An overview of the analysed material, giving the average number of syllable sequences produced by speakers (with standard deviations) along with the corresponding total number of included syllables, is presented in Table 2. All syllables were annotated in terms of the start and end points of the syllable, as well as the temporal location of the boundary between the syllable onset and the nucleus (the vowel segment). However, the initial syllable of each sequence was not annotated, owing to the difficulty of establishing the starting point of the syllable. Where no clear syllable onset was discernible, no boundary was inserted between the onset and the nucleus. Similarly, in productions with no discernible nucleus, the syllable was annotated as consisting of a single consonant segment. The annotations were performed either manually or using automatic segmentation software, in which case a manual check for correct boundary placements was subsequently carried out.

Table 2.

An overview of the analysed syllable sequences


For part 2, the averaged DDK sequence measures were combined with assessments of articulatory quality which had previously been performed on words produced while reading a standard text [20] collected at the same time as the DDK sequences. The assessed words had been selected because of their structural complexity: they included multisyllabic words with complex syllable onsets and words which are phonetically diverse in that they include both voiced and voiceless fricatives and plosives produced at different places of articulation.

Acoustic Analysis

In the preparation of the DDK data, each transcribed syllable was processed in terms of its time domain and amplitude domain characteristics. The duration of each syllable was extracted along with the proportion of the syllable which made up the syllable nucleus (%N). The syllable onset and nucleus were divided into 100 slices each, and the presence or absence of phonation was determined for each slice. Then, a percentage-of-phonation property was computed separately for the syllable onset and the nucleus (%Phon and %NPhon). Corresponding variables were also computed with regard to the initial, the medial and the final third of each onset (%Phon_init, %Phon_med and %Phon_fin, respectively) and nucleus (%NPhon_init, %NPhon_med and %NPhon_fin, respectively) [19]. Further, the root mean square amplitude of syllable onsets and nuclei was also extracted.
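As a minimal sketch of the percentage-of-phonation computation described above, the following Python assumes that each segment has already been reduced to a list of per-slice voicing decisions (True for phonation present). How voicing is detected in a slice is not specified here and is left outside the sketch; the function names and the example onset are hypothetical.

```python
def percent_phonated(slices):
    """Percentage of slices in which phonation was detected."""
    if not slices:
        return float("nan")
    return 100.0 * sum(1 for voiced in slices if voiced) / len(slices)


def phonation_profile(slices):
    """Whole-segment percentage plus initial, medial and final thirds.

    With 100 slices the thirds hold 33, 33 and 34 slices; how the
    remainder is distributed is an assumption of this sketch.
    """
    n = len(slices)
    a, b = n // 3, (2 * n) // 3
    return {
        "all": percent_phonated(slices),
        "init": percent_phonated(slices[:a]),
        "med": percent_phonated(slices[a:b]),
        "fin": percent_phonated(slices[b:]),
    }


# Hypothetical onset: voiceless closure with voicing in the last 20 slices.
onset = [False] * 80 + [True] * 20
profile = phonation_profile(onset)
```

For this example the whole-onset value is 20%, the initial and medial thirds are 0%, and all of the voicing falls in the final third.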

The acoustic prominence of the release peak was computed as the deviation of the amplitude of the release transient from a linear regression fitted to the amplitude within the syllable onset (RTP) [18]. An alternative quantification, in which the deviation of the release transient amplitude is taken relative to the amplitude median (mRTP), was also computed. Finally, the relationship in terms of amplitude between the syllable onset and the nucleus [17] was computed for each syllable (O/N Ampl.).
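The release-transient measures can be illustrated with a short Python sketch. It assumes that the onset is available as a sequence of frame-wise amplitude values and that the frame containing the release transient is already known; both the framing and the inclusion of the release frame in the regression are assumptions of this sketch, not details taken from the description above.

```python
from statistics import mean, median


def fit_line(y):
    """Ordinary least-squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)  # needs at least two frames
    x_bar, y_bar = (n - 1) / 2, mean(y)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def release_prominence(onset_amp, release_idx):
    """Return (RTP, mRTP) for one syllable onset.

    RTP:  amplitude at the release minus the regression prediction there.
    mRTP: amplitude at the release minus the median onset amplitude.
    """
    slope, intercept = fit_line(onset_amp)
    rtp = onset_amp[release_idx] - (slope * release_idx + intercept)
    mrtp = onset_amp[release_idx] - median(onset_amp)
    return rtp, mrtp


# Hypothetical onset: a quiet closure (40) with a clear burst (58) in frame 8.
rtp, mrtp = release_prominence([40, 40, 40, 40, 40, 40, 40, 40, 58, 40], 8)
flat_rtp, flat_mrtp = release_prominence([40] * 10, 8)
```

A clear burst yields positive values on both measures, whereas an onset without any burst yields zero. Because the burst itself pulls the regression line upwards, RTP is smaller than mRTP in this example.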

All of those acoustic properties of the syllables produced were then fed into an analysis of the properties of full DDK sequences with respect to duration, amplitude, voicing and closure proficiency. The duration of syllables was analysed in terms of the average rate of production as well as five forms of short-term variability within a sequence (e.g., Jitter and normalised pairwise variability index [nPVI]). Short-term variability was analysed with regard to both normalised rates and raw forms. Progressive change in rates was quantified using measures based on the relative pace of parts of DDK sequences as highlighted by a research group in a number of studies (relStab5–12, relStab13–20 and %PA) [7]. The overall stability of pace across an entire DDK sequence was also quantified using the slope of a linear regression fitted to the syllable sequence durations (Rate slope). Further, a measure of overall variability in syllable durations (sdSylldur) was included to account for variability that was not captured by other durational metrics.
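To make the durational measures concrete, the sketch below computes a local jitter value, the nPVI and a regression slope from a list of syllable durations. The formulas are the commonly used definitions of these measures; which exact variants were used for the five short-term variability measures is not stated above, so the choice here is an assumption, and the durations are hypothetical.

```python
def local_jitter(durs):
    """Mean absolute difference between neighbouring durations,
    relative to the mean duration (in percent)."""
    diffs = [abs(a - b) for a, b in zip(durs, durs[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(durs) / len(durs))


def npvi(durs):
    """Normalised pairwise variability index: each difference is
    normalised by the mean of the pair it comes from."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durs, durs[1:])]
    return 100.0 * sum(terms) / len(terms)


def slope(values):
    """Least-squares slope of values against their position in the sequence."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    return sxy / sxx


# Hypothetical syllable durations in milliseconds.
alternating = [200, 220, 200, 220]   # unstable from syllable to syllable
slowing = [200, 210, 220, 230]       # steady slowing across the sequence

jitter_alt, npvi_alt = local_jitter(alternating), npvi(alternating)
rate_slope_alt, rate_slope_slow = slope(alternating), slope(slowing)
```

The two sequences show why both kinds of measure are needed: the steadily slowing sequence has a slope of 10 ms per syllable, whereas the alternating sequence has a much smaller slope but roughly twice the short-term variability.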

In addition to the durational quantification, each DDK sequence received an amplitude quantification computed with the same formulas as the corresponding durational measures, applied to amplitudes instead of durations. Hence, the Shimmer and amplitude PVI (nPVI_A) of DDK sequences were computed based on the amplitudes of the syllable nuclei. Measures of progressive change in amplitudes (Ampl.relStab5–12, Ampl.relStab13–20, %AD and Ampl. slope) were also computed in the same manner as the corresponding durational measures. A measure of overall variability in syllable amplitudes (sdSyllAmp) was included to account for variability that was not captured by other metrics.
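The amplitude perturbation quotients referred to later in the text (APQ3 and APQ5) can be sketched in the same spirit. The version below follows one common definition, in which each value is compared with the mean of a window of k values centred on it; whether this matches the variant used in the study is an assumption, and the amplitude values are hypothetical.

```python
def apq(amps, k=3):
    """k-point amplitude perturbation quotient (in percent).

    Each nucleus amplitude is compared with the mean of the k values
    centred on it; the average deviation is divided by the mean amplitude.
    """
    if len(amps) < k:
        raise ValueError("need at least k amplitude values")
    half = k // 2
    devs = []
    for i in range(half, len(amps) - half):
        window = amps[i - half : i + half + 1]
        devs.append(abs(amps[i] - sum(window) / k))
    return 100.0 * (sum(devs) / len(devs)) / (sum(amps) / len(amps))


# Hypothetical nucleus amplitudes (arbitrary linear units).
ramp = [60, 62, 64, 66, 68]          # steady linear change
alternating = [60, 66, 60, 66, 60]   # syllable-to-syllable instability

apq3_ramp = apq(ramp, k=3)
apq3_alt = apq(alternating, k=3)
```

A perfectly linear change in amplitude gives an APQ3 of zero, which shows that this kind of measure captures variability that a regression slope across the sequence does not.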

Measures of syllable production quality, i.e., the percentage-of-phonation measures, the prominence of the release peak and the onset-to-nucleus amplitude relationship, were quantified in terms of their average value and overall variability within each sequence; short-term variability measures were not applied to those aspects of a sequence. The progressive change in the percentage of phonation was quantified based on the slope of a linear regression (Prog. %Phon and Prog. %NPhon). A description of all measures computed is presented in Table 3.

Table 3.

An overview of the computed acoustic measures


Study Design

The study consisted of two parts. In part 1, the speaker group (PD or NC) of each participant was predicted based on the outcomes of various subsets of extracted DDK measures. Part 2 involved the evaluation of the accuracy of a model designed to predict, based on DDK sequences, whether words produced in the same recording session had been rated as reduced in articulatory quality. Previously published perceptual evaluations of 19 of the PD speakers [20] were combined with the DDK sequences performed by the same speakers.

Perceptual Assessment

The perceptual evaluations used in part 2 were ratings of single words produced by participants while reading a text. The material had been assessed repeatedly with good inter- and intrarater reliability in a previous report, using a blinded and randomised procedure [20]. The ratings were performed by two speech-language pathology students as part of their master’s thesis. They included only samples from a subset of the PD speakers included in part 1 (n = 19). The raters assessed articulatory quality in extracted speech samples on a 1–4 scale (1 = normal articulatory precision, 2 = slightly reduced articulatory precision, 3 = highly reduced articulatory precision, 4 = severely reduced articulatory precision). They were presented with the words in random order and assessed each word 5 times. Their average rating across the 10 assessments of a word was taken as the rated articulatory precision for that word. As the distribution of their ratings did not support modelling of the entire severity scale, the statistical effort focused instead on the identification of words produced with reduced articulation. To this end, the rating scale was compressed into a dichotomous scale where an average rated articulatory precision above 2 was deemed to represent reduced articulatory precision and an average below 2 was deemed to represent non-reduced articulatory precision. Eighty-four percent of the included participants were on average rated as having non-reduced articulatory precision, and 16% were rated as having reduced articulatory precision.
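The averaging and dichotomisation of the ratings can be sketched as follows. The handling of an average of exactly 2 is not specified above; treating it as non-reduced is an assumption of this sketch, as are the function names and the example ratings.

```python
from statistics import mean


def word_precision(ratings_by_rater):
    """Average over all assessments by all raters (2 raters x 5 = 10)."""
    return mean(r for rater in ratings_by_rater for r in rater)


def is_reduced(avg_rating, threshold=2.0):
    """True when the average rating exceeds the threshold.

    Assumption: an average of exactly 2 counts as non-reduced.
    """
    return avg_rating > threshold


# Hypothetical ratings on the 1-4 scale, five assessments per rater.
word_a = [[2, 3, 3, 2, 3], [3, 3, 2, 3, 3]]
word_b = [[1, 2, 1, 1, 2], [1, 1, 2, 1, 1]]

avg_a, avg_b = word_precision(word_a), word_precision(word_b)
labels = [is_reduced(avg_a), is_reduced(avg_b)]
```

Here the first word averages 2.7 and is labelled as reduced, while the second averages 1.3 and is labelled as non-reduced.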

Statistical Analysis

The statistical methods used in the two parts of the present study involved logistic regression modelling with a binomial response. The predictors were reduced to an efficient subset by L1 regularised logistic regression. This type of regression enables the selection of predictors to be included in a model by applying a shrinking parameter λ which allows the coefficients of less important variables to be shrunk to zero. The prediction accuracy of the models was evaluated based on the area under the receiver operating characteristic (ROC) curve for the model predictions, meaning that it was not based on a specific cut-off. The ROC curve visualises the trade-off between successful identification of PD speakers and the risk of erroneously classifying an NC speaker as having PD for varying cut-offs applied to the logistic regression to make the predictions. See Fawcett [21] or Watts and Awan [22] for an overview of ROC-based analysis. The model with the highest average and the most stable area under the ROC curve across multiple evaluations was considered to be the most efficient model.
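The predictor selection effect of the L1 penalty can be illustrated with a small, self-contained sketch. It fits a logistic regression by proximal gradient descent with soft thresholding, which is one standard way of solving the L1-penalised problem; it is not a description of the software used in the study, and the data, step size and λ values are hypothetical.

```python
import math


def soft_threshold(w, t):
    """Shrink w towards zero by t; values within [-t, t] become exactly 0."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(X, y, lam, step=0.5, iters=500):
    """Proximal-gradient fit of an L1-penalised logistic regression.

    The intercept b is left unpenalised (a common convention, assumed here).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Prediction error of the current model for every observation.
        err = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
               for row, yi in zip(X, y)]
        b -= step * sum(err) / n
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(err, X)) / n
            w[j] = soft_threshold(w[j] - step * grad, step * lam)
    return w, b


# Hypothetical standardised predictors: column 0 is related to the group,
# column 1 is not. Labels: 1 = PD, 0 = NC.
X = [[-1.5, 0.3], [-1.0, -0.4], [-0.5, 0.2], [0.2, -0.1],
     [-0.2, 0.1], [0.5, -0.3], [1.0, 0.4], [1.5, -0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_small, _ = fit_l1_logistic(X, y, lam=0.01)  # weak penalty
w_big, _ = fit_l1_logistic(X, y, lam=2.0)     # strong penalty
```

With the strong penalty every coefficient is shrunk to exactly zero, so no predictor is selected; with the weak penalty the informative predictor keeps a positive coefficient. Scanning λ between such extremes gives models with different numbers of retained predictors.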

In part 1 of the study, the binomial response consisted of the predicted group membership (PD or NC) of a speaker based on an efficient subset of DDK measures. A 5-fold cross-validation procedure [23] was employed for model fitting and evaluation of performance. In this procedure, the data set was divided randomly into 5 equal parts. Model fitting was then performed in 5 iterations of cross-validation where parameters were computed in 4 parts of the data, and evaluated in the data that were not used for fitting the model. In the next iteration, another part of the data was left out of model fitting and used only for evaluation. All reported performance metrics describe the predictive ability of models when they were applied to data not used for fitting the model.
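The partitioning step of the 5-fold procedure can be sketched as below. The random seed and the way the remainder is distributed are assumptions: with 76 speakers the parts cannot be exactly equal, so this sketch produces one part of 16 and four parts of 15.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Yield (train, held_out) index lists for k-fold cross-validation.

    Every observation is held out exactly once and is never part of the
    training set in the iteration where it is used for evaluation.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        yield train, held_out


splits = list(kfold_indices(76, k=5))
fold_sizes = sorted(len(held_out) for _, held_out in splits)
```

In each iteration a model would be fitted on `train` only and its predictions recorded for `held_out`, so that all performance figures come from data the model has not seen.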

In part 2, the modelled response was the average perceptual rating transformed to a reduced/not reduced scale, and the predictor the average of a measure obtained from a syllable type across trials. Cross-validation of the L1 penalised regression was performed in a leave-one-out procedure in order to ensure that the balance between reduced and non-reduced articulatory quality remained stable across evaluations.
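However the held-out predictions are gathered, the evaluation metrics used in this section can be computed from the predicted probabilities and the true labels. The sketch below uses the fact that the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case; the example probabilities are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff=0.5):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
    return tp / labels.count(1), tn / labels.count(0)


# Hypothetical held-out predicted probabilities (1 = positive class).
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
truth = [1, 1, 1, 1, 0, 0, 0, 0]

area = auc(probs, truth)
sensitivity, specificity = sens_spec(probs, truth, cutoff=0.5)
balanced_accuracy = (sensitivity + specificity) / 2
```

The area under the curve does not depend on any cut-off, whereas sensitivity, specificity and balanced accuracy change when the cut-off is moved away from 0.5.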

An overview of the model selection procedure used is presented in Figure 1. For each syllable sequence type (/pa/, /ta/ and /ka/), the average predictive accuracy (area under the ROC curve) is displayed for each value of log(λ). The number of predictors selected at a given λ value for the parameter shrinkage factor is indicated at the top of each subfigure. For /pa/, the inclusion of predictors beyond the 8 most productive measures did not improve performance as measured by the area under the ROC curve. For /ta/, the model did not improve beyond 6 included predictors; and for /ka/, only 4 predictors were required to reach the highest observed level of performance.

Fig. 1.

Stability in prediction accuracy for /pa/ (a), /ta/ (b) and /ka/ (c) syllable sequences depending on the number of predictors selected for inclusion. The dot and whiskers indicate the average accuracy of predictions, with standard deviations, when the numbers of predictors indicated at the top of the plot were included as a result of applying a regularisation parameter, lambda (λ). AUC, area under the curve, i.e., the receiver operating characteristic curve.


Figure 2 shows the ROC curves for the best models identified for each syllable type. Speaker group identification based on /pa/ syllables had an AUC (area under the curve) of 0.80, with a small imbalance in the outcome across groups (sensitivity = 0.75 and specificity = 0.79 when a cut-off of 0.5 was applied to the logistic regression). Predictions based on /ka/ attained up to 0.75 sensitivity before also resulting in a reduction in specificity, while /pa/ and /ta/ only supported a sensitivity up to 0.36 and 0.58, respectively, before specificity was reduced. The predictors identified as productive were %N, RTP, Rate, O/N Ampl., APQ3, nPVI, Ampl. slope and Rate slope.

Fig. 2.

Receiver operating characteristic curves for the classification of normal control and Parkinson’s disease speakers based on /pa/, /ta/ and /ka/ syllables. Area-under-the-curve (AUC) measures are provided for each classification to indicate the performance of the model in separating speakers.


For /ta/, both accuracy (AUC = 0.83) and balance in predictions (sensitivity = 0.75 and specificity = 0.76) were improved compared to /pa/. The predictors selected in the best models were %N, Rate, Shimmer, Ampl. slope, Prog. %Phon and nPVI.

The best predictive accuracy was observed for /ka/ syllables (AUC = 0.93). This task also showed a maintained balance in predictions (sensitivity = 0.85 and specificity = 0.84 at a cut-off of 0.5). The predictors selected in the best models were %N, Jitter, Ampl. slope and nPVI.

To find the answer to the second research question, a cross-validated investigation was carried out into the extent to which a perceived reduced articulatory quality of words produced when reading could be predicted from measures of DDK performance. The minimum classification error was achieved using 8 predictors, which together achieved an overall accuracy of 0.89 (95% confidence interval: 0.72–0.98) using a cut-off of 0.5 (sensitivity = 0.92, specificity = 0.85). The predictors identified as the most productive for correctly identifying samples with reduced articulatory quality were meanRTP, sdRTP, Rate, %Phon, APQ5 and nPVI.

We here report on an investigation into how well speakers with PD, and speech impairment associated with the disease, may be identified using acoustic quantifications of syllables that they produce in a DDK task. The quantifications were selected so that both the participants’ ability to sequence syllables in rapid succession and to produce segments with appropriate acoustic properties were assessed. The investigation was divided into two parts, designed to answer each of the two research questions. Part 1 investigated the ability to accurately distinguish PD speakers from age- and gender-matched controls based on the acoustic profile of syllables produced in a DDK sequence. Part 2 investigated the acoustic properties of DDK syllables in terms of their ability to predict that a speaker’s articulatory quality while reading would be rated as reduced. Together, these two substudies aimed to provide an indication of whether it is possible to bridge the gap between performance on speech tasks and performance on DDK tasks.

The results of part 1 indicated that PD speakers may be distinguished, with reasonable accuracy, from age- and gender-matched control speakers based on the acoustic manifestations of syllables produced in DDK sequences. The PD speakers had been rated as having no to moderately impaired speech in the UPDRS evaluation. The syllable /ka/ was found to be the most informative of a speaker’s group membership: it yielded the highest level of prediction accuracy (93%), showed an almost complete balance in predictions and also used the smallest number of predictors (n = 4). The accuracy of this model is comparable to that of the model developed by Godino-Llorente et al. [15], but our model is more focused, easier to interpret in terms of speech movements and probably less computationally intensive. Further, the predictors identified as the most productive for /ka/ sequences (%N, Jitter, Ampl. slope and nPVI) were also found among the predictors identified for /pa/ and /ta/. In other words, those measures (reflecting the percentage of a syllable’s duration made up of the nucleus, variation in duration from syllable to syllable, and the declination of the amplitude across a sequence) were the most indicative of speaker group affiliation.

Predictions based on /pa/ and /ta/ syllables yielded somewhat less accurate predictions than those based on /ka/ (80 and 83%, respectively) and also used a wider range of measures to do so. In this context, it should be noted that there was a pattern to the addition of predictors. Besides the measures identified as sufficient for /ka/ sequences, accurate prediction based on both /pa/ and /ta/ sequences required the addition of measures of articulatory rate (Rate), short-term variability in the amplitude of vowels (Shimmer and APQ3) and a declining vowel amplitude across a sequence (Ampl. slope). Then, each of those two syllables also had its own signature predictors: /pa/ showed sensitivity to variability in the amplitude of the release transient (sdRTP) and to the relationship in terms of amplitude between the syllable onset and the syllable nucleus (O/N Ampl.), while for /ta/, the progression of the voicing of onsets (Prog. %Phon) was shown to be informative of speaker group affiliation. The reason why those two syllables are associated with these specific signature predictors is open to speculation, and we will refrain from proposing an interpretation at this point.

The aim of part 2 of the study was to investigate how well a speaker’s reduced articulatory precision in words produced when reading can be predicted from the same speaker’s performance on the DDK task. The best model achieved 89% overall accuracy in predicting reduced articulatory precision. This model included the average transient prominence (RTP) as well as consistency in realisation (sdRTP), the relationship in terms of amplitude between the syllable onset and the nucleus (O/N Ampl.), the extent of voicing of the syllable onset (%Phon) and short-term variability in both amplitude and duration (APQ5, nPVI). Hence, a tentative answer to the second research question could be that the presence of reduced articulatory proficiency when reading may be predicted for a speaker with up to 89% accuracy based on a DDK sequence produced in the same session.

While the results of part 2 do suggest that DDK sequences may be quantified in a way that captures aspects of reduced articulatory proficiency and that those quantifications will transfer to ratings of parts of read speech, we consider those findings indicative and preliminary, for two main reasons. First, the samples assessed by Eklund et al. [20] were single words and may therefore have increased the focus on segmental effects of dysarthria more than what would have been the case if full sentences had been assessed. Therefore, the sample choice may have artificially increased the similarity of the two tasks.

Second, the original scale used by Eklund et al. [20] was compressed here to a dichotomous scale of reduced/not reduced in order to make it possible to evaluate an initial model for prediction based on DDK sequences and to support cross-validation. This post hoc conversion of rating scales should be kept in mind when interpreting the results, as it means that information about the severity of reduction is lost and hence not explicitly modelled. It might very well be the case that the model is best at identifying aspects that are present in speech with reduced articulatory precision, whatever the level of the reduction, but not specifically in speech with severely reduced articulation, as this was less frequent in the data set. For this reason, there is a need to extend the pilot evaluation presented here to a larger study where more speakers are rated in terms of their articulatory proficiency in read speech and where their ratings are linked to their DDK performance. Further, the participants’ level of dysarthria had not been assessed; a more complete picture of their overall speech deterioration due to PD would have provided a great deal of valuable information that could have been used to further develop the method.

What we have observed is the co-occurrence of certain acoustic features in DDK sequences with the perception of reduced articulatory quality. Obviously, this does not mean that the specific acoustic properties identified were also the ones that the listeners had based their ratings on. Instead, it is likely the case that the underlying articulatory impairment caused by PD may have different acoustic and/or perceptual consequences depending on the demands of a specific speech task [12]. Hence, we cannot assume that, say, variability in the prominence of the release transient is a substantial factor underlying perceived articulatory quality. Rather, we have observed that individuals with dysarthria who are variable in their release transient manifestations tend to be perceived as having less clear speech, and we may assume that this is due to their reduced ability to perform precise and accurately timed movements to produce oral constriction and plosive release.

Our quantification of the DDK task adds segmental-level acoustic properties that were found to contribute substantially to the successful identification of PD speakers. Previous research has shown that the pace of syllable production in DDK tasks is not indicative of the speaking rate in continuous speech [24]. Our results tentatively suggest that the dissociation between the DDK task and general speech tasks may in part be alleviated by also considering the quality of the segments produced. However, more research is needed to clarify the connection between performance on a task such as the DDK task and a speaker’s ability to speak in a more natural situation at a level of articulatory quality that others perceive as sufficient.

It should be kept in mind that the DDK task includes only a very limited set of segments, and that many aspects of speech, such as intonation, cannot be quantified on the basis of the DDK task. The assessment that we present should therefore not be considered a substitute for a full assessment of articulatory impairment made by a speech-language pathologist. Further, the mark-up procedure is currently time-consuming. However, we propose that the procedure is within reach of automation because the distinctions that need to be made in the acoustic signal are relatively clear. Unlike in spontaneous speech, the articulatory target is known, which makes acoustic processing easier. We envision that the analysis could be used by non-expert clinical staff at remote locations to screen for articulatory impairment, with later referral to a speech-language pathologist for a more complete evaluation. However, considerable development in the automation of the processing procedure is required before the procedure can be fully implemented in clinical practice.

We have shown that DDK sequences may be acoustically quantified in a way that makes it possible both to distinguish speakers with PD from age- and gender-matched control speakers and to predict whether the person who produced such a sequence will be perceived as having reduced articulatory quality in words produced when reading. However, more DDK samples, and samples covering a wider range of levels of articulatory imprecision, are required before the methodology studied here can be developed into a reliable model that may be applied more generally in research and in clinical settings.

We would like to extend our gratitude to Marcus Karlsson and Urban Viklund for their assistance in providing draft mark-ups of DDK sequences for manual correction, which facilitated the completion of the transcription procedure, and to Fanny Viklund and Johanna Qvist for making the perceptual assessments. The support of the Swedish Research Council (grants 2011-2294 and 421-2010-2131) for the projects within which the recordings were made is also gratefully acknowledged.

All participants had given their written consent to participate in the study, and the data collection procedure was approved by the Regional Ethical Review Board of Umeå, Sweden (Ref. No.: 08-093M).

The authors have no conflicts of interest to report.

1. Duffy JR. Motor Speech Disorders. 3rd ed. St. Louis (Missouri): Elsevier Mosby; 2013.
2. Kent RD, Weismer G, Kent JF, Vorperian HK, Duffy JR. Acoustic studies of dysarthric speech: methods, progress, and potential. J Commun Disord. 1999 May-Jun;32(3):141–80.
3. Kent RD, Kim YJ. Toward an acoustic typology of motor speech disorders. Clin Linguist Phon. 2003 Sep;17(6):427–45.
4. Holmes RJ, Oates JM, Phyland DJ, Hughes AJ. Voice characteristics in the progression of Parkinson’s disease. Int J Lang Commun Disord. 2000 Jul-Sep;35(3):407–18.
5. Rusz J, Benova B, Ruzickova H, Novotný M, Tykalová T, Hlavnička J, et al. Characteristics of motor speech phenotypes in multiple sclerosis. Mult Scler Relat Disord. 2018 Jan;19:62–9.
6. Vogel AP, Shirbin C, Churchyard AJ, Stout JC. Speech acoustic markers of early stage and prodromal Huntington’s disease: a marker of disease onset? Neuropsychologia. 2012 Dec;50(14):3273–8.
7. Skodda S, Grönheit W, Lukas C, Bellenberg B, von Hein SM, Hoffmann R, et al. Two different phenomena in basic motor speech performance in premanifest Huntington disease. Neurology. 2016 Mar;86(14):1329–35.
8. Skodda S, Flasskamp A, Schlegel U. Instability of syllable repetition as a marker of disease progression in Parkinson’s disease: a longitudinal study. Mov Disord. 2011 Jan;26(1):59–64.
9. Nolan F, Asu EL. The Pairwise Variability Index and coexisting rhythms in language. Phonetica. 2009;66(1-2):64–77.
10. Amir O, Wolf M, Amir N. A clinical comparison between two acoustic analysis softwares: MDVP and Praat. Biomed Signal Process Control. 2009;4(3):202–5.
11. Skodda S. Aspects of speech rate and regularity in Parkinson’s disease. J Neurol Sci. 2011 Nov;310(1-2):231–6.
12. Kent RD, Kent JF. Task-based profiles of the dysarthrias. Folia Phoniatr Logop. 2000 Jan-Jun;52(1-3):48–53.
13. Kim Y, Weismer G, Kent RD, Duffy JR. Statistical models of F2 slope in relation to severity of dysarthria. Folia Phoniatr Logop. 2009;61(6):329–35.
14. Auzou P, Özsancak C, Morris RJ, Jan M, Eustache F, Hannequin D. Voice onset time in aphasia, apraxia of speech and dysarthria: a review. Clin Linguist Phon. 2000;14(2):131–50.
15. Godino-Llorente JI, Shattuck-Hufnagel S, Choi JY, Moro-Velázquez L, Gómez-García JA. Towards the identification of Idiopathic Parkinson’s Disease from the speech. New articulatory kinetic biomarkers. PLoS One. 2017 Dec;12(12):e0189583.
16. Gómez-Vilda P, Mekyska J, Ferrández JM, Palacios-Alonso D, Gómez-Rodellar A, Rodellar-Biarge V, et al. Parkinson Disease Detection from Speech Articulation Neuromechanics. Front Neuroinform. 2017 Aug;11:56.
17. Dromey C, Bjarnason S. A preliminary report on disordered speech with deep brain stimulation in individuals with Parkinson’s disease. Parkinsons Dis. 2011;2011:796205.
18. Karlsson F, Olofsson K, Blomstedt P, Linder J, Nordh E, van Doorn J. Articulatory closure proficiency in patients with Parkinson’s disease following deep brain stimulation of the subthalamic nucleus and caudal zona incerta. J Speech Lang Hear Res. 2014 Aug;57(4):1178–90.
19. Karlsson F, Blomstedt P, Olofsson K, Linder J, Nordh E, van Doorn J. Control of phonatory onset and offset in Parkinson patients following deep brain stimulation of the subthalamic nucleus and caudal zona incerta. Parkinsonism Relat Disord. 2012 Aug;18(7):824–7.
20. Eklund E, Qvist J, Sandström L, Viklund F, Van Doorn J, Karlsson F. Perceived articulatory precision in patients with Parkinson’s disease after deep brain stimulation of subthalamic nucleus and caudal zona incerta. Clin Linguist Phon. 2015 Feb;29(2):150–66.
21. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters; 2006.
22. Watts CR, Awan SN. Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel and continuous speech contexts. J Speech Lang Hear Res. 2011 Dec;54(6):1525–37.
23. Kuhn M, Johnson K. Applied predictive modeling. New York (NY): Springer New York; 2013. p. 69.
24. Staiger A, Schölderle T, Brendel B, Ziegler W. Dissociating oral motor capabilities: evidence from patients with movement disorders. Neuropsychologia. 2017 Jan;95:40–53.