Objective: The acoustic basis of intelligibility associated with varied clear speech instructions was studied. Methods: Twelve healthy speakers read 18 sentences in ‘habitual', ‘clear', ‘hearing impaired' and ‘overenunciate' conditions. The latter 3 conditions are varieties of clear speech. Acoustic measures included tense and lax vowel space area, a measure of vowel spectral change, articulation rate and sentence-level vocal intensity. Sentences were mixed with multitalker babble to prevent ceiling effects and were orthographically transcribed by 40 listeners. Percent-correct scores were obtained for each speaker and condition. Regression analyses were used to quantify relationships between acoustic measures and intelligibility. Results: Univariate regressions indicated that greater magnitudes of acoustic change in nonhabitual conditions were associated with greater increases in intelligibility. Multivariate regression analysis further indicated that lax vowel space, articulation rate and vocal intensity were significant predictors of intelligibility. Conclusions: Acoustic variables associated with intelligibility differed depending on whether relationships were examined using univariate or multivariate statistics. Multivariate statistics indicated that articulation rate was the strongest predictor of improvements in intelligibility above and beyond all other variables studied. The findings have implications for optimizing therapeutic use of clear speech for clinical populations.

A variety of studies investigating the perceptual consequences of clear speech have reported that a clear speech style is associated with improved intelligibility relative to a habitual or conversational speech style [1,2]. In fact, the term ‘clear speech benefit' is widely used to refer to the increased intelligibility of clear speech. The speech production adjustments accompanying clear speech, as inferred from the acoustic signal, have also been the subject of numerous studies. Relative to conversational speech, clear speech tends to be characterized by enhanced segmental contrasts, lengthened speech durations, an increase in mean fundamental frequency (F₀) and sound pressure level (SPL), and enhanced F₀ and intensity envelope modulation, although there is some variation across studies [1,2]. The mere presence of these kinds of production adjustments in clear speech cannot be taken as evidence of their relevance for intelligibility, however [3,4,5]. Thus, determining the speech production modifications responsible for the clear speech benefit remains an important research goal, as this line of inquiry is relevant for optimizing the therapeutic use of a clear speech style for clinical populations such as dysarthria patients as well as for improving technologies for use with populations who have difficulty understanding speech [1,6].

We previously reported that different instructions or cues for eliciting clear speech yielded similar types of acoustic adjustments, but that the magnitude of the adjustment varied depending on the nature of the instruction [7]. Relative to typical or conversational speech (‘habitual' condition), speakers produced the greatest magnitude of change in vowel space area, vowel spectral rate of change, vowel segment duration and articulation rate when instructed to ‘overenunciate each word' (‘overenunciate' condition), followed by the instruction ‘… as if speaking to someone with a hearing impairment' (‘hearing impaired' condition) and then ‘speak clearly' (‘clear' condition). More specifically, the overenunciate condition was associated with the largest vowel space areas, greatest rate of vowel spectral change, longest vowel segment durations and slowest articulation rate. The hearing impaired condition, however, was associated with the greatest magnitude of change in mean vocal intensity (i.e. increased SPL). A follow-up perceptual study further demonstrated that the overenunciate condition was associated with the greatest increase in intelligibility relative to typical or conversational speech, followed by the hearing impaired and clear conditions [8]. The purpose of the present study was to use the acoustic and perceptual data reported in these previous studies [7,8] to explore whether the acoustic adjustments associated with the different clear speech instructions were also associated with variations in sentence intelligibility. More specifically, we examined whether greater magnitudes of acoustic change were associated with greater increases in intelligibility. Relationships between acoustic measures and intelligibility were examined for data pooled across the 3 nonhabitual conditions (clear, overenunciate and hearing impaired).
The following section reviews procedures related to collection of speech samples and the associated acoustic analyses as well as procedures for obtaining perceptual judgments of intelligibility. Readers are referred to our previous publications for a detailed treatment of the material [7,8].

Speakers, Speaking Task and Acoustic Measures

Data from 12 neurologically healthy talkers reported in our previous study [7] were reanalyzed in the current study. The speakers included 6 males and 6 females with a mean age of 24 years (SD: 6 years). The speakers reported no history of speech, language or hearing difficulties and passed a bilateral pure tone audiometric screening at 20 dB at 500, 1,000, 2,000, 4,000 and 8,000 Hz [9]. The speakers were recruited from the student population at the University at Buffalo, were from western New York, and were judged by an investigator to speak Standard American English with an Inland North dialect typical of western New York State. The speakers were paid a modest participation fee. They were recorded in a sound-attenuated booth at a mouth-to-microphone (CountryMan E6IOP5L2 Isomax Condenser) distance of 6 cm. The acoustic signal was preamplified using a Professional Tube MIC Preamp, low-pass filtered at 9.8 kHz and digitized to a computer hard disk at a sampling rate of 22 kHz using TF32 [10]. Prior to recording of each participant, a calibration tone of a known intensity was recorded to calculate vocal intensity from the acoustic signal.

For each of the 12 speakers, 18 different sentences, ranging in length from 5 to 11 words, were selected from the Assessment of Intelligibility of Dysarthric Speech [11]. The sentence sets were selected to contain 3-5 occurrences of the 8 monophthongs /ɪ, ɛ, ʊ, ʌ, ɑ, i, æ, u/ for use in deriving tense and lax vowel space areas as well as a measure of dynamic spectral change. All vowels occurred in stressed syllables of content words. The sentences were read in habitual, clear, hearing impaired and overenunciate conditions. Instructions for eliciting each condition were modeled after those used in other published clear-speech studies; the extent to which speakers interpreted the instructions as semantically distinct from one another is a topic for other studies. For the habitual condition, speakers were instructed to read each sentence as it appeared on a computer monitor. The habitual condition is similar to ‘conversational' speech in other clear-speech studies. For the clear condition, speakers were simply instructed to ‘speak clearly' [3,12,13]. In the hearing impaired and overenunciate conditions, speakers were instructed to ‘say the following sentences while speaking to someone with a hearing impairment' [14,15] and to ‘overenunciate each word' [16,17,18], respectively. When selecting instructions from those used in other publications, care was taken to choose distinctive instructions. Thus, ‘talk to someone with a hearing impairment' was elicited, but ‘talk to a nonnative speaker' was not.

Acoustic measures of interest from our previous study [7] included segmental measures (static and dynamic vowel spectral measures) and suprasegmental measures (sentence-level articulation rate and mean vocal intensity). These measures were of interest to the current study because each of them differed significantly among conditions in the previous acoustic study [7], unlike other measures such as tense-lax spectral distinctiveness. Other studies also suggest that vowel spectral characteristics, speech durations and vocal intensity characteristics may be associated with intelligibility [1,19,20].

All acoustic measures were obtained using TF32 [10]. Readers are referred to the previous acoustic study for a more detailed consideration of each measure [7]. Briefly, vowel space area is a static measure reflecting overall vowel spectral contrast, with larger vowel space areas representing greater spectral distinction among vowels [20]. Using midpoint F1 and F2 values, vowel space area was calculated for each speaker and condition for the 4 tense vowels /ɑ, i, æ, u/ and the 4 lax vowels /ɪ, ɛ, ʊ, ʌ/ using Heron's formula. The measure lambda (λ) was used to quantify dynamic spectral characteristics of vowels [3]; λ was calculated for each vowel using F1 and F2 values obtained at 20 and 80% of vowel duration. For each speaker and condition, the λ values were averaged across vowel tokens to yield a mean λ measure for the tense and lax vowel categories. Articulation rate, in syllables per second, was calculated for each sentence by dividing the number of syllables produced by the total speech time, excluding interword pauses greater than 200 ms. For each speaker and condition, articulation rates were averaged across the 18 sentences. Mean SPL also was obtained for each sentence and averaged across sentences for each speaker and condition.
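The three measures described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the TF32 implementation. Because Heron's formula gives the area of a triangle, the sketch assumes the 4-vowel quadrilateral is split along one diagonal into two triangles whose areas are summed. It also assumes that λ is the Euclidean F1-F2 distance between the 20% and 80% measurement points divided by the time separating them (60% of vowel duration). All function names are hypothetical.

```python
import math


def heron(a: float, b: float, c: float) -> float:
    """Triangle area from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2.0
    # max(..., 0.0) guards against tiny negative products caused by rounding
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))


def quad_vowel_space_area(corners):
    """Area (Hz^2) of a convex vowel quadrilateral.

    corners: four (F1, F2) midpoint pairs listed in order around the
    perimeter, e.g. /i/, /æ/, /ɑ/, /u/ for the tense vowels.
    Assumption: split along the diagonal joining the 1st and 3rd corners
    and sum the two triangle areas.
    """
    a, b, c, d = corners
    diag = math.dist(a, c)
    return (heron(math.dist(a, b), math.dist(b, c), diag)
            + heron(math.dist(c, d), math.dist(d, a), diag))


def vowel_lambda(f1_20, f2_20, f1_80, f2_80, duration_ms):
    """Hypothetical λ: F1-F2 distance travelled between the 20% and 80%
    points of the vowel, divided by the time between them (Hz/ms)."""
    travel = math.hypot(f1_80 - f1_20, f2_80 - f2_20)
    return travel / (0.6 * duration_ms)


def articulation_rate(n_syllables, total_time_s, pause_durations_s, min_pause_s=0.2):
    """Syllables per second, with interword pauses longer than 200 ms
    removed from the speaking time."""
    speech_time = total_time_s - sum(p for p in pause_durations_s if p > min_pause_s)
    return n_syllables / speech_time


# Illustrative (made-up) values, not data from the study
tense_vsa = quad_vowel_space_area([(300, 2300), (700, 1800), (750, 1100), (350, 900)])
rate = articulation_rate(10, 3.0, [0.1, 0.5])  # 10 syllables over 2.5 s of speech
```

Splitting along either diagonal gives the same total for a convex quadrilateral, so the choice of diagonal does not matter as long as the corners are listed in perimeter order.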

For each speaker, the magnitude of change in each of the nonhabitual conditions (i.e. clear, hearing impaired, overenunciate) relative to the habitual condition was calculated for each acoustic measure: tense vowel space area, lax vowel space area, mean tense λ, mean lax λ, mean articulation rate and mean SPL. This was accomplished by subtracting the habitual value from the value in each nonhabitual condition, so that positive values indicate an increase and negative values a decrease relative to habitual speech. As described below, this procedure also was used to quantify the magnitude of the clear speech intelligibility benefit for each of the nonhabitual conditions versus the habitual condition.

Listeners and Perceptual Task

The judgments of intelligibility provided by the 40 listeners reported in our previously published perceptual study [8] were reanalyzed in the current study. The listeners (15 males and 25 females) ranged in age from 19 to 42 years (mean: 22 years; SD: 5 years). They met the same inclusionary criteria as the speakers and were paid a modest participation fee.

To avoid familiarization with the sentence stimuli, each listener orthographically transcribed sentences produced in only 1 condition by each of the 12 speakers. Speakers and conditions were blocked and randomized across listeners. To prevent ceiling effects, the sentences were equated for overall root mean square amplitude and mixed with multitalker babble at a signal-to-noise ratio of -3 dB, as is routinely done in clear-speech studies. Pilot testing was undertaken to confirm the appropriate signal-to-noise ratio for minimizing ceiling and floor effects. The stimuli were presented to listeners via Sony Dynamic Stereo Headphones (MDR-V300) at 70 dB in a sound-treated room. The listeners heard each sentence once and typed their responses using the computerized Sentence Intelligibility Test software [21]. Percent-correct scores were computed for each speaker and condition by pooling responses across listeners, dividing the total number of correctly transcribed words by the total number of possible words and multiplying by 100. For each speaker, the intelligibility score for the habitual condition was subtracted from that for each nonhabitual condition (clear, hearing impaired, overenunciate), yielding 3 intelligibility difference scores.
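The stimulus preparation and scoring steps above can be illustrated with a short Python sketch. It assumes signals are plain lists of samples at a common sampling rate and that the babble is at least as long as the sentence. The `target_rms` level is an arbitrary, hypothetical value: only the speech-to-babble ratio matters for the -3 dB condition. The benefit score is computed as nonhabitual minus habitual, so that a positive value indicates an intelligibility gain.

```python
import math


def rms(x):
    """Root mean square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, babble, snr_db=-3.0, target_rms=0.05):
    """Equate the sentence to a common RMS, scale the babble so that
    20*log10(rms(speech) / rms(babble)) equals snr_db, then sum sample-wise.
    zip() truncates the babble to the sentence length."""
    speech_gain = target_rms / rms(speech)
    speech_eq = [v * speech_gain for v in speech]
    noise_rms = target_rms / 10 ** (snr_db / 20.0)  # -3 dB: babble louder than speech
    noise_gain = noise_rms / rms(babble)
    return [s + n * noise_gain for s, n in zip(speech_eq, babble)]


def percent_correct(words_correct, words_possible):
    """Pool across listeners: total words correct / total words possible * 100."""
    return 100.0 * sum(words_correct) / sum(words_possible)


def intelligibility_benefit(nonhabitual_pct, habitual_pct):
    """Difference score; positive values mean the clear-speech variant was more intelligible."""
    return nonhabitual_pct - habitual_pct


# Illustrative (made-up) word counts from two listeners for one speaker
habitual = percent_correct([4, 5], [10, 10])       # 45.0
overenunciate = percent_correct([8, 9], [10, 10])  # 85.0
gain = intelligibility_benefit(overenunciate, habitual)
```

Pooling counts before dividing, rather than averaging per-listener percentages, weights each listener by the number of words transcribed, which matches the pooling described in the text.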

Data Analyses

Data were pooled across the clear, hearing impaired and overenunciate conditions for all analyses. As in many other studies investigating the acoustic basis of intelligibility, univariate linear regression analysis was used as a first step to explore the relationship between the magnitude of acoustic change and the magnitude of intelligibility change for the 3 clear-speech variants relative to habitual speech. Given the likelihood that acoustic variables would be interrelated, Pearson product-moment correlations also were computed for pairs of acoustic variables. Finally, a multivariate regression analysis was used to quantify the predictive relationship of all acoustic variables to the magnitude of intelligibility change. Similar to Searl and Evitts [22], acoustic predictor variables were entered together in a single block as there was no strong theoretical rationale regarding the order of variable entry. Standardized regression coefficients subsequently were used to investigate the independent contribution of each predictor variable (holding constant all other variables) to the magnitude of intelligibility change. An α level of 0.05 was used for all hypothesis testing. For the remainder of the paper, ‘acoustic change' and ‘intelligibility benefit' are used to refer to difference scores for acoustic measures and intelligibility, respectively.
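A simultaneous-entry regression with standardized coefficients, as described above, can be sketched with NumPy. This is a generic ordinary-least-squares illustration, not the authors' statistical software. It assumes `X` holds one row per speaker × condition and one column per acoustic change score, and `y` holds the matching intelligibility benefit. Z-scoring every variable before fitting turns each coefficient into a standardized β, so predictors measured in different units (Hz², syllables per second, dB) can be compared.

```python
import numpy as np


def zscore(a):
    """Center and scale each column (sample SD, ddof=1)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def standardized_betas(X, y):
    """All predictors entered in one block; returns one standardized beta per column.
    No intercept column is needed because z-scored variables have mean 0."""
    beta, *_ = np.linalg.lstsq(zscore(X), zscore(y), rcond=None)
    return beta


def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic example: 36 rows (12 speakers x 3 conditions), 6 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 6))
y = X @ np.array([0.5, 0.0, 0.0, 0.0, -1.0, 0.4]) + rng.normal(scale=0.5, size=36)
print(standardized_betas(X, y), adjusted_r_squared(r_squared(X, y), 36, 6))
```

Because all predictors enter in a single block, each β reflects that predictor's association with the outcome while the other five are held constant, which is the interpretation used in the Results.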

Univariate Linear Regression

Table 1 summarizes the results for all regression analyses. Four of the 6 univariate functions were significant. Acoustic change in tense vowel space area [r² = 0.33; F(1, 34) = 17.94; p < 0.001], lax λ [r² = 0.16; F(1, 34) = 7.73; p = 0.009], articulation rate [r² = 0.38; F(1, 34) = 22.71; p < 0.001] and SPL [r² = 0.19; F(1, 34) = 9.27; p = 0.004] accounted for between 16 and 38% of the variance in intelligibility benefit. Relationships were in the expected direction, such that greater magnitudes of acoustic change were associated with greater magnitudes of intelligibility benefit. Visual inspection of residual plots suggested that, with only a few exceptions, residuals were randomly scattered within 2.5 SD of the estimate of Y. For tense vowel space area and lax λ, outliers were removed and the univariate regressions were recomputed. The results were similar to those of the original analyses, although the significant functions accounted for slightly different proportions of the variance (lax λ: r² = 0.10; tense vowel space area: r² = 0.42). In addition, residuals for lax λ showed greater variation around the middle of the scatter plot; however, as for the other variables, no other systematic pattern was observed.
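For a single-predictor regression, F and r² are linked by F = r²(n - 2)/(1 - r²), and residual degrees of freedom of 34 imply n = 36 observations (12 speakers × 3 conditions). The short Python check below inverts that relation for the reported F values and then applies the usual adjustment for the number of predictors. Under this arithmetic, the r² values in the text appear to correspond to adjusted r² rather than unadjusted r². The same holds for the multivariate model reported in table 1 [F(6, 29) = 8.2; adjusted R² = 0.55]. Function names are illustrative.

```python
def r2_from_f(f, df_model, df_resid):
    """Invert F = (R^2 / df_model) / ((1 - R^2) / df_resid)."""
    return f * df_model / (f * df_model + df_resid)


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


N = 36  # 12 speakers x 3 nonhabitual conditions, consistent with df = (1, 34)

# (F statistic, r^2 as reported in the text) for the significant univariate fits
reported = {
    "tense vowel space area": (17.94, 0.33),
    "lax lambda": (7.73, 0.16),
    "articulation rate": (22.71, 0.38),
    "SPL": (9.27, 0.19),
}

for name, (f, r2_text) in reported.items():
    r2 = r2_from_f(f, 1, N - 2)
    print(f"{name}: r2 = {r2:.3f}, adjusted r2 = {adjusted_r2(r2, N, 1):.3f}, text = {r2_text}")

# Multivariate model: F(6, 29) = 8.2
R2_multi = r2_from_f(8.2, 6, 29)
print(f"multivariate: R2 = {R2_multi:.3f}, adjusted R2 = {adjusted_r2(R2_multi, N, 6):.3f}")
```

For example, F(1, 34) = 22.71 for articulation rate implies an unadjusted r² of about 0.40 and an adjusted r² of about 0.38, the value given in the text.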

Figure 1 illustrates the univariate analyses for articulation rate and SPL. The figure is coded for speaker sex (shading) and condition (symbol shape). Numbers inside symbols correspond to individual speakers. Inspection of individual speaker data provides varying levels of support for the overall group finding that greater acoustic change was associated with greater intelligibility benefit. For the majority of speakers, greater increases in tense vowel space area tended to be associated with a greater intelligibility benefit. Similarly, figure 1 illustrates that for the majority of the 12 speakers, greater reductions in articulation rate were associated with greater improvements in sentence intelligibility. Individual speakers less consistently followed the overall group trends for lax λ and SPL. For example, figure 1b suggests that for speakers 3, 5 and 12, greater changes in SPL were associated with greater intelligibility benefit, but this trend did not hold or was less obvious for other speakers.

Correlation Analysis

The majority of pairwise correlations for acoustic variables were significant, with Pearson's r values ranging from 0.35 to 0.78 (two-tailed tests: p < 0.05). Most correlations involving tense λ were not significant. Articulation rate and SPL also were not significantly correlated. Moderate-to-strong correlations between many of the acoustic variables could complicate the interpretation of univariate regressions. Thus, multivariate regression analysis also was used to investigate the independent contribution of each variable, holding constant all others.

Multivariate Regression

The multivariate regression model with 6 predictor variables (tense vowel space area, lax vowel space area, tense λ, lax λ, articulation rate and SPL) was significant [adjusted R² = 0.55; F(6, 29) = 8.2; p < 0.05]. The standardized partial regression coefficients reported in table 1 indicate that 3 predictor variables accounted for a statistically significant portion of the variance in intelligibility above and beyond all other variables. Controlling for all other predictor variables, lax vowel space area (β = -0.407; p < 0.05), articulation rate (β = -0.513; p < 0.05) and SPL (β = 0.370; p < 0.05) accounted for 17, 26 and 14% of the variance in intelligibility benefit, respectively. The direction of the relationship between predictor variables and intelligibility change was such that greater reductions in rate as well as greater increases in SPL were associated with greater increases in intelligibility. However, a greater increase in lax vowel space area (holding constant all other variables) was associated with a smaller intelligibility benefit. Although tense vowel space area accounted for 20% of the variance in intelligibility (holding constant all other variables), this coefficient only approached significance (p = 0.056; table 1). All predictors had a variance inflation factor of <10 [23], indicating that multicollinearity was not severe enough to preclude a multiple regression approach.
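Variance inflation factors can be computed by regressing each predictor on the remaining predictors, with VIF_j = 1/(1 - R_j²). The NumPy sketch below is a generic illustration of that definition rather than the software used in the study. It also shows that the percentages of variance quoted above equal the squared standardized coefficients (e.g. 0.513² ≈ 0.26). When predictors are correlated, β² matches a predictor's unique share of variance only approximately; the squared semipartial correlation is the more common index of unique contribution, and the two coincide only for uncorrelated predictors.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n observations x p predictors):
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2_j = 1.0 - (resid @ resid) / (centered @ centered)
        factors[j] = 1.0 / (1.0 - r2_j)
    return factors


# Squared standardized coefficients reported in the text
betas = {"lax vowel space area": -0.407, "articulation rate": -0.513, "SPL": 0.370}
shares = {name: round(b * b, 2) for name, b in betas.items()}
print(shares)  # {'lax vowel space area': 0.17, 'articulation rate': 0.26, 'SPL': 0.14}
```

A VIF of 10 corresponds to R_j² = 0.9, that is, 90% of a predictor's variance being explained by the other predictors, which is the conventional cutoff cited above [23].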

The overarching aim of the present study was to determine the extent to which greater magnitudes of acoustic change accompanying 3 clear speech instructions (i.e. clear, hearing impaired, overenunciate) were associated with greater magnitudes of change in sentence intelligibility. Results indicated that only some of the acoustic changes accompanying a clear speech style were associated with improvements in intelligibility.

For the univariate regression analyses, greater increases in tense vowel space area, greater dynamic spectral change for lax vowels, greater reductions in articulation rate and greater increases in SPL were significant predictors of greater improvements in sentence intelligibility (table 1). These results are in broad agreement with other studies using linear regression analysis to study the acoustic basis of the clear speech benefit [1,2]. Results from the multivariate regression analysis further indicated that the 6 acoustic predictors, as a group, explained 55% of the variance in the clear speech benefit. Interestingly, when all predictors were accounted for in the multivariate model, articulation rate (26% of the variance) remained the strongest predictor of intelligibility. Above and beyond all other variables, SPL also accounted for a small percentage (14%) of the variation in intelligibility. However, only a few individual speakers followed the group trend for a greater increase in SPL to be associated with a greater intelligibility increase. As indicated in figure 1, a wide range of intelligibility change was also observed for relatively small changes in SPL (i.e. between -1 and +1 dB). Thus, readers should interpret the SPL results with caution. Furthermore, because sentences were equated for overall root mean square amplitude prior to mixing with multitalker babble, the finding that a relatively greater increase in SPL was associated with a greater improvement in intelligibility is not attributable to improvements in audibility. Rather, the relationship between SPL and intelligibility may reflect changes in spectral tilt related to adjustments in vocal quality or segmental articulatory behavior [5,24].

The fact that articulation rate was significantly correlated with the majority of vowel spectral metrics, yet remained the strongest predictor of intelligibility in the multivariate model, suggests that for a clear speaking style, utterance-level changes (i.e. global speech timing) might also capture segment-level clear speech adjustments. In this manner, adjustments in speech timing might serve as a gross indicator of the clear speech benefit. We do not suggest that clear speech instructions should target this particular variable (i.e. speak slower). Rather, measures of articulation rate might be used to predict improvements in intelligibility for clear speech strategies that use variants of the instructions explored in the current study.

The multivariate analysis further indicated that, holding all other predictors constant, a decrease in the size of the lax vowel space area was associated with improved intelligibility. This unexpected finding could be an artifact of the statistical modeling. In the current study, a combination of 6 acoustic predictors was studied, and model results would likely differ depending on the specific set of predictor variables and their grouping or order of entry into the model. Alternatively, the relationship between lax vowel space area change and intelligibility change in the multivariate analysis may be related to improved contrast between tense and lax vowels, insofar as a relatively smaller increase in lax vowel space area would effectively enhance the spectral distance between tense and lax vowels. However, in our previous acoustic study [7], spectral distinctiveness for tense-lax vowel pairs did not differ significantly among conditions. Therefore, the current results may indicate that lax vowel space area taps into vowel spectral distinctiveness in a different and important way. Nonetheless, the multivariate analysis included only a limited number of acoustic variables, and results would likely differ with the addition of other variables (e.g. F₀ or segmental contrast for consonants). Therefore, the acoustic basis of the improved intelligibility of clear speech requires further investigation. Studies employing speech resynthesis are likely to be helpful in this regard, as resynthesis allows for parametric control over specific acoustic parameters in a way that is not possible in behavioral studies.

Results from the univariate and multivariate analyses differed for tense vowel space area and lax λ. Thus, while greater rates of spectral change and larger tense vowel space areas may accompany a clear speech style, univariate analyses based on a single type of segmental vowel measure might not represent the interaction among all vowel characteristics (i.e. static measures, dynamic measures and measures of contrastivity) and their association with intelligibility [25]. These differences demonstrate the value of a multivariate approach, in addition to the more traditional univariate approach, for investigating the acoustic basis of intelligibility.

Finally, it is notable that speakers in the present study achieved the same magnitude of clear speech intelligibility benefit in different ways. For example, speakers 5 and 8 had similar habitual intelligibility (40 and 45%, respectively), and both speakers improved intelligibility by approximately 40 percentage points in the overenunciate condition. However, inspection of individual speaker data indicates that these speakers manipulated acoustic parameters differently to achieve a similar clear speech benefit. As shown by the triangles representing the overenunciate condition in figure 1, speaker 5 increased SPL by 3 dB, whereas there was no change in SPL for speaker 8. Furthermore, articulation rate in the overenunciate condition was reduced by a greater magnitude for speaker 8 (-2.7 syllables per second) than for speaker 5 (-1.7 syllables per second). Similar patterns were observed upon inspection of data on tense vowel space area and lax λ: speaker 8 maximized segmental adjustments, whereas speaker 5 increased dynamic spectral characteristics to achieve greater intelligibility benefits. These findings are consistent with the talker variation reported in previous studies of the clear speech benefit [3]. It remains to be determined whether therapeutic applications of a clear speech style can leverage this type of individual speaker variability to enhance the efficiency of training.

The current study examined neurologically normal speech, and generalization of findings to clinical populations would be premature. Nonetheless, this line of inquiry may assist in enhancing the therapeutic use of clear speech strategies for clinical populations such as dysarthria patients or communication partners of individuals with a hearing impairment. The current findings strongly suggest that optimal therapeutic use of clear speech for such populations will require studies that identify clear speech cues or instructions that are most effective in maximizing intelligibility. Quantifying changes in the acoustic signal that might capture or represent the perceptual construct of intelligibility further has implications for documenting therapeutic success as well as for enhancing the scientific evidence base for using clear speech therapeutically.

Research was supported by grant No. R01DC004689.

1. Smiljanić R, Bradlow AR: Speaking and hearing clearly: talker and listener factors in speaking style changes. Lang Linguist Compass 2009;3:236-264.
2. Uchanski RM: Clear speech; in Pisoni DB, Remez RE (eds): The Handbook of Speech Perception. Malden, Blackwell, 2005, pp 207-235.
3. Ferguson SH, Kewley-Port D: Talker differences in clear and conversational speech: acoustic characteristics of vowels. J Speech Lang Hear Res 2007;50:1241-1255.
4. Kain A, Amano-Kusumoto A, Hosom JP: Hybridizing conversational and clear speech to determine the degree of contribution of acoustic features to intelligibility. J Acoust Soc Am 2008;124:2308-2319.
5. Krause JC, Braida LD: Acoustic properties of naturally produced clear speech at normal speaking rates. J Acoust Soc Am 2004;115:362-378.
6. Krause JC, Braida LD: Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech. J Acoust Soc Am 2009;125:3346-3357.
7. Lam J, Tjaden K, Wilding G: Acoustics of clear speech: effect of instruction. J Speech Lang Hear Res 2012;55:1807-1821.
8. Lam J, Tjaden K: Intelligibility of clear speech: effect of instruction. J Speech Lang Hear Res 2013, E-pub ahead of print.
9. American National Standards Institute: American National Standard Specification for Audiometers S3.6-2004.
10. Milenkovic P: TF32 (computer software). Madison, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 2002.
11. Yorkston KM, Beukelman DR: Assessment of Intelligibility of Dysarthric Speech. Austin, Pro-Ed, 1981.
12. Ferguson SH: Talker differences in clear and conversational speech: vowel intelligibility for normal-hearing listeners. J Acoust Soc Am 2004;116:2365-2373.
13. Harnsberger JD, Wright R, Pisoni DB: A new method for eliciting three speaking styles in the laboratory. Speech Commun 2008;50:323-336.
14. Bradlow AR, Bent T: The clear speech effect for non-native listeners. J Acoust Soc Am 2002;112:272-284.
15. Bradlow AR, Kraus N, Hayes E: Speaking clearly for children with learning disabilities: sentence perception in noise. J Speech Lang Hear Res 2003;46:80-97.
16. Dromey C: Articulatory kinematics in patients with Parkinson disease using different speech treatment approaches. J Med Speech Lang Pathol 2000;8:155-161.
17. Moon S, Lindblom B: Interactions between duration, context, and speaking style in English stressed vowels. J Acoust Soc Am 1994;96:40-55.
18. Feijoo S, Fernandez S, Balsa R: Acoustic and perceptual study of phonetic integration in Spanish voiceless stops. Speech Commun 1999;27:1-18.
19. Yunusova Y, Weismer G, Kent RD, Rusche NM: Breath-group intelligibility in dysarthria: characteristics and underlying correlates. J Speech Lang Hear Res 2005;48:1294-1310.
20. Weismer G, Yunusova Y, Bunton K: Measures to evaluate the effects of DBS on speech production. J Neurolinguist 2012;25:74-94.
21. Yorkston KM, Beukelman DR, Tice R: Sentence Intelligibility Test. Lincoln, Tice Technologies, 1996.
22. Searl J, Evitts P: Tongue-palate contact pressure, oral air pressure and acoustics of clear speech. J Speech Lang Hear Res 2013;56:826-839.
23. Mason CH, Perreault WD: Collinearity, power, and interpretation of multiple-regression analysis. J Marketing Res 1991;28:268-280.
24. Neel AT: Effects of loud and amplified speech on sentence and word intelligibility in Parkinson disease. J Speech Lang Hear Res 2009;52:1021-1033.
25. Kim H, Hasegawa-Johnson M, Perlman A: Vowel contrast and speech intelligibility in dysarthria. Folia Phoniatr Logop 2010;63:187-194.

The results were presented at the 2012 Biennial Conference on Motor Speech, Santa Rosa, Calif., USA.
