Abstract
Introduction: Abnormal facial growth is a recognized outcome in cleft lip and palate (CLP), resulting in a concave profile and a class III occlusal status. Maxillary osteotomy (MO) is undertaken to correct this facial deformity, and the surgery can impact speech articulation, although the evidence remains limited and ill-defined for the CLP population. Aims: The aim of the study was to investigate the impact of MO on the production of the fricatives /f/ and /s/, using perceptual and acoustic analyses, and to explore the nature of speech changes. Methods: Twenty participants with CLP were seen 0–3 months pre-operatively (T1) and 3 months (T2) and 12 months (T3) after MO. A normal group (N = 20) was similarly recruited. Perceptual speech data were collected according to a validated framework and ratings made on audio and audio-video recordings (VIDRat). Spectral moments were centre of gravity (CG), standard deviation (SD), skewness (SK) and kurtosis (KU). Reliability studies were carried out for all speech analyses. Results: For the CLP group, VIDRat identified dentalization/interdentalization as the main type of pre-operative error for /s/ with a statistically significant improvement over time, χ2(2) = 6.889, p = 0.032. Effect sizes were medium between T1 and T3 (d = 0.631) and small between T2 and T3 (d = 0.194). For the acoustic data, effect sizes were similarly medium between T1 and T2 (e.g., SK, /f/ d = 0.579, /s/ d = 0.642) and small between T2 and T3 across all acoustic parameters. Independent t tests showed mainly statistically significant differences between the two groups at all time points with large effect sizes (e.g., T2 CG, t = –4.571, p < 0.001, d = 1.581), indicating that /s/ was not normalized post-operatively. For /f/, differences tended to be at T1 with large effect sizes (e.g., CG, t = –2.307, p = 0.028, d = 0.797), reflecting normalization.
Conclusions and Implications: This is the first speech acoustic study on /f/ for individuals with CLP undergoing MO. The surgery has a positive impact on /f/ and /s/, which appear to stabilize 3 months post-operatively. Speech changes are an automatic and a direct consequence of the physical changes brought about by MO, effecting articulatory re-organization. The results of the study have direct clinical implications for the clinical care pathway for patients with CLP undergoing MO.
Introduction
Abnormal facial growth is a recognized outcome in cleft lip and palate (CLP), resulting in a typical concave facial profile and class III occlusal status [1, 2]. The causes of abnormal facial growth in CLP include intrinsic developmental deficiencies or are iatrogenic as a result of previous palate repair [3, 4]. Approximately 25–50% of individuals with abnormal facial growth in CLP will require orthognathic surgery for both functional and aesthetic purposes [5, 6]. Maxillary retrusion can also occur in individuals without CLP, with an estimate of 5% of the population requiring surgical intervention to correct this [7]. Surgery is typically undertaken during late adolescence to early adulthood when active facial development is completed. This is a time when self-image, self-esteem, social network building and sexual identity are important to the individual [8, 9]. The typical surgical technique is maxillary osteotomy (MO), also known as a Le Fort I osteotomy, which can be undertaken with or without a mandibular setback [10-12]. The surgery is multidimensional in nature with horizontal, vertical and transverse movements involving advancement, retrusion, elongation and shortening of the maxilla [10, 11]. Several complications following MO have been documented in the literature, such as hearing disturbances, gingival retraction and avascular necrosis [11]. The surgery can also impact adversely on velopharyngeal function, particularly in individuals with a history of CLP [12-17]. Velopharyngeal dysfunction is when the palate does not close efficiently or consistently during speech, resulting in hypernasality, nasal airflow errors and weakened or nasalised consonants [18, 19]. There is, however, evidence suggesting that MO can have a positive impact on speech articulation [17, 20, 21].
Dental anomalies and skeletal malocclusions can have a detrimental effect on speech articulation [22-24], although this is much less studied. The majority of perceptual (and acoustic) speech studies tend to focus on the non-CLP population with the most commonly reported pre-operative speech errors being with sibilants /s, z, ʃ, tʃ, ʤ/, alveolar plosives /t, d/, and labiodentals, e.g., /f/ [21, 24-27]. Pre-operative errors for sibilants were both visual and auditory and defined as frontal distortions by Vallino and Tompson [24], where the tongue is placed “too far distally to the mandibular incisors while the tongue body is flattened” [24] or when the sound is produced with the tongue tip protruding between the upper and lower teeth. Tip alveolar sounds such as /t, d/ tend to be dentalized or interdentalized, resulting in visual distortion type errors [21, 24, 25]. The pattern appears to extend to languages other than English. For instance, Whitehill et al. [27] reported similar errors with /s/ and the affricates /ts/ and /tsʰ/ in Cantonese-speakers with dentofacial anomalies. There is also evidence of the vulnerability of /f/, which was found to be produced as a reversed labiodental [28], a bilabial fricative [ɸ], or produced in a weak manner [25, 27, 28]. Whitehill et al. [27] also identified “whistling” of /f/, which is created by the air passing between the tongue and alveolar ridge. A systematic review of the osteotomy literature found evidence suggesting a positive impact of MO on articulation, observed immediately post-operatively and maintained a year after surgery [29]. Errors with bilabials, tip-alveolars and labiodentals were reported to be eliminated post-operatively [17, 30], with improvement in jaw relationship brought about by the surgery. However, Ip [25] found that the labiodental /f/ was the phoneme most resistant to the effects of MO in her study on Cantonese-speakers. For the English phonemes /s/ and /z/, although the number of errors was reduced post-operatively, distortions remained [20, 24, 25, 30-32].
There are also language-specific studies that have used instrumental measures, such as sound spectrography [21, 26, 33-36] (Table 1), to explore the nature of the impact of MO on speech in both CLP and non-CLP individuals. Perceptual analyses of speech of individuals with cleft palate, or that of speakers with a malocclusion, can be challenging to transcribe reliably or accurately [37]. Instrumental measures not only provide additional information, but also allow the capture of subtle changes in speech that might not be perceived from perceptual analyses. Acoustic analysis of speech using sound spectrography “affords quantitative analyses that carry potential for determining the correlates of perceptual judgments” [38]. In the production of /s/, the typical point of constriction is made between the tongue tip and alveolar ridge. In individuals with a class III occlusal status, the point of constriction shifts in relation to anatomical structure and position. The tongue tip, blade or sometimes even the front of the tongue body is placed against the upper central incisors, reflecting a shortening of the vocal tract as well as a decreased front resonating cavity, thereby negatively impacting on the individual’s ability to produce /s/ “normally.” Acoustically, this would potentially be reflected in a lower centre of gravity (CG), positive value for skewness (SK), an increased standard deviation (SD) value and a negative value for kurtosis (KU), spectral moments that have been found to define the /s/ spectrum well [20, 21].
Several studies have shown improvements in acoustic parameters of production of /s/ [21, 26, 33, 34, 36]. For example, Hagberg et al. [33] measured the energy distribution of /s/ on low, mid and high frequency bands and reported a statistically significant positive correlation between the change in spectral balance of /s/ and the overall increased percentage of “oral consonant correct” measured from perceptual ratings. In contrast, other acoustic studies [26, 35] reported a regression back to pre-operative status, which the authors attributed to the phenomenon of articulatory re-organization. An additional plausible reason could be skeletal relapse within their non-CLP cohort [26], although skeletal relapse has been shown to be more likely in individuals with CLP due to tethering of an existing pharyngeal flap and/or scar retraction [11].
At present, therefore, the available evidence is mixed and continues to be ill-defined. Interestingly, none of the acoustic studies were based on English-speakers, with the exception of our preliminary data [34]. Additionally, although Hagberg et al. [33] compared perceptual with acoustic findings, their study was based on retrospective data and did not further investigate if articulation was normalized. There is also no up-to-date instrumental evidence on fricatives other than /s/ (except for the study by Yamamoto et al. [36], which included /ʃ/) in a clinical field where perceptual speech studies have identified /f/ to be vulnerable in the osteotomy population. This study, therefore, sets out to investigate the impact of MO on two vulnerable phonemes, /s/ and /f/, in individuals with CLP. The study is set within a larger speech osteotomy study [21]. Early acoustic findings on /s/ (N = 10) were reported previously [34]. Here, we report on the impact of MO on both /s/ and /f/ for the full cohort over time, using perceptual and acoustic analyses. The inclusion of two post-operative time points (3 and 12 months) allowed us to explore early and late (permanent) speech changes post-operatively. The inclusion of a normal control group enabled us to investigate if speech was normalized following surgery.
Materials and Methods
Participants
There were two groups of participants. Group 1 (N = 20) consisted of a consecutive series of individuals with CLP and a class III malocclusion undergoing MO by a single maxillofacial surgeon, within a single regional Cleft service in the UK. All 20 participants were recruited in the original study [21]. Acoustic data were available for 17 of the 20 participants. There were 14 males and 3 females between the ages of 18;1 and 30;1 years (mean = 20;4, SD = 2;10). Group 2 (N = 20) represented the normal group. Participants in group 2 were recruited through colleagues within the department and staff of the hospital. There were 10 females and 10 males, aged between 19;8 and 26;0 years (mean = 23;3, SD = 2;11). None of the participants in group 2 presented with any malocclusion, nor were any dental abnormalities noted or reported. No participant in either group reported any hearing or learning difficulties that could impact on test performance. All participants were native British English-speakers.
Assessment Time Points
Speech samples were collected at 3 assessment time points for participants in the CLP group: 0–3 months pre-operatively (T1) and 3 months (T2) and 12 months (T3) post-operatively. Participants in the normal group were seen at a single time point. A 12-month post-operative time point was identified as this is when the maxilla is considered to be stable and speech changes are permanent [21]. Two participants missed their T2 assessment time points. Three participants requested a shortened version of the stimuli set for the acoustic assessment which targeted /s/ only. Acoustic speech assessment was not part of clinical routine, and the three participants did not want an extended assessment session.
Perceptual Speech Assessment
Speech Stimuli and Sampling
All speech data were collected in the original study [21] using a standardized speech protocol, the Cleft-Audit Protocol for Speech-Augmented (CAPS-A), a validated and reliable tool used by CLP centres in the UK and Ireland for audit purposes [39]. Speech sampling for articulation was repetition of standardized sentences from the CAPS-A, e.g., “I saw Sam sitting on a bus,” “The phone fell off the shelf.” Data were audio and video recorded using a digital camera and a stand-alone hypercardioid condenser microphone (RODE NT3) and subsequently digitized and converted into mpeg format. The digitized data was then edited and randomized to ensure blinding of participants and time points for the ratings studies. The audio component was subsequently extracted from the digitized data using RER Audio Converter 3.7.5.0412 (A–S) and converted to wav format (uncompressed CD Audio Quality) for the audio only ratings study.
Ratings Studies
Two ratings studies were carried out: an audio only study (AUDRat) for both /s/ and /f/, and an audio-video study (VIDRat) for /s/. In the original osteotomy study, only the voiceless phonemes /t/, /s/ and /ʃ/ were identified for visual ratings to capture “frontal distortions” [34]. Ratings were made independently by two specialist speech and language therapists in CLP with over 10 and 20 years’ experience, respectively, both of whom were CAPS-A trained. The raters were also unaware of the study objectives, the identity of the participants and the time points at which the speech samples were recorded. For the AUDRat study, articulation data was coded on an ordinal scale. There are 4 speech categories in the CAPS-A: anterior oral Cleft Speech Characteristics (CSCs), posterior oral CSCs, non-oral CSCs, and passive CSCs (Table 2). For each speech characteristic within a CAPS-A category, a score of 0 was assigned if no consonants were affected, 1 if 1 or 2 consonants were affected, and 2 if 3 or more consonants were affected. For the VIDRat study, 4 scalar points were used to reflect the visual aspects of articulation: 0 to indicate normal production/no error, 1 to indicate “dentalization,” 2 to indicate “interdentalization” and 3 to indicate “lateralization/lateral articulation or palatalization/palatal articulation.” Any other error, e.g., backing to velar, was coded as 8 and was not used in the final analyses.
Acoustic Speech Assessment
Speech Stimuli and Sampling
As the perceptual data was collected using the CAPS-A, a set of speech stimuli was specially created for the acoustic part of the study. This was to control for the phonetic environment and enable collection of at least 5 trials for each target. The stimuli were all set within a C-V-C structure. Initial consonants included a range of phonemes: plosives (/p, b, t, d, k, ɡ/), fricatives (/f, v, s, z, ʃ, ʒ, h/), approximants (/l, r, j, w/) and the affricate /dʒ/. The final consonant was a voiceless pressure consonant wherever possible. The 4 peripheral vowels /a, æ, i, u/ were included in the stimuli set for another strand of the study. Each word stimulus was set within a standard carrier phrase: “Say [target] once more.” The target stimuli were “seat,” “soup,” “sat” and “sarge” for /s/, and “feet,” “foot,” “fat” and “fart” for /f/. The entire word list was repeated 5 times and randomized for each participant and at each time point [21]. Each participant was asked to read aloud the randomized lists, and speech was recorded using a digital recorder and a Sony electret condenser microphone. All recordings were subsequently digitized at a sampling frequency of 22,050 Hz (16-bit resolution) and saved as wav files.
Annotation
Target stimuli were extracted from the wav files and annotated using Praat: doing phonetics by computer [40]. Annotation of /s/ targets was undertaken in the original study and annotated by the last author, and annotation of /f/ targets was undertaken by the first author and a second Masters’ research student using Praat version 6.1.07 [40]. All annotation was made based on a standardized annotation procedure. The onset and offset boundaries of /s/ and /f/ were placed manually based on the annotation procedure as well as visual inspection of the spectrograms (fricative noise) and waveforms. The onset boundary was marked at the point where the waveform stopped being quasi-periodic, and the offset boundary was marked at the point where the waveform started being quasi-periodic or where the first obvious quasi-periodic waveform of the vowel was identified (Fig. 1).
Spectral Moments
Speech research has shown that there are distinct spectral patterns that correspond to the phonetic features of place and manner of articulation, which can be derived from the acoustic waveforms [41, 42]. Spectral moments, such as CG, SD, SK and KU, are well-defined for the non-sibilant fricative /f/ and sibilant fricative /s/ and can be mapped onto phonetic features of these fricatives [41, 42] (Table 3). For example, CG indicates the length of the vocal tract and front resonating cavity. The values of the spectral moments for both fricatives were extracted from the sound files using a specially written script in Praat.
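The Praat script used in the study is not reproduced here. As an illustration only, the four spectral moments can be sketched in Python as below. The sketch assumes that the moments are weighted by the power spectrum and that KU is expressed as excess kurtosis (the normalized fourth moment minus 3), which is one common convention; the function names are hypothetical, and steps such as windowing or filtering of the fricative segment are omitted.

```python
import cmath
import math


def power_spectrum(samples, fs):
    """Naive DFT power spectrum (0 Hz to fs/2) of a short fricative segment.

    Illustrative only: suitable for a few hundred samples, not long signals.
    """
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def spectral_moments(freqs, power):
    """Return (CG, SD, SK, KU), treating the power spectrum as a distribution.

    Assumption: KU is excess kurtosis (fourth normalized moment minus 3).
    """
    total = sum(power)
    cg = sum(f * p for f, p in zip(freqs, power)) / total

    def central(order):
        # Power-weighted central moment of the given order.
        return sum(p * (f - cg) ** order for f, p in zip(freqs, power)) / total

    sd = math.sqrt(central(2))
    if sd == 0:
        # A single spectral line has no defined skewness or kurtosis.
        return cg, 0.0, float("nan"), float("nan")
    sk = central(3) / sd ** 3
    ku = central(4) / sd ** 4 - 3
    return cg, sd, sk, ku
```

Under these assumptions, a spectrum with more energy at higher frequencies yields a higher CG and a more negative SK, which is the direction of change described for /s/ in this study.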
Reliability
Perceptual Speech Data
Inter-rater reliability was established in the original study [21] using Spearman’s correlation (rs) and percent agreement, which was calculated using two statistics: perfect agreement (Po) and a less conservative agreement based on whether the two raters agree to the precision of –1 to +1 scores (Po-1) [21]. For the AUDRat study, inter-rater reliability for the 4 speech categories in CAPS-A ranged from rs = 0.543 to 1.00 (all p < 0.01), reflecting large correlations for all parameters and Po ranged from 91.7% to 98.3% with Po-1 ranging from 98.3% to 100% reflecting acceptable percent agreement. For the VIDRat study, inter-rater reliability for /s/ was rs = 0.643 (p < 0.01), reflecting a large correlation and acceptable percent agreement with Po at 80.0% and Po-1 at 95%. Intra-rater reliability was not evaluated as this would have been established during the formalised CAPS-A training where ratings on 10 speech recordings based on the CAPS-A are repeated at least a month later [43].
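The two percent-agreement statistics can be illustrated with a short Python sketch. It assumes that Po is the percentage of items on which the two raters give identical scores and that Po-1 is the percentage on which their scores differ by no more than one scale point; the function name and the example ratings are hypothetical and are not study data.

```python
def percent_agreement(rater_a, rater_b, tolerance=0):
    """Percentage of paired ratings that differ by at most `tolerance` points.

    tolerance=0 corresponds to perfect agreement (Po);
    tolerance=1 corresponds to agreement within one scale point (Po-1).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agree = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return 100.0 * agree / len(rater_a)


# Hypothetical ratings on a 0-3 scale, for illustration only.
example_a = [0, 1, 2, 3, 0]
example_b = [0, 2, 2, 1, 0]
po = percent_agreement(example_a, example_b)                   # exact agreement
po_1 = percent_agreement(example_a, example_b, tolerance=1)    # within one point
```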
Acoustic Speech Data
Inter- and intra-rater reliability for /s/ was established in a previous study using Pearson’s correlation, where annotation was made by the last author and where the third author served as the inter-rater reliability check [21]. Intra-rater reliability was r = 0.980 (p < 0.001) and inter-rater reliability was r = 0.998 (p < 0.001), reflecting excellent reliability [21]. As the additional annotation and analyses of /f/ were undertaken for the purposes of the current study, the calculated reliability was made on annotations made by the first author and her peer as described above. For /f/ data, intra-rater reliability was r = 0.802 (p < 0.001) and inter-rater reliability was r = 0.956 (p < 0.05), reflecting excellent reliability.
Exploring Assumptions
The assumption of normality of distribution was checked graphically using histograms and P-P plots and statistically using the Shapiro-Wilk test. The assumption of sphericity for within-subject comparisons was checked using Mauchly’s test of sphericity. If the p value was significant (p ≤ 0.05), the Greenhouse-Geisser correction was applied. The assumption of homogeneity of variance for between-group comparisons was tested using Levene’s test for equality of variances. If the p value was significant (p ≤ 0.05), the p value for “equal variances not assumed” was reported. Outliers were identified using boxplots for all data at each time point. True outliers were identified as scores more than 3 SD below or above the group mean. Correction of the data was made using the mean +2 SD method [44]. There were only 12 data points out of a possible total of 568 (or 2.1%) for both groups across the time points that were identified as true outliers.
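The outlier rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the group mean and SD are computed on all scores including the outlier, the sample SD is used, and scores below the lower limit are replaced symmetrically with the mean minus 2 SD. The function name and parameters are hypothetical.

```python
import statistics


def correct_outliers(values, flag_sd=3.0, replace_sd=2.0):
    """Replace scores beyond mean +/- flag_sd * SD with mean +/- replace_sd * SD.

    Returns the corrected list and the number of scores that were replaced.
    Assumption: mean and sample SD are taken from the uncorrected scores.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    upper, lower = mean + flag_sd * sd, mean - flag_sd * sd
    corrected, n_replaced = [], 0
    for v in values:
        if v > upper:
            corrected.append(mean + replace_sd * sd)
            n_replaced += 1
        elif v < lower:
            corrected.append(mean - replace_sd * sd)
            n_replaced += 1
        else:
            corrected.append(v)
    return corrected, n_replaced
```

With small groups, a single extreme score inflates the SD it is tested against, so only very extreme values exceed the 3 SD limit.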
Statistical Analysis
Friedman’s test (using a χ2 distribution) was used for the perceptual data to examine within-group differences over time and the Wilcoxon signed rank test to look at differences across pairs of time points. For the acoustic data, general linear model repeated measures analyses (one-way ANOVA) were undertaken to examine within-group differences for each spectral moment (CG, SD, SK and KU) with time at 3 levels (T1, T2 and T3) for /f/ and /s/. Effect sizes (Cohen’s d) were calculated to assess the magnitude of change in the spectral moments across pairs of time points (T1–T2, T1–T3, T2–T3). Independent t tests were used to evaluate between-group comparisons (CLP group and the normal group) for each spectral moment at T3 (12 months post-operatively), which is when no further skeletal relapse is expected, so that speech changes seen at this time point are taken to be stable and permanent. Correlational analyses were also undertaken to look at comparisons between perceptual and acoustic speech data.
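The variant of Cohen’s d used is not specified above. As a hedged illustration, the pooled-SD form can be sketched in Python as follows; the function name is hypothetical, and for paired (within-group) comparisons other variants exist that would give different values.

```python
import math
import statistics


def cohens_d(sample_1, sample_2):
    """Cohen's d using the pooled sample standard deviation.

    Assumption: pooled-SD form, d = (mean_1 - mean_2) / SD_pooled.
    The sign shows the direction of the difference; by convention,
    |d| of about 0.2, 0.5 and 0.8 are read as small, medium and large.
    """
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)
    var2 = statistics.variance(sample_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / pooled_sd
```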
Results
Perceptual Data
For the CLP group, only a small number of articulation errors were noted for /s/ in the AUDRat study. Case 10 presented with lateralization at all time points and case 1 presented a palatal fricative, but only at T2. There were no other types of errors with /s/. Dentalization or interdentalization was not detected in any participant. For /f/, no errors were noted. As only a small number of errors were noted from AUDRat, statistical analyses showed no significant differences over time for the CLP group and also no differences between the CLP group and the normal group.
For the VIDRat study, data was available for 19 (of 20) cases. At T1, ten (or 53%) of the CLP group presented with a normal production of /s/, which increased to 14 at T2 and 16 at T3. The most common error was dentalization (8 cases at T1, 2 at T2 and 3 at T3). This positive change was statistically significant: χ2(2) = 6.889, p = 0.032. Palatalization/lateralization was seen in 1 case, which showed no change at T2. There was a significant difference between T1 and T3, W = –2.309, p = 0.021, but not between T1 and T2 or between T2 and T3. Effect sizes provide further evidence of this: medium between T1 and T3 (d = 0.631) and small between T2 and T3 (d = 0.194). When compared with the normal group, there was a statistically significant difference only at T1 (U = 50.000, p = 0.011). These results suggest that the surgery has a spontaneous, positive and automatic effect on /s/, and that the change is stable and permanent immediately following surgery. The results also suggest that /s/ is normalized post-operatively, based on perceptual ratings.
Acoustic Data
Descriptive Statistics
/s/. For the CLP group, CG increased from a mean of 5,229.843 at T1 to 5,618.871 at T2 and 5,850.401 at T3 in comparison with the mean for the normal group, which was a high of 6,848.653. SD values decreased from a high of 1,977.216 at T1 to 1,942.805 at T2 and a further decrease to 1,859.523 at T3, contrasting with a lower SD mean value of 1,511.874 for the normal group. In terms of SK, this changed from a positive mean of 0.151 at T1 to a negative value of –0.243 at T2 and –0.259 at T3. The SK mean for the normal group was –0.526. KU values decreased from 1.144 at T1 to 0.697 at T2 and 0.891 at T3. Means and SDs are shown in Table 4.
/f/. For the CLP group, CG increased from a mean of 3,822.965 at T1 to 4,126.359 at T2 and 4,396.786 at T3 in comparison with the mean for the normal group, which was 4,733.597. In contrast to /s/, SD values increased from 2,217.988 at T1 to 2,526.516 at T2 and 2,390.308 at T3. The mean for the normal group was 2,536.812. Mean values for SK decreased from 1.346 at T1 to 0.758 at T2 and 0.796 at T3, contrasting with the mean value for the normal group, which was lower at 0.474. KU values also decreased from 3.753 at T1 to 0.567 at T2 and 1.560 at T3 in contrast to a negative value of –0.327 for the normal group. Means and SDs are shown in Table 4.
Within-Group Differences over Time
For fricative /s/, there was no main effect of time for any of the spectral moments. However, medium effect sizes were found between pre- and post-operative time points for SK (T1–T2 d = 0.642, T1–T3 d = 0.652) and an almost medium effect size for CG (T1–T3 d = 0.434), suggesting a true clinical impact of MO on /s/. In contrast, small effect sizes were found across all spectral moments between T2 and T3, suggesting that speech changes seen were already stable and permanent at 3 months post-operatively for /s/. For /f/, there was also no main effect of time for any of the spectral moments. Effect sizes were small for all spectral moments between T1 and T2, T1 and T3, and T2 and T3, except for SK between T1 and T2, which was medium (d = 0.579).
Between-Group Differences
For fricative /s/, independent t tests showed significant differences between the CLP group and the normal group at all time points for CG, SD, SK and KU, except for SK at T2 and T3. Effect sizes further supported these findings with medium to large effect sizes (d = 0.708–1.581), suggesting that the spectral moments of /s/ for the CLP group do not become normalized even a year after surgery (Fig. 2). For /f/, there were significant differences between the CLP group and the normal group at T1 for CG (p = 0.028) and SK (p = 0.039), which were further supported by the almost large to large effect sizes at T1 in contrast to small to medium effect sizes at T2 and T3. For KU, there was a significant difference at T3 (p = 0.042), with a large effect size, suggesting that all the spectral moments of /f/, except for KU, were normalized post-operatively (Fig. 3c).
Comparisons between Perceptual and Acoustic Data
Correlational analyses between AUDRat and spectral moments resulted in no meaningful computations due to too few errors noted from perceptual auditory ratings. For VIDRat and spectral moments, there was a moderate-sized correlation (rs = –0.506) between visual errors rated perceptually and SK at T1, which was also statistically significant (p = 0.038). The correlation at T2 was small (rs = 0.211) and not significant, but moderate (rs = 0.447) and almost significant (p = 0.072) at T3. All other correlations between visual ratings and acoustic parameters were small and not statistically significant.
Discussion
Current evidence on the impact of MO on speech articulation for the CLP population is limited and not well-defined. In this study, we report on the impact of MO on the fricatives /f/ and /s/ across two post-operative time points (3 and 12 months post-operatively) based on perceptual audio and audio-video ratings and on 4 spectral moments (CG, SD, SK and KU) using acoustic analyses. This is the first study that has looked at the impact of the maxillary surgery on production of /f/ using acoustic analyses. Perceptual ratings based on audio only recordings found very few articulation errors and no errors with /f/. For /s/, pre-operative dentalization/interdentalization errors were only identified from ratings based on video recordings. Video ratings also found that MO has a positive impact on the articulation of /s/, with production normalized immediately post-operatively and maintained a year after surgery. In contrast, although the acoustic data also showed improvement over time (medium effect sizes, without a statistically significant main effect of time), the spectral moments for /s/ were never normalized, even a year after surgery. Certain spectral moments for /f/, however, were normalized immediately post-operatively, and this was maintained a year after surgery.
Although there was no statistically significant change over time for the acoustic data, our study found medium effect sizes between pre- and post-operative time points. These findings parallel those reported in the literature [26, 33, 35, 36]. Correction of a class III occlusal status facilitates production of /s/ with a more accurate point of constriction with the tongue tip placed at the alveolar ridge or with the tongue tip placed behind the lower incisors and the blade of the tongue against the alveolar ridge [45] due to a lengthening of the vocal tract and an increased front resonating cavity brought about by the surgery. Acoustically, this is reflected in the within-group differences with an increase in CG, a negative value for SK (indicating a post-osteotomy positive tilt with energy concentration in the higher frequencies), a decrease in SD (reflecting less dispersion of energy), and a positive value for KU (reflecting a more well-defined spectrum and well-resolved peaks). The realignment of the maxilla with the mandible results in positive structural changes for speech articulation, particularly to speech sounds that are sensitive to tooth position and jaw alignment such as /s/. The individual is consequently able to achieve a more accurate point of constriction with an increased front resonating cavity for the production of /s/, resulting in an increased stridency. Statistical comparisons between visual ratings and the range of spectral moments indicated a plausible relationship only with SK, suggesting perhaps that this spectral moment is the more valid acoustic correlate of visual type errors with /s/.
A similar significant change post-operatively was indicated by the perceptual speech data based on video recordings. Dentalization/interdentalization was the most frequent type of error noted pre-operatively, something which was not picked up from audio recordings alone. Such errors reflect visual distortions, characterized by Vallino and Tompson [24] as frontal distortions, and may not be sufficiently significant to be perceived auditorily. In fact, very few speech errors were identified by our two raters based on audio recordings alone. Pre-operatively, a single case presented with palatalization of /s/. Similar pre-operative palatalization errors were reported by Lee et al. [26] in two cases as well as “substitution” of /s/ to [ts] in another two cases. In Cantonese-speaking children, production of /s/ as [ts] is typically reflective of the phonological process of affrication, in which fricatives are realized as the aspirated affricate [tsʰ] or the non-aspirated affricate [ts] [46]. Post-operatively, we found that /s/ was rated as “normal” in 3 of the 4 cases. In the study by Lee et al. [26], however, visual distortion type errors were not counted as incorrect, hence unequivocal comparisons are not possible. In a separate study on Swedish-speaking participants with CLP, Hagberg et al. [33] reported a statistically significant improvement in the production of /s/ following MO. The outcome measure used in the study was “percentage of consonants correct” and no transcription data was reported. One assumes that errors related to visual distortions would, therefore, not have been identified, as transcriptions were based on audio recordings alone. Vallino’s [17] early perceptual study found that pre-operative errors with /s/ were never auditory type errors alone and were of the “frontal distortion type II” errors, supporting our study findings.
Spectral moments for /s/ were also found to be stable and permanent by 3 months post-operatively, as there were no statistically significant changes between the two post-operative time points. In addition, effect sizes were larger between the pre- and post-operative time points compared with those between the 3-month and the 12-month post-operative time points. Similar findings were reported by Wakumoto et al. [35] based on a single case. No significant changes were found between the two post-operative time points of 3 and 6 months, indicating that the changes seen early on at 3 months were maintained. This is in contrast with the study by Lee et al. [26] who reported a regression of spectral values to pre-operative levels. This was attributed in the first instance to negative articulatory re-organization, a return to pre-operative levels or status, although the authors acknowledge that this could have also been due to skeletal relapse. The perceptual speech study by Vallino [17] noted that the greatest change in speech articulation was seen at 3 months, although in her study, speech continued to improve over time up to 12 months post-operatively. The evidence, therefore, suggests that 3 months post-operatively is an optimal and sufficient time point for the assessment of speech articulation following MO.
An additional interesting finding for /s/ was that whilst spectral values improved post-operatively, they never reached normal levels. This contrasts with the findings of Yamamoto et al. [36] who reported normalization of /s/ post-operatively. This difference in findings can be attributed to a range of factors, such as the different types of acoustic parameters, a shorter post-operative follow-up (6 months) and a focus on the non-cleft population undergoing MO. There is no other osteotomy study at present that has reported normalization of /s/ post-operatively in the cleft population from acoustic analyses, in agreement with our study findings. This finding, however, contrasts with the evidence from perceptual ratings. Speech ratings for /s/ based on video recordings found that /s/ did normalize post-operatively, conflicting with findings reported by Vallino [17] and Hagberg et al. [33]. Vallino [17] found that by 12 months, there were still remaining errors with /s/ and /z/ of the frontal distortion type, at a frequency of 20% and 23.5%, respectively, whilst Hagberg et al. [33] reported a post-operative mean correct articulation of 85%, an improvement from a mean of 34%. Although both our study and Vallino’s [17] study used a combination of audio and video recordings, analyses and coding differed. In our study, raters only noted dentalization/interdentalization type errors with /s/. The level of analyses or coding may have been insufficient for capturing the plausible range of visual distortion type errors possible with /s/, or milder distortions. Additionally, we did not provide raters with operational definitions. Vallino [17], in contrast, provided detailed operational definitions for the categories of frontal distortion types I and II, and dentalization. As no transcription data were provided by Hagberg et al. [33], no direct comparisons can be made. 
Another crucial point to raise here is that audio recordings do not allow for the observation of visual distortions, and perhaps also of mild distortions with sibilants, as evidenced in our study. The visual component is indeed “critical for describing errors related to malocclusions, as well as for assessing the effect of treatment” [17].
For fricative /f/, study findings were mixed. Although there was no main effect of time, the spectral moments of CG, SD and SK were normalized immediately after MO, as indicated by the significant or almost significant differences between the two groups pre-operatively but not post-operatively. With the re-alignment of the lower lip and upper teeth, production of /f/ is made with a longer vocal tract as reflected by the higher mean value for CG, as well as energy, which is more dispersed at the point of constriction, reflected in the increased SD values. Interestingly, however, mean values for KU remained significantly different from those of the normal control group, particularly at 12 months. Jongman et al. [42] described /f/ as having a “relatively flat spectrum with no clearly dominating peak in any particular frequency region”, which would be reflected by negative KU values, as seen with the normal control group. In contrast, the mean KU values for the cleft group decreased post-operatively but remained positive, suggesting that there was an improvement, but not sufficient to reach normal values. Another point of consideration is that classification has been found to be poor for non-sibilant fricatives in contrast with sibilant fricatives [41]. As this is the first osteotomy study that has looked at the spectral moments of /f/, no comparisons with other available evidence can be made. The evidence for /f/ from perceptual studies is mixed. Whilst some studies reported that errors with /f/ were eliminated following MO [17, 20, 30, 31], Ip [25], who focused on Cantonese speakers, reported that /f/ was the most resistant phoneme to surgical intervention, suggesting possible cross-linguistic differences in terms of the impact of MO on speech.
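The link between a flat spectrum and negative KU values can be illustrated with a minimal sketch, which treats a normalized power spectrum as a distribution over frequency and computes the four spectral moments from it. This is an illustration only and does not reproduce the analysis software or settings used in this study; the function name `spectral_moments`, the 0–11,000 Hz frequency grid and both synthetic spectra are hypothetical, and KU is assumed here to be excess kurtosis (fourth standardized moment minus 3).

```python
import numpy as np

def spectral_moments(freqs_hz, power):
    """Return (CG, SD, SK, KU) for a power spectrum.

    The spectrum is normalized to sum to 1 and treated as a probability
    distribution over frequency. KU is excess kurtosis (assumption).
    """
    f = np.asarray(freqs_hz, dtype=float)
    p = np.asarray(power, dtype=float)
    p = p / p.sum()                       # normalize to unit mass
    cg = np.sum(f * p)                    # centre of gravity (mean frequency)
    dev = f - cg
    var = np.sum(dev ** 2 * p)
    sd = np.sqrt(var)                     # spread of energy around CG
    sk = np.sum(dev ** 3 * p) / sd ** 3   # asymmetry of the spectrum
    ku = np.sum(dev ** 4 * p) / var ** 2 - 3.0  # peakedness relative to a Gaussian
    return cg, sd, sk, ku

# Hypothetical frequency grid and synthetic spectra (not study data).
freqs = np.linspace(0.0, 11000.0, 1101)
flat = np.ones_like(freqs)  # no dominating peak
peaked = np.exp(-0.5 * ((freqs - 5500.0) / 300.0) ** 2) + 0.001  # narrow peak on a weak floor

cg_flat, sd_flat, sk_flat, ku_flat = spectral_moments(freqs, flat)
cg_peak, sd_peak, sk_peak, ku_peak = spectral_moments(freqs, peaked)

print(f"flat:   CG={cg_flat:.0f} Hz, SK={sk_flat:.2f}, KU={ku_flat:.2f}")
print(f"peaked: CG={cg_peak:.0f} Hz, SK={sk_peak:.2f}, KU={ku_peak:.2f}")
```

In this sketch the perfectly flat spectrum yields a KU close to -1.2, the excess kurtosis of a uniform distribution, whereas the spectrum with a narrow dominant peak yields a positive KU. Real fricative spectra fall between these idealized extremes.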
The overall findings of this study, therefore, suggest a positive impact of MO on sibilant and non-sibilant fricatives. For both fricatives /f/ and /s/, complete normalization of spectral moments is not achieved. The improvement in fricative production, as measured by acoustic analyses, appears to be an automatic consequence of the physical changes brought about by MO, reflecting minimal or no true compensation or articulatory re-organization. The new place of articulation is adopted immediately after surgery and maintained permanently. For /f/, normalization is perhaps more easily achievable as only the lower lip and upper teeth are involved in production. These articulators are realigned as a direct consequence of MO, making correct and normalized production of /f/ more plausible. In contrast, production of /s/ requires precise and well-coordinated movements of the tongue and placement of the articulators within the oral cavity [47]. The findings also indicate the clinical and research utility of including the video component in speech ratings for this clinical group and surgical intervention.
Study Limitations
The current study focused on the impairment level, addressing the domains of Body Functions and Structures of the International Classification of Functioning, Disability and Health (ICF) [48, 49]. It is globally accepted that cleft speech studies should also address communicative participation and the impact of environmental factors [50-52]. Patient views of their speech and its impact on their lives can be captured by the CLEFT-Q, a validated patient-reported outcome measure [53, 54], which is aimed not only at children but also at adolescents and adults with a cleft condition up to the age of 29 years and is, therefore, ideal for the osteotomy cohort. There is a speech scale that addresses speech quality/mechanics of speaking, e.g., “I need to speak slowly to be understood,” and a speech distress scale that measures feelings associated with speech difficulties (e.g., frustration) [55]. The tool is included in the International Consortium for Health Outcomes Measurement (ICHOM) speech standard set for CLP [56].
Research and Clinical Implications
Sound spectrography analysis enables clinicians and researchers to identify and capture speech articulation changes brought about by MO that cannot be perceived by the human ear alone. In addition, it provides objective clinical (and research) data as to the nature of these changes. The findings of this study, therefore, have potential clinical and research implications. Firstly, an early post-operative speech follow-up time point (3 months) is indicated clinically, in contrast to the typical clinical routine of a long 12-month follow-up time point. This has knock-on implications for the speech therapist’s caseload management. Secondly, the evidence also contributes positively to the process of informed consent to surgical intervention, in that more accurate information as to the impact of the surgery on speech articulation (of fricatives) can be provided to those about to undergo MO. Thirdly, the evidence does not support speech treatment for the articulation of fricatives pre-operatively. Post-operatively, speech treatment may be indicated for /s/, particularly for those in professions that require a high standard of accuracy in speech articulation, such as singers, news anchors and actors. Sound spectrography analysis also adds to the speech osteotomy evidence base that speech articulation changes are an automatic and a direct consequence of the physical changes brought about by surgery, effecting articulatory re-organization in the individual.
Conclusions
The present study provides evidence supporting the positive impact of MO on speech articulation. Positive speech outcomes were seen for both fricatives, with acoustic data for /f/ reaching normal values post-operatively and those for /s/ showing a trend towards normalization. The changes seen with /f/ and /s/ were also acquired early on at the 3-month post-operative time point and remained stable a year after surgery, thereby having direct implications for clinical practice. The use of both the audio and video components in speech recordings and ratings is clearly indicated as necessary with this client group.
Acknowledgements
The authors would like to acknowledge Mr. Choco Ho for his participation in the acoustic analyses of the data.
Statement of Ethics
The study was conducted ethically in accordance with the World Medical Association Declaration of Helsinki. All participants gave their written informed consent and the study was approved by the institutes’ committees on human research (06NSO8, 2019.495).
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
Funding was provided by Sport Aiding Medical Research for Kids (SPARKS) UK, which was awarded to Dr Debbie Sell. The funding body had no part in the preparation of the data or the manuscript.
Author Contributions
Joy M.K. Tsang contributed to the design of the work and analysis and interpretation of the data, drafting and revising the manuscript and approving the final version of the manuscript, and is accountable for aspects of the work in the event that questions are raised.
Wilson S. Yu contributed to the design of the work and analysis and interpretation of the data, drafting and revising the manuscript and approving the final version of the manuscript, and is accountable for aspects of the work in the event that questions are raised.
Jyrki Tuomainen contributed to the conception, design of the work, analysis and interpretation of the data, drafting and revising the manuscript and approving the final version of the manuscript and will participate in accounting for aspects of the work in the event that questions are raised.
Debbie Sell contributed to the conception, design of the work, analysis and interpretation of the data, drafting and revising the manuscript and approving the final version of the manuscript and will participate in accounting for aspects of the work in the event that questions are raised.
Kathy Y.S. Lee contributed to the interpretation of the data, revising the manuscript and approving the final version of the manuscript and will participate in accounting for aspects of the work in the event that questions are raised.
Michael C.F. Tong contributed to the interpretation of the data, revising the manuscript and approving the final version of the manuscript and will participate in accounting for aspects of the work in the event that questions are raised.
Valerie J. Pereira contributed to the conception, design of the work, acquisition of data, analysis and interpretation of the data, drafting and revising the manuscript and approving the final version of the manuscript and is accountable for aspects of the work in the event that questions are raised.
Data Availability Statement
All relevant data generated or analysed during this study are included in this article. Further enquiries can be directed to the corresponding author.