Abstract
Objective: This study investigated whether adding an additional modality, namely ultrasound tongue imaging (UTI), to perception-based phonetic transcription affected the identification of compensatory articulations and interrater reliability. Patients and Methods: Thirty-nine English-speaking children aged 3–12 years with cleft lip and palate (CLP) were recorded producing repetitions of /aCa/ for all places of articulation with simultaneous audio recording and probe-stabilized ultrasound (US). Three types of transcriptions were performed: (1) descriptive observations from the live US by the clinician recording the data, (2) US-aided transcription (UA) by two US-trained clinicians, and (3) traditional phonetic transcription by two CLP specialists from the audio recording. We compared the number of consonants identified as in error by each transcriber and then classified errors into eight different subcategories. Results: Both UA and traditional transcription yielded similar error detection rates; however, these were significantly higher than the observations recorded live in the clinic. Interrater reliability for the US transcribers was substantial (κ = 0.65) compared to moderate (κ = 0.47) for the traditional transcribers. US transcribers were more likely than the audio-only transcribers to identify covert errors such as double articulations and retroflexion. Conclusion: UTI is a useful complement to traditional phonetic transcription for CLP speech.
Introduction
Transcribing speech is a key step in the decision-making process for children with speech sound disorders (SSDs) [1]. A full and accurate phonetic transcription forms the foundation for differential diagnosis, treatment choices, measuring outcomes, and ultimately treatment success. However, phonetic transcription is known to suffer from issues with reliability, especially when the SSD is particularly severe and/or when the errors produced by the child involve phonetic distortions which result in speech sounds outside the phonological system of the target language [2]. For example, English-speaking transcribers find it particularly difficult to identify pharyngeal articulations [3]. Both severe SSD and these types of non-native productions can occur due to cleft lip and palate (CLP), which can make the speech of this group of children particularly challenging to transcribe [4, 5].
Cleft lip and/or palate is the most common congenital craniofacial abnormality, occurring in 1 in every 700 births [6], and problems with producing clear, intelligible speech can occur in CLP, even after successful surgery. Speakers with CLP often have difficulty achieving adequate velopharyngeal closure, leading to hypernasality and nasal air emission. This in turn makes it difficult to achieve adequate oral air pressure for high-pressure consonants, prompting active compensatory articulations [7]. These compensatory articulations are characterized by retraction of anterior articulations, often to sounds not occurring in the target language (at least in English) [8]. Moreover, a tendency to overuse the tongue dorsum as a strategy to improve velopharyngeal closure [8] may lead to subtle phonetic distortions which might be difficult to transcribe.
In addition to speaker-related factors, such as the anatomical differences caused by CLP, transcription is also influenced by listener-related factors. Lack of familiarity with the target language or with the specific subtype of SSD might lead to less accurate transcriptions [9]. Likewise, categorical perception, which all listeners are subject to, can lead to transcription of phonemic category collapses which might not truly represent the articulatory reality. Gibbon and Crampin [10] describe a case of an adult with CLP who produced both velar and alveolar targets as [c]; however, instrumental analysis in fact revealed a subtle difference between /t/ and /k/ targets. Covert contrasts such as this are known to occur in the speech of both young typically developing children and older speakers with disordered speech [11] and are not readily identified with phonetic transcription.
Despite these difficulties, phonetic transcription by specialist listeners is still the gold standard approach in CLP [12] and the approach used widely by speech and language therapists (SLTs) across the world. In part this is because instrumental techniques have, in the past, been impractical for use with young children [13], but also because perceptual analysis is important in its own right. Howard and Heselwood [14] argue that perceptual and instrumental analysis are “two qualitatively different sides of the same phonetic coin” (p. 941). That is to say that although instrumental techniques provide objective information on the movement of the articulators, only perceptual techniques such as phonetic transcription can truly represent the mental processes in the mind of the listener. This is important because the ultimate goal of communication, and of course of speech therapy, is to be understood by the listener. Phonetic transcription also has the advantage that it is cheap, relatively quick, and requires only the skills of the SLT. The accuracy of these transcriptions can be improved in several ways: audio and/or video recording the speech samples for careful and repeated listening, use of listeners familiar with the client group, use of multiple transcribers [15], acoustic analysis, or instrumental articulatory techniques. A recent study by Klintö and Lohmander [16] showed that using video recordings of the face rather than audio recordings only improved intertranscriber reliability and resulted in identification of more errors in 3-year-olds with CLP. However, when age-appropriate phonological errors were removed from the analysis, the number of errors identified by audio only versus audio plus video were not significantly different, suggesting that using video improves transcription only marginally. 
Acoustic analysis can also be used to supplement phonetic transcription, and several studies have used various measures to identify covert contrasts [see for example 17 and 18]. A large body of literature, particularly in speakers with CLP, shows that instrumental articulatory techniques can be used to identify and quantify subtle phonetic errors in children’s speech which might not be identified using transcription alone [see 19 for a list of electropalatography (EPG) papers]. Although instrumental techniques do not have the advantages of audio-only phonetic transcription in terms of being easy to use and inexpensive, often the instrumental technique in question is an obvious choice for remediating the child’s SSD. For example, Cleland et al. [20] reported an interesting case of a 9-year-old child called “Rachel” who presented with a particularly persistent case of velar fronting. Analysis with ultrasound tongue imaging (UTI) showed that Rachel presented with undifferentiated lingual gestures [21] and retroflexed productions of most stops. A follow-up paper [22] showed that Rachel was able to use ultrasound (US) real-time visual biofeedback in therapy to achieve correct productions of velars. Thus, application of instrumental techniques to both assessment and intervention offers a dual benefit that transcription alone does not.
Instrumentation and Covert Errors in CLP
A number of speech errors which defy broad phonetic transcription have been reported in the literature, mainly in EPG studies. EPG uses a custom-made pseudo-palate embedded with sensors (normally 62) to measure the timing and location of tongue-hard palate contact. A number of errors revealed by EPG were reviewed by Hardcastle and Gibbon [23], including misdirected articulatory gestures and double articulations where they ought not to exist. Despite potentially being missed by traditional phonetic transcription, these errors are important diagnostically because they provide evidence for articulatory, rather than phonological, difficulties, perhaps suggesting that different therapy approaches might be required. Because of this and because of the potential for EPG to also be used as a biofeedback approach, the United Kingdom Royal College of Speech and Language Therapists recommends EPG as an objective assessment and therapy approach for CLP.
Gibbon [24] summarized the EPG literature (23 papers over 20 years) on abnormal tongue-palate contact patterns in speakers with CLP. She suggested categorizing errors into eight abnormal patterns: (1) increased contact, (2) retraction to palatal or velar articulation, (3) fronted placement, (4) complete closure (loss of grooving in sibilant productions), (5) open pattern (no tongue-hard palate contact), (6) double articulations, (7) increased (phonetic level) variability, and (8) abnormal timing (e.g., articulatory groping). There is no discussion in Gibbon’s paper as to which of these errors might be vulnerable to being misidentified through audio-only transcription (AO). Presumably errors such as retraction (i.e., classic “backing” in CLP speech where alveolars are produced at the velar place of articulation) are easy to transcribe when they result in native speech sounds and category collapses, whereas errors such as double articulations and increased subphonemic variability will be harder to identify using transcription alone, although this has not been empirically tested to date.
It is clear that using instrumentation such as EPG might add value to the transcription of disordered speech, although no previous studies have directly compared audio-based transcription with articulatory-based transcription in a large number of children. In part this is because EPG is logistically difficult and expensive since each child requires a custom-made artificial palate and a period of stable dentition. To our knowledge, EPG is never used for routine assessment in CLP due to costs, and therefore studies of large numbers of children are lacking. In contrast, UTI is becoming increasingly popular in the phonetics laboratory, but is relatively new to clinical phonetics.
Ultrasound Tongue Imaging
UTI uses standard medical US to image the tongue in real time, making it also suitable for visual biofeedback therapy. Over thirty small studies have shown it to be effective for treating persistent SSDs [e.g., 22 and 25], and other studies have used it for fine articulatory analysis of lingual movements when synchronized to the acoustic signal [e.g., 20 or 26]. The US probe is placed under the chin, capturing most of the surface of the tongue in either the midsagittal or coronal plane. In both views, the imageable area is constrained by shadows from bone, with the tongue tip in particular being susceptible to a shadow from the mandible. US has been used in a small number of studies to describe disordered speech. Both McAllister Byun et al. [27] and Cleland et al. [20] used it to measure covert contrast in children with velar fronting. Cleland et al. additionally used it to describe undifferentiated lingual gestures and retroflex articulations (see above). They compared phonetic transcription of velar stops in children with idiopathic SSD with US analysis and showed that for the majority of children, US analysis added very little information to the audio transcription. The exception to this was “Rachel,” where only careful US analysis revealed undifferentiated lingual gestures and retroflexion. Rachel was one of the children in this study with the most severe and complex SSD, suggesting that instrumental techniques may be most beneficial for this subgroup of children. However, US analysis has traditionally been a laboratory-based process requiring specialist software and time from a specialist speech scientist. An aim of the current study was therefore to determine whether observations from UTI could be realistically incorporated into the clinical environment.
In terms of using US to supplement phonetic transcription in CLP, Bressmann et al. [28] showed covert articulatory movements during repetitions of /VkV/ in speakers with cleft palate, but no comparison with traditional transcription was given. Unlike Gibbon [24], Bressmann et al. made no attempt to classify the errors observed in US, though they did note pharyngeals (which would be described as “open pattern” by Gibbon), fronted placement, and double articulations. In the current study, the observations from UTI are descriptive in nature, performed offline by specialist researchers. While this is more time consuming than live phonetic transcription, it may be analogous to the blinded specialist listener paradigm suggested as the gold standard by Sell [12].
Purpose and Hypotheses
The purpose of the current study was to compare AO transcription of disordered speech to transcription accompanied by UTI. As discussed above, cleft palate speech makes an excellent test case for this experiment as it is known to be vulnerable to subtle phonetic errors and non-native-sounding productions; we therefore predicted that US-aided transcription (UA) would have an advantage in this client group as it allows visualization of non-native speech sounds such as pharyngeals (in English-speaking children) and can reveal covert articulations. We hypothesized that an advantage might be demonstrated by (1) an increase in the number of active compensatory errors identified when using UA compared to AO and (2) increased interrater reliability when using UA compared to AO. A secondary aim of this study was to develop a clinician-friendly UA recording format for describing speech errors using US. We aimed to determine whether it was possible to identify US-based errors in real time, live in the clinic (to emulate an SLT doing a live transcription) or whether offline careful viewing of the US was necessary (similar to offline analysis of video/audio recordings as is standard for CLP speech). We predicted that offline careful viewing of US (UA) would have an advantage over live in-clinic transcription (CT) and that an advantage might be demonstrated by (1) an increase in the number of compensatory errors identified when using UA compared to CT and (2) increased interrater reliability when using UA compared to CT.
A further exploratory aim of this study (to be quantified using US indices elsewhere) was to classify the errors according to Gibbon [24] and Cleland et al.’s [20] error types and to explore whether different error types were more prevalent in UA, i.e., whether covert errors such as double articulations would be noted with increased frequency compared to AO.
Subjects and Methods
Speakers
Children attending routine appointments over a 12-month period at the West of Scotland Cleft Lip and Palate Service were invited to participate. Inclusion criteria were syndromic or non-syndromic CLP, age 3–15 years, and spoken English. Both children with and without overt SSDs were included. Children with cleft lip only, severe learning disability, or no speech were excluded.
Thirty-nine children consented to taking part in the project. Of these, data from 35 children (15 female, 20 male) aged 3;07–12;02 years (mean age 6;09 years) were included for transcription. Three datasets were unusable due to very poor quality of US images; one dataset was collected after the files had been submitted for transcription (see Table 1 for biographical and medical information of the speakers).
Materials
Speech materials were adapted from the CLEFTNET protocol originally developed for EPG recordings [29]. This comprised (1) counting from one to ten, (2) ten repetitions of all voiceless (or voiced where necessary) obstruents and sonorants in /aCa/, (3) sentences from GOS.SP.ASS.’98 [30], and (4) five minimal (pair) sets containing contrasting common substitutions for /s, ∫, t∫, t/ in a variety of vowel environments (for example, “seat, sheet, cheat, team, keep”). Speech materials were presented orthographically on a laptop screen, and pre-recorded audio prompts were provided for imitation. Younger children sometimes also required support from the researcher to imitate the prompts, for example a further live prompt. All materials were collected by an SLT trained in US data acquisition (S.L.). Only the repetition data of single consonants were included for analysis in this paper.
US Recording Set-Up
High-speed US data were acquired using a Micro ultrasound system controlled via a laptop running Articulate Assistant Advanced™ software, version 2.17 [31], which recorded both audio and US. The echo return data were recorded at ∼100 fps over a field of view of either 144° or 162°. The field of view was selected with the probe positioned to allow the greatest view of the tongue, including both the hyoid and mandible shadows. The microconvex US probe was stabilized with a custom-made lightweight flexible plastic headset. For six children the probe was held in place by hand, either because the headset was too big or because the children preferred not to use it; this may have affected the quality of the images. Data were collected in a quiet room at the Glasgow Dental Hospital before or after a routine appointment.
Data were collected first in the midsagittal view for all materials. The probe was then turned 90° and the materials containing sibilants were collected again. In this coronal view it is possible to see lateral contact and/or bracing which is important for /s, ∫, t͡∫/. These data were subsequently excluded from the analysis due to poor image quality in 11 of the children. From a practical perspective it was very difficult to determine which coronal slice of the tongue (i.e., more anterior or posterior) was imaged in each child. Participants were asked to swallow a sip of water at the beginning and end of recording in each position to allow a palate trace to be drawn for future quantitative analyses.
An Audio-Technica 3350 microphone was attached to the headset (or child’s clothes where the headset was not used) to record audio information simultaneously with the US recording.
Transcriptions
Three types of transcriptions were performed: (1) descriptive observations live in the clinic by the SLT (S.L.) recording the data, online during the recording session (CT), (2) descriptive UA by two US-trained SLTs (J.C. and E.S., hereafter UA1 and UA2) after the sessions (UA), and (3) traditional blinded phonetic transcription by two CLP specialist SLTs (AO1 and AO2) using the audio recording only (AO).
Live Transcriptions/Observations (CT)
An assessment form to record phonetic transcription and initial subjective impressions from the US was developed to record observations from the live US by the clinician recording the data [see 32]. Error types were based on Gibbon [24] (note that we were unable to provide data on “complete closure” due to poor image quality in the coronal view and that the description of the error type is included in Table 2 for completeness) with the addition of retroflexion [20], which has not previously been identified using EPG but is highly salient on US. Since US shows tongue position and shape rather than tongue-palate contact as in EPG, we re-described the errors to better reflect what might be viewed using US. For example, “open pattern” occurs in EPG when there is no tongue-palate contact (a completely “white” EPG frame); this might be due to a post-velar production, e.g., a uvular production, or to undershoot of an articulatory gesture. Using US, it is possible to categorize both of these types of errors separately. The descriptions, along with example videos, are defined in an open-access manual [32] and summarized in Table 2. The video examples were used to train the SLT collecting the data (CT) and later the offline transcribers (UA1 and UA2) in US error types.
For the CT, S.L. recorded the data, marking observations from both the US and the child’s live productions on the assessment form at the same time. This process emulated a traditional live CT which, as identified previously, is known to be problematic for CLP speech. The CT transcriber transcribed errors using the International Phonetic Alphabet (IPA) [33] and the extensions to the IPA (extIPA) [34] and then classified each production as (1) correct, (2) falling into one or more of the above nine error types, or (3) a non-imageable error, for example errors of voicing or passive cleft errors such as nasal escape. It was possible for an error to fall into multiple categories; for example, alveolar targets might be retracted, variable, and with nasal escape.
US-Aided Transcription
UA1 and UA2 transcribed the children’s speech offline. Prior to watching/listening to the simultaneous audio and US recordings, both transcribers watched the video exemplars of the error types [32]. Both transcribers are SLTs with a speciality in SSD (though not specifically cleft palate speech); UA1 is an expert in US analysis of disordered speech and was responsible for training both UA2 and CT. Although it would have been desirable to train cleft palate specialists to be proficient users of US, this was not practical due to training time constraints.
Both UA transcribers worked independently, but in the same room, to transcribe the children’s speech. Multiple sessions of 1–2 h each were required to transcribe all of the data. The transcribers watched (and listened to) each US recording only once, in real time. The time to record observations after each viewing was not constrained. Errors were transcribed using IPA and extIPA symbols and then classified in the same way as the CT, i.e., as correct, as one or more of the nine error types identified in Table 2, or as a non-imageable error.
Traditional Phonetic Transcription (AO)
Two specialist SLTs in CLP transcribed the data using symbols of the IPA and extIPA. Both clinicians had over 10 years’ experience in the transcription of cleft palate speech and were not associated with the research project. The audio signal was extracted from the US plus audio recordings of the children’s speech, and the AO transcribers were provided with .wav files only; it was therefore not possible for them to view any US.
A transcription guide was given to the AO transcribers detailing the contents of the recordings and instructions to listen once to each file, then note the transcription, then move to the next recording. They were instructed to transcribe only the consonant for the VCV prompts and were permitted to note only one transcription if they thought all ten repetitions sounded the same. Following this, they were permitted to listen to each file again up to a maximum of three additional times and make any further transcriptions in an additional column. These were not included in the current study.
The AO transcriptions were then coded by the research team as either “correct,” i.e., the transcription matched the target, classified into one of the above nine error types (see Table 2 for expected phonetic transcriptions), or classified as a non-imageable error (i.e., a non-lingual error that cannot be viewed on US). This process was similar to a traditional phonetic/phonological pattern analysis.
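As a minimal sketch, this coding step can be expressed as a lookup from target-transcription pairs to categories. The specific pairings below are illustrative assumptions, not the study's actual expected-transcription table (Table 2), and the function name is our own:

```python
# Hypothetical sketch of coding an audio-only transcription as "correct",
# a US-visible error type, or "non-imageable". The mappings are examples
# only; the study's real expected transcriptions are given in its Table 2.
EXPECTED_ERRORS = {
    ("t", "k"): "retracted placement",  # classic "backing" in CLP speech
    ("k", "t"): "fronted placement",
    ("s", "θ"): "fronted placement",
}

def code_item(target, transcribed):
    if transcribed == target:
        return "correct"
    if (target, transcribed) in EXPECTED_ERRORS:
        return EXPECTED_ERRORS[(target, transcribed)]
    # e.g., voicing errors or nasal escape: not visible on US
    return "non-imageable"
```

A pattern analysis over all items then reduces to applying `code_item` to each target-transcription pair and tallying the resulting categories.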
Statistical Analyses
To determine whether use of US led to an increased error identification rate, we averaged the number of correct attempts across the transcribers within each group (CT, UA, AO) and compared the groups using Friedman’s two-way analysis of variance by ranks.
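As a minimal sketch, this comparison can be run with SciPy; the per-child counts below are hypothetical numbers for illustration, not the study's data:

```python
# Friedman's two-way ANOVA by ranks: each child is a "block", each
# transcriber group a "treatment"; ranks are computed within each child.
from scipy.stats import friedmanchisquare

# Hypothetical per-child counts of items identified as correct
ct = [12, 9, 15, 11, 8]   # live clinic transcription (CT)
ua = [10, 8, 13, 9, 7]    # ultrasound-aided transcription (UA, pair average)
ao = [11, 8, 14, 10, 7]   # audio-only transcription (AO, pair average)

stat, p = friedmanchisquare(ct, ua, ao)
print(f"chi2(2) = {stat:.2f}, p = {p:.3f}")
```

With these illustrative numbers the test gives χ2(2) ≈ 9.33; SciPy applies the standard correction when ranks are tied within a child.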
To determine whether UA led to improved interrater reliability, we calculated Cohen’s kappa with a 95% confidence interval between raters for three different subclasses of errors: (1) correct versus incorrect productions only, (2) imageable (one of the nine US-visible error types) versus non-imageable errors (such as changes in manner of articulation or nasal resonance), and (3) interrater reliability across all nine error types. For the purposes of interrater reliability, only the first error identified (Table 2) was coded as Cohen’s kappa does not allow multiple categorization of a single error.
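Cohen's kappa and its approximate large-sample confidence interval can be sketched in a few lines of Python. The standard-error formula and the verbal bands ("fair", "moderate", "substantial") follow the conventional Landis and Koch (1977) scale used in this paper; the function names are our own illustration:

```python
import math
from collections import Counter

def cohen_kappa_ci(rater1, rater2, z=1.96):
    """Cohen's kappa for two raters with an approximate 95% CI."""
    n = len(rater1)
    assert n == len(rater2) and n > 0
    # Observed agreement: proportion of items both raters coded identically
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal category proportions
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
    kappa = (p_o - p_e) / (1 - p_e)
    # Approximate large-sample standard error
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)

def landis_koch(kappa):
    """Verbal label for kappa, following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for cutoff, label in [(0.20, "slight"), (0.40, "fair"),
                          (0.60, "moderate"), (0.80, "substantial"),
                          (1.00, "almost perfect")]:
        if kappa <= cutoff:
            return label
    return "almost perfect"
```

For example, two raters coding four items as "c, c, e, e" and "c, e, e, e" give κ = 0.5, which `landis_koch` labels "moderate"; the study's κ = 0.65 for UA falls in the "substantial" band.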
Descriptive Analysis of Error Types
To determine whether the CT, UA, and AO transcribers were more likely to identify specific types of errors (for example whether errors which are traditionally thought of as covert were more likely to be noted in the US), visual inspection of the data was carried out due to the small number of raters and speakers.
Results
Error Identification
There were a total of 770 items produced by the children and young people recorded. The CT transcriber annotated only 493 items, whereas the UA and AO raters were able to transcribe between 629 and 712 items. The CT and UA transcribers were unable to detect the presence or absence of complete closure (a “domed” rather than “butterfly-wing” shape in the coronal view) during sibilants due to poor image quality, and these items were therefore excluded for all transcribers (see above). The CT transcriber identified 380/493 items as correct, the UA transcribers identified on average 415/630 items as correct, and the AO transcribers identified an average of 432/707 items as correct.
A Friedman test identified no significant difference between all groups of transcribers in terms of the number of items identified as being “correct” (χ2(2) = 2.217, p = 0.330). However, a Friedman test comparing the number of errors identified was significant (χ2(2) = 21.317, p < 0.001). Dunn-Bonferroni post hoc tests showed significant differences between AO and CT (p < 0.0001) and between UA and CT (p < 0.0001). In summary, our hypothesis that UA would lead to identification of more active compensatory articulations than AO was not supported; however, there was evidence that both UA and AO led to identification of more errors than live transcription (CT). This appears to be because the CT transcriber tended to mark items as correct or leave them out, probably due to time pressure and the need to manage both the child’s attention and the US recording. We would therefore not recommend that clinicians attempt live transcription/assessment using US.
Interrater Reliability
Table 3 summarizes interrater agreement, calculated using Cohen’s kappa with a 95% confidence interval. When classifying transcribed productions only as either correct or incorrect, agreement within transcription conditions was found to be “substantial” (κ = 0.65) for UA and “moderate” (κ = 0.47) for AO transcribers. When comparing across transcription conditions, agreement was lower, either “fair” or “moderate” (κ = 0.36–0.45).
Agreement for the CT carried out at the time of recording was found to be “fair” (κ = 0.35 and 0.37) when compared with UA and “moderate” (κ = 0.46 and 0.41) when compared with AO.
The same relationship was found between the transcription conditions when errors were subclassified into those that can (a US error) and cannot be imaged (a non-US error), and also when further subclassified into the remaining eight error types, but less agreement overall was noted (Table 3). The interrater agreement for UA remained “substantial” (κ = 0.65) whether measuring incorrect/correct only, imageable/non-imageable errors, or classifying into the eight error types, whereas AO was “moderate” for all comparisons. In summary, UTI appears to lead to substantial interrater reliability for detecting and classifying lingual errors in children with CLP and in general has “fair” agreement with traditional AO.
Descriptive Analysis of Error Types
Figure 1 shows error type classification by transcriber group. Classifications from UA and AO were averaged across both transcribers in each pair as reliability was substantial or moderate. No transcribers noted abnormal timing, probably because this is difficult to note in real time; quantitative articulatory analysis is likely required. The UA group noted substantially more instances of increased contact, double articulation, and retroflex productions, which were either not noted or, in the case of double articulation, noted in only one instance by the other transcribers. This may suggest a benefit of UTI in detecting these covert error types as predicted, but further work with more transcribers and more speakers is required. The AO group recorded higher numbers of retracted placement than both the UA and CT groups. Increased variability was noted by both the UA and AO groups, but at higher rates by the AO transcribers. The CT transcriber showed a general pattern of low error detection rates (see above), although she frequently noted non-imageable errors (such as nasalization or voicing errors).
Discussion
Through this study, we sought to determine whether adding real-time articulatory information in the form of UTI has an impact on phonetic transcription. Shriberg and Lof [35] argue for the inclusion of instrumental approaches in analysing disordered speech and as a method for circumventing some of the problems with phonetic transcription. Here we take a different approach, using US as an additional modality during transcription rather than presenting laboratory-based quantitative analysis in competition with perceptual approaches. We chose to investigate CLP speech as it is particularly vulnerable to problems with interrater reliability due to covert errors and non-native phonetic realizations, and because it has been previously studied with a similar instrumental technique, namely EPG. In line with Klintö and Lohmander [16], adding an extra modality, in our case US, had little impact on the percentage of consonants recorded as correct. Both the AO and UA transcribers noted similar rates of both correct and in-error productions, suggesting that both methods are valid for making decisions about the overall severity of SSD or potentially for measuring outcomes after intervention. While this was previously well established for traditional transcription [13], to our knowledge this is the first group study to explicitly test the effect of adding an articulatory technique which provides information about internal articulations.
In contrast, an increased number of errors were identified when we compared live CT with transcription performed offline after the recording session. From this we conclude that using US to assess children “on the fly” may be unwise. Difficulties with noting errors may have been due to the transcriber’s shifting attention between the child, the recording equipment, and the transcription, but further studies comparing different people transcribing live in the clinic are required.
In terms of interrater reliability, we found that within category of transcribers, the UA pair had an advantage over the AO transcribers. Reliability was “moderate” for AO and “substantial” for UA. Thus, our hypothesis that using UA may lead to increased interrater reliability compared to AO transcription was confirmed. While this effect remains to be tested across larger numbers of transcribers and larger numbers of speakers, it is promising preliminary evidence that instrumental analysis adds an objectivity to transcription [14] that can complement perceptual transcription. Across-category reliability between AO and UA was fair or moderate, even when looking at reliability across the different subcategories of error. This reduction in reliability speaks to differences in the types of error identified by transcribers using US and transcribers using AO.
The UA transcribers classified errors into all of the types described by Gibbon [24], with the exception of complete closure and abnormal timing. Identifying complete closure was problematic due to difficulties with the coronal US view (though it should be noted that both transcribers commented that they could hear lateralization at times). Further methodological work should seek to improve the positioning of the probe for coronal recordings. Although we did not identify abnormal timing, it should be noted that EPG studies which have reported this phenomenon have relied on careful quantitative measurements of dynamic EPG patterns rather than real-time viewings as performed here. Ongoing work seeks to use a range of quantitative US measures to determine whether abnormal timing can be identified in this group of children. Nevertheless, the UA transcribers did note increased rates of double articulations and “open-pattern” (undershoot or uvular/pharyngeal articulations) in the children’s speech. Moreover, the transcribers noted a small number of instances of retroflex productions. This error has not previously been reported using instrumental analysis of CLP speech, although Cleland et al. [20] did report retroflex articulations in a child with a severe idiopathic SSD. Thus, it appears that UA can lead to the identification of unusual error types, which may be more common in children with severe SSD or with structural differences such as CLP influencing their articulation.
Limitations
In line with Howard and Heselwood’s [14] view that instrumental techniques should complement perceptual analysis, it is worth noting some limitations of the current study regarding both the study design and the use of US more generally. Firstly, our classification of errors was based on Gibbon’s classification system and was therefore limited to errors involving tongue shape; however, CLP speech is also highly vulnerable to difficulties with the velopharyngeal mechanism. Thus, errors arising from velopharyngeal dysfunction may not have been identified in this study.
We made no attempt to determine whether the AO transcribers, who were experts in CLP, noted more instances of passive errors such as nasal escape, although all groups of transcribers did identify these types of errors. In this sense, US can only complement perceptual analysis, not replace it. This remains the case even if quantitative US measures are used to provide a more objective error analysis than the one provided here. Nevertheless, US did enable identification of different error types which are potentially diagnostically important. Moreover, US, when used as biofeedback, offers a potential approach for remediating these very errors [21].
The current study was constrained by the small number of transcribers. While it was necessary to have only one transcriber during the recording process (CT), further transcribers performing the UA and AO transcriptions would have been beneficial. Additionally, our UA and AO transcribers differed not only in the modality they were using, but also in their own previous experience: both AO transcribers were specialists in CLP, whereas both UA transcribers were specialists in other types of SSD. A potential solution would be to repeat the experiment with CLP specialists trained in the use of US, but time constraints prevented this in the current study. It would be equally useful to repeat the experiment with audio and US from speakers with different types of SSD.
Lastly, the study was constrained by the use of single viewings of real-time US only. While this gives the study potential ecological validity, it is probable that performing careful articulatory analysis, including quantitative US measures, would provide different results; this work is currently ongoing. Moreover, the speech materials used here were deliberately constrained to multiple repetitions of single consonants in /aCa/ contexts. It is our view that making judgements about tongue shapes in real time beyond single segment level is likely to be extremely difficult, although this remains to be empirically tested. We recommend playback of more complex speech materials in slow motion or frame by frame as well as quantitative analyses.
Conclusions
The addition of US tongue images to audio-perceptual information appears to improve the reliability of lingual error identification in cleft palate speech. Contrary to our hypothesis that the addition of US would lead to an increased number of errors being identified, we found no difference in the overall number of errors between AO and UA. However, transcribers who were given additional US information identified increased instances of double articulations, pharyngeal/uvular articulations, and retroflexion within those productions identified as incorrect. These errors could not, however, be detected reliably in real time in the clinic. Through this study, we provide preliminary evidence that US may be a useful addition to the CLP assessment toolkit when images are recorded for later playback. This might be especially useful when US assessment is used as a precursor to US biofeedback therapy, since the same technique can then be used to remediate the very errors it helps identify.
Acknowledgements
We wish to thank all the children and their carers who gave up their valuable time to take part in this research project. Thank you to the Glasgow Dental Hospital and School for providing clinic space to make the recordings. We would also like to thank Stephanie van Eeden and Caroline Hattee for their transcription of the data, and David Young for his support with the statistical analysis.
Statement of Ethics
Participants and their parents/carers gave their written informed consent. The study protocol was approved by the National Health Service West of Scotland Research Ethics Committee and the University of Strathclyde Research Ethics Committee.
Disclosure Statement
The authors have no conflicts of interest to declare.
Funding Sources
This work was funded by grants from Action Medical Research (GN2544) and the Engineering and Physical Sciences Research Council (EP/P02338X/1).
Author Contributions
J. Cleland and L. Crampin designed this study and received the funding for it. S. Lloyd and L. Campbell collected the data. S. Lloyd, J. Cleland, and E. Sugden analysed the data. All authors were involved in the early preparation of this work. J. Cleland and S. Lloyd wrote the final paper. All authors read and commented on a final draft of the paper.