Background/Aims: Percent consonant correct (PCC) was originally described by Shriberg and Kwiatkowski [J Speech Hear Disord. 1982 Aug;47(3):256–70] as a severity metric for phonological speech disorders, and has been adapted and used in many studies on speech sound disorders. It is well-recognized that cleft speech is complex, consisting of several interacting parameters assessed simultaneously, with error sounds not in the listener’s own language. In speech outcome studies, narrow phonetic transcription and the reporting of intra- and inter-rater reliability are acknowledged as the gold standard. However, cleft speech brings special challenges to this task, as complex speech disorders are known to be associated with low transcriber agreement. Recent studies informed the decision to use PCC as the primary outcome measure in a cleft speech intervention study, given its common usage and familiarity. The aim was to specifically evaluate the intra- and inter-rater reliability of PCC in an intervention study, in contrast to other types of speech outcome studies. Methods: Two trained and experienced listeners analyzed 119 recordings, randomly selected from five data points before, during, and following intervention. The PCC score was separately calculated for words and sentences/phrases. Results: Using intraclass correlations (ICCs), Phase 1 results showed poor reliability for targets elicited for words (ICC = 0.07) and sentences/phrases (ICC = 0.42). Differences in classification of errors as glottal stops and consonant deletion accounted for this. Following further training, a second reliability study was undertaken showing improvement in the number of targets elicited in words (ICC = 0.85) and sentences/phrases (ICC = 0.94). There was very good inter-rater reliability for the PCC score on the word dataset (ICC = 0.9) and the sentence dataset (ICC = 0.88). 
Very good intra-rater reliability (ICC = 1.0) was found for the PCC score in both words and sentences/phrases for each listener. One listener consistently gave higher modified PCC scores. Conclusions: In cleft speech intervention studies, reliability of the number of targets elicited should be reported. Listeners need to distinguish between glottal articulation and consonant deletion, in order that the PCC score is meaningful. Attention should be paid to cases where listeners are reliable but their scores differ in a consistent way. More research is needed on measuring the resolution of articulation difficulties in cleft intervention studies.

Children born with cleft palate ± lip are known to be at high risk of speech difficulties, following palate repair in infancy, with potential educational and psychosocial consequences [1-4]. The need to determine the optimum timing, staging, and technique of palate repair in terms of speech outcomes has been at the center of cleft research for decades [5]. As a consequence, much of the focus in the cleft literature on speech outcomes has been on reports of surgical outcomes, with much less attention to therapy intervention studies in cleft speech. A systematic review by Bessell et al. [6] recommended more randomized controlled trials and observational studies, together with the need to identify elements of speech therapy interventions that may be effective, including duration, age-range, setting, intensity, delivery, and theoretical perspective of the intervention method. Meinusch and Neumann [7] subsequently commented on the review and stated how it could be used as a starting point for developing well-designed and well-reported intervention studies.

It is well-recognized that the nature of cleft speech is complex and different from other speech disorders in a number of ways. Unlike developmental disorders, cleft speech disorders can be due to structural etiologies described as passive characteristics [8, 9] or due to active mislearning, sometimes associated with a history of VPI, which frequently co-occur. Cleft speech consists not only of consonant sound errors, known as cleft speech characteristics (CSCs) [8, 9], but also resonance (hypernasality, hyponasality) and nasal airflow errors (audible nasal emission, nasal turbulence), all key aspects of an assessment. CSCs describe the typical types of errors heard in cleft speech, and include sounds not in the English language. They largely reflect errors of place of articulation, frequently articulated further back in the oral cavity, for example, alveolar targets produced as palatals or velars or even further back in the larynx/pharynx/velopharynx as glottal stops, pharyngeal fricatives or active nasal fricatives, respectively. One group of CSCs describes typical passive errors, usually of manner. The different cleft speech parameters have to be assessed simultaneously, sometimes interacting with each other, further complicating the perceptual assessment. Narrow phonetic transcription is acknowledged as the gold standard for this purpose [10-12]. It is also widely accepted that listeners require additional postgraduate specialist training to analyze these often complex speech presentations. In speech outcome studies, intra- and inter-rater reliability must be reported [9-11, 13-19]. This is now accepted as a requirement of speech outcome studies in cleft palate, irrespective of the nature of the study. However, cleft speech brings special challenges to this task. Complex speech disorders are known to be associated with low transcriber agreement [20, 21]. Gooch et al. [13] drew attention to the fact that there are two tasks involved: identifying the error and recalling its transcription. Cleft studies have described training in phonetic transcription and concluded that with agreed conventions and rules and regular transcription tests and training [22, 23], it is possible to achieve good consistency in phonetic transcription and that ratings can be reliably performed [9, 11, 14, 22, 23]. Chapman et al. [14] listed the reasons which contribute to low intra- and inter-rater reliability in cleft speech studies as the framework of analysis, the nature of the speech sample, procedures used for data collection, recording quality, playback conditions, and listener characteristics such as level of experience and the training undertaken. Klintö et al. [24] also added the severity of the speech disorder, quantity of material re-transcribed, time between original transcription and re-transcription, and methods used for calculation and criteria of agreement as other contributing factors.

In this regard, different statistical approaches for calculating reliability have included point-to-point percentage agreement, correlations, kappas/weighted kappas, and single or average measure intraclass correlation (ICC) coefficients. There are advantages and disadvantages to these approaches. The advantage of point-to-point percentage of agreement is that this is an estimate of the reliability of the data at the level of each sound, word, or other unit of analysis [25]. When calculating percent agreement between listeners for percent consonant correct (PCC), it is usually easier to achieve high agreement since there are only two possible options for every consonant – correct or incorrect. However, the risk of chance agreement is high [26]. Reliability coefficients take chance agreements into account, while providing information about the ability of scores to distinguish between subjects, and reporting group outcomes. However, where there is a lack of variability in the range of severity in a dataset such that the whole scale is not used, these statistics can be misleading [11, 14, 19].
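The contrast between raw percentage agreement and a chance-corrected coefficient can be made concrete with a small sketch. The ratings below are hypothetical (they are not data from this study), and Cohen's kappa is used here only as one familiar example of a chance-corrected statistic; the function names are illustrative.

```python
def percent_agreement(a, b):
    """Point-to-point agreement: share of units both listeners scored identically."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    labels = set(a) | set(b)
    # Chance agreement: product of each listener's marginal rate, summed over labels.
    p_chance = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if p_chance == 1.0:
        return 1.0  # both listeners used one identical label throughout
    return (p_obs - p_chance) / (1 - p_chance)


# Hypothetical binary scores: 1 = consonant judged correct, 0 = judged incorrect.
listener_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
listener_b = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

print(percent_agreement(listener_a, listener_b))  # high raw agreement
print(cohen_kappa(listener_a, listener_b))        # below zero once chance is removed
```

With almost every consonant scored correct by both listeners, raw agreement is high although the two listeners never agree on which consonant is in error, which is the situation in which an uncorrected percentage can mislead.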

Notwithstanding, the PCC is widely used as the clinical and research metric in non-cleft speech outcome studies. It was originally defined as a measure of the proportion of correctly articulated consonants in the transcription of conversational speech representing the severity of involvement related to phonological disorders [25]. It is a measure of both phonetic/articulatory and phonological errors, in which all errors are given the same weight. Shriberg et al. [27] introduced the percentage consonant correct – adapted (PCC-A) and revised (PCC-R) that allow common, and common together with uncommon distortions (PCC-A and PCC-R respectively), the latter including nasal emission on consonants. Since then, further modified measures of PCC have been reported in several cleft speech studies [28-33]. For example, authors have reported PCC together with additional analyses including percent correct manner, and percent correct place [30] and percentage of compensatory articulation related to place and manner [29]. Klintö et al. [33] defined PCC-A in which age-appropriate simplifications of /s/ and passive CSCs (audible nasal air emission, nasal realizations, and weak articulation) are scored as correct. Subsequently, percent oral consonants correct (POCC), percent oral errors, and percent non-oral errors have been used [23, 34, 35]. The POCC is a measure in which audible nasal air leakage/emission, weak articulation, and nasalization of consonants are scored as incorrect together with phonological and articulatory errors. This would appear to be the original description of PCC.

Given the widespread and continued use of PCC in cleft palate speech studies, especially in the evaluation of speech outcomes related to surgery, and considering its limitations, the purpose of the present study was to evaluate the reliability of PCC as an outcome measure for cleft speech in an intervention study.

This study was undertaken as part of a randomized controlled trial which aimed to determine whether Parent Led, therapist-supervised Articulation Therapy (PLAT) intervention was comparable to traditional speech therapy intervention in children with cleft palate-related speech disorders [36, 37].

Participants

Speech and language therapists (SLTs), who had been trained on the Cleft Audit Protocol for Speech-Augmented (CAPS-A), were invited to serve as raters/listeners through an email invitation to the Cleft centers in the UK and Ireland. The CAPS-A is a validated outcome tool, widely used in the UK and Ireland for audit and research outcome studies [9], with an associated two-day training program [38]; https://www.caps-a.com. SLTs sent an expression of interest and two independent SLTs highly experienced in the analysis and rating of cleft palate speech in research projects were invited to participate. Both had participated in the Scandcleft [22, 23] and Timing of Primary Palatal Surgery trials (http://www.tops-trial.org.uk/ accessed November 12, 2018). These are two large, international randomized controlled trials in centers in the UK, Scandinavia, and Brazil investigating the timing, staging, and techniques of palate surgery, in which speech is one of the primary outcome measures. Reliability data of two listeners were accessed from the Timing of Primary Palatal Surgery trial with the listeners’ permission. In addition, each listener had regularly participated in consensus listening activities for mandatory national audit in their UK workplace [11, 39, 40].

The two listeners attended a one-day training course. The age range of the study cohort, the types of speech characteristics, and the speech sample were all discussed. Guidelines were given on the identification of correct versus incorrect targets, the analysis of single words and sentences/phrases, the identification of phonological processes, the analysis of resonance, audible nasal emission and nasal turbulence, and the completion of the independent listener analysis form (ILAF; online suppl. material; for all online suppl. material, see www.karger.com/doi/10.1159/000501095). Following this, four cases, not included in the study, were each analyzed and rated by the listeners and authors (D.S., T.S.), and consensus judgments were agreed.

Data Collection

Forty-four children (35 boys, 9 girls, with a mean age of 4.78 years; SD of 0.89 and age range of 2.9–7.6 years) had been included in the intervention study. High quality audio video recordings had been undertaken on 5 occasions over a 7-month period; at baseline (time point 1), immediately pre-intervention (time point 2), midway through intervention (time point 3), immediately post-intervention (time point 4), and 2 months post-intervention (time point 5). The duration of the intervention was 12 weeks, that is, between time points 2 and 4. Recordings were made either using the Zoom Q3 or Q4 HD cameras mounted on a desktop stand [38].

The recorded speech sample consisted of pictures of 30 single words from the articulation section of the Diagnostic Evaluation of Articulation and Phonology test [41], and pictures of 13 sentences/phrases from the CAPS-A recommended speech sample [8, 38]. In addition, a sound stimulability task and a spontaneous speech sample based on the Renfrew Action Picture Test [42] were collected but not recorded.

Data Preparation

Video data were collected at time points 1, 2, 4, and 5 for all 44 cases. At time point 3 video data were missing on six cases due to examiner error. Hence, 214 video recordings were anonymized and randomized using the Excel Rand function. Live assessments were initially carried out by the research SLTs and co-authors during all assessment time points. PCC was calculated for these assessments to ascertain the severity of speech disorder and to plan ongoing therapy. In preparation for randomization, all cases were ordered according to their live PCC ratings. Systematic sampling was undertaken, ensuring the range of severity of speech was represented. Seventy videos were randomly selected; however, during analyses it was noted that one recording had been lost and hence 69 videos were used. All video recordings were saved to an encrypted external hard drive. In total, 119 video edits were included in the two-phase reliability study.

Perceptual Assessments

The two listeners were each given a pair of Sennheiser 90 high-quality headphones and an encrypted hard drive containing 69 anonymized video recordings with time point and trial arm unknown to the listeners. The recordings covered all the CSCs [8, 9] and different time points, with the range of severity of PCC in words and sentences/phrases represented.

The word sample consisted of 40 consonant targets, and included a single target for each English sound in word initial and word final position. The sentence/phrase sample consisted of 13 high-pressure consonant targets (plosives, fricatives, and affricates) in word initial and word final position. The listeners transcribed the targets using narrow phonetic transcription.

Data Analysis

A modified PCC was adopted as the primary outcome measure [27]. Consonants produced with correct place, manner, and voice but with accompanying nasal emission/turbulence, weak/nasalized realizations, and dental/interdental quality were categorized as correct. PCC was calculated as the number of targets correct divided by the number of targets elicited, multiplied by 100. Developmental errors were included in the calculation. PCC scores were separately calculated for the word and sentence/phrase samples, and intraclass correlations were used to evaluate the inter- and intra-rater reliability of the two listeners for each dataset in both Phases 1 and 2. The strength of agreement for the ICCs was based on Altman [43]: 0.00–0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good.
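As a minimal sketch of these two steps, the code below takes PCC as targets correct expressed as a percentage of targets elicited and labels an ICC with the Altman bands quoted above. The function names are illustrative, and treating a negative ICC as "poor" is an assumption, since the quoted bands start at 0.00.

```python
def pcc(targets_correct, targets_elicited):
    """Percent consonants correct: correct targets as a percentage of elicited targets."""
    if targets_elicited <= 0:
        raise ValueError("at least one target must be elicited")
    if not 0 <= targets_correct <= targets_elicited:
        raise ValueError("targets_correct must lie between 0 and targets_elicited")
    return 100.0 * targets_correct / targets_elicited


def altman_strength(icc):
    """Label an ICC using the Altman bands quoted in the text.

    Assumption: values below 0.00 are also labelled "poor".
    """
    bands = [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "good"), (1.00, "very good")]
    for upper, label in bands:
        if icc <= upper:
            return label
    raise ValueError("an ICC cannot exceed 1.0")


print(pcc(30, 40))            # 30 of 40 elicited targets correct
print(altman_strength(0.07))  # lowest ICC reported in Phase 1
print(altman_strength(0.94))  # highest inter-rater ICC reported in Phase 2
```

Because the number of targets elicited is the denominator, any disagreement between listeners about that count changes the score even when they agree on how many targets were correct.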

Phase 1 Method

Analysis was carried out over a 3-week period. For each video edit, each listener analyzed the speech sample, completed the ILAF and inputted the data into an Excel spreadsheet. They entered the recording edit ID, the number of targets elicited, the number of targets correct, and coded the speech errors as CSCs as per the CAPS-A protocol. In classifying speech errors, the listener recorded the number of consonants affected and listed the consonants affected for each of the following CSCs: lateral, palatal, double articulation, backed to velar/uvular, pharyngeal, glottal, active nasal fricative, nasal realizations, and gliding. There was also an option to mark “other” for unusual cleft speech errors not listed above. Phonological processes were rated as present or absent, independent of the age of the child. Listeners also rated hypernasality, audible nasal emission, and nasal turbulence using the CAPS-A scales shown in the online supplementary material. The PCC scores for the word and sentence/phrase samples were automatically calculated using Excel for each speech sample.

Phase 1 Results

Table 1 shows unacceptably poor inter-rater reliability on the number of targets elicited at both word (ICC = 0.07; 95% CI –0.17 to 0.3) and sentence/phrase level (ICC = 0.42; 95% CI 0.2–0.59) in Phase 1. In contrast, there was very good inter-rater reliability for the number of targets correct at both word (ICC = 0.84; 95% CI 0.75–0.89) and sentence/phrase level (ICC = 0.82; 95% CI 0.72–0.88), and good inter-rater reliability for PCC (ICC word = 0.80; 95% CI 0.70–0.87 and sentence/phrase = 0.79; 95% CI 0.69–0.87).

Table 1.

Results of phases 1 and 2: ICCs for the number of targets elicited, the number of targets correct and PCC for the word and phrases/sentences datasets


Table 2 shows the variability in the number of targets elicited and the effect on the derived PCC for the word data analysis of each listener for four children. In video edits 1 and 2, although the number of targets correct is very similar for both listeners, the number of targets elicited differs by at least 10 points with resultant large differences in the derived PCC score. This contrasts with video edit 4 in which the number of targets elicited is the same for both listeners (n = 40); the number of targets correct differs between the listeners and is reflected in the differing modified PCC scores. In video edit 8, listeners differ not only in the number of targets elicited but also in the number of targets judged correct by over 10 points with a large difference of 17% between the 2 PCC scores.

Table 2.

Word data in phase 1 illustrating problems associated with listener differences in number of targets elicited


Phase 2 Method

The raw data on the ILAF (online suppl. material), phonetic transcriptions, and Excel spread sheets were examined by the authors to identify the cause of the discrepancies in the number of targets elicited between the two listeners. One striking issue was in the transcription of glottal stops and consonant deletion. A pattern emerged of one listener transcribing a target as a glottal stop and the other transcribing it as a consonant deletion. The glottal stop was counted as a “target elicited” in contrast to a consonant deletion, accounting for the differences in number of targets elicited between the listeners.
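The effect of this coding difference on the score can be sketched with hypothetical codes for a 40-target word sample; none of the numbers below come from the study data. Both listeners agree on 20 correct targets, but five targets are coded as glottal stops by one listener and as consonant deletions by the other. The flag `count_deletions_as_elicited` is an illustrative name: `False` mimics the Phase 1 counting described above, and `True` mimics the convention agreed in the later training, where an attempted word counts as elicited.

```python
def pcc_summary(codes, count_deletions_as_elicited):
    """Return (targets elicited, targets correct, PCC) for one listener's codes."""
    elicited = [c for c in codes
                if count_deletions_as_elicited or c != "deleted"]
    if not elicited:
        raise ValueError("no targets elicited")
    correct = sum(c == "correct" for c in elicited)
    return len(elicited), correct, 100.0 * correct / len(elicited)


# Hypothetical codes for the same recording from two listeners.
listener_1 = ["correct"] * 20 + ["error"] * 15 + ["glottal"] * 5
listener_2 = ["correct"] * 20 + ["error"] * 15 + ["deleted"] * 5

for flag in (False, True):
    print(flag, pcc_summary(listener_1, flag), pcc_summary(listener_2, flag))
```

Under the first rule the two listeners report different denominators and therefore different PCC scores despite identical counts of correct targets; under the second rule the scores coincide.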

Although not affecting the number of targets elicited, other sources of disagreement between listeners were noted which could have significant implications for the results. These involved transcription differences, where for example, active nasal fricatives were sometimes transcribed as nasal realizations or as correct targets with accompanying nasal emission/turbulence. Interestingly, Willadsen et al. [23] reported a similar issue and in their study they accepted these as agreements. In a cleft intervention study, where targets for therapy are often active nasal fricatives, it was critical that the listeners were accurate in the perception and transcription of these errors.

Given these findings, a further training course was undertaken. A half-day session was completed over Skype, reviewing the disagreements outlined above. It was agreed that if the child attempted to say the word, the target should be included as elicited even if the transcription was a consonant deletion. Further listening practice focused on contrasting glottal stops and consonant deletion errors. Other issues regarding the perception and transcription of active nasal fricatives (i.e., active mislearning) versus consonants with accompanying nasal emission/turbulence, and weak versus unreleased targets, nasal realizations and nasalization of plosives (i.e., passive structurally related errors) were also discussed and agreed. Following the second training session, a different 70 video edits, 20 of which were duplicated for the intra-rater study, were analyzed. The 20 duplicated edits were randomly selected and inserted in the series of video edits, ensuring that at least five videos intervened before the duplicate was presented. As discussed previously, the PCC scores derived from the face-to-face assessment were used to ensure that the full range of CSCs and speech severity was included in Phase 2. Phase 2 analysis took place over the course of 4 weeks.

Phase 2 Results

Inter-rater reliability for the number of targets elicited at both word (ICC = 0.85; 95% CI 0.74–0.91) and sentence/phrase level (ICC = 0.94; 95% CI 0.9–0.97) and the number of targets correct at both word (ICC = 0.91; 95% CI 0.85–0.95) and sentence/phrase level (ICC = 0.89; 95% CI 0.81–0.94) were very good. Inter-rater reliability for the modified PCC scores was very good on the word dataset (ICC = 0.9; 95% CI 0.84–0.94) and the sentence/phrase dataset (ICC = 0.88; 95% CI 0.8–0.93). ICCs indicated very good intra-rater reliability (ICC = 1.0; 95% CI 0.98–1.00) for modified PCC scores for targets in both words and sentences/phrases for each listener.

We were also interested in any trends in similarities and differences in the analysis/rating task between the two listeners. Figures 1 and 2 show that Listener A consistently scored the modified PCC in words and sentences/phrases higher than Listener B.
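A systematic difference of this kind can coexist with a high ICC. The sketch below computes two single-measure, two-way ICC forms (absolute agreement and consistency) from the usual ANOVA mean squares. It is an illustration only: the text does not state which ICC form was used in this study, and the scores are hypothetical, with one listener scoring every recording exactly 10 points higher.

```python
def icc_two_way(scores):
    """Single-measure two-way ICCs from a recordings x listeners table.

    scores: list of rows, one per recording, each holding one score per listener.
    Returns (absolute-agreement ICC, consistency ICC).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return agreement, consistency


# Hypothetical PCC scores: [higher-scoring listener, lower-scoring listener].
scores = [[30, 20], [45, 35], [60, 50], [75, 65], [90, 80]]
agreement, consistency = icc_two_way(scores)
print(round(agreement, 3), round(consistency, 3))
```

The consistency form ignores the constant offset entirely, while the absolute-agreement form is lowered by it, so a coefficient alone may not reveal that one listener scores systematically higher.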

Fig. 1.

Inter-rater reliability for words.

Fig. 2.

Inter-rater reliability for sentences.


The aim of this study was to evaluate the intra- and inter-rater reliability of the modified PCC scores in an intervention study of children with a cleft palate-related speech disorder. The reliability study was conducted using a sample of 119 video recordings, collected longitudinally at baseline, immediately pre-intervention, midway through the intervention, immediately post-intervention, and two months post-intervention.

The study formed part of an intervention study, in which there were 214 video recordings requiring analysis. It is known that phonetic analysis and rating are time-consuming processes [44, 45]. The task involved the transcription of 40 targets in a corpus of 26 words, and 26 targets based on 13 sentences/phrases. The listener was then asked to sum the total targets elicited and the total targets correct for each dataset, and the PCC was automatically calculated. This was followed by a task of categorization of errors into their CSC category, with a calculation of the number of targets affected. Finally, the listener noted the presence of any phonological processes, and rated resonance and nasal airflow. On average, this analysis takes between 20 and 40 min per recording, depending on the complexity of the speech. The original plan had been for each listener to analyze 107 videos, to ensure an equal distribution of the workload and reduce the possible introduction of errors from listener fatigue. Inter-rater reliability was to have been based on 70 video edits. The same 70 edits (i.e., 30% of the whole recording archive) would be re-randomized and analyzed a second time to evaluate intra-rater reliability.

However, initial results indicated unacceptably poor agreement on the number of targets elicited both in words and sentences/phrases and so further training and reliability testing was necessary. It was difficult to ascertain from the data alone if this was a problem of perception or classification. In Phase 2, 70 video edits, including 20 duplicated edits, were analyzed by the two listeners. Results of this phase showed very good inter-rater agreement on the number of targets elicited, number of targets correct at word and sentence/phrase level, and modified PCC scores. Intra-rater reliability was very good for both listeners.

Although reliability was very good, one listener consistently gave higher PCC scores than the other. Whilst this may not be as important in some types of cleft studies and may not be an issue for speech disorders with other etiologies, this difference is critical in cleft intervention outcome studies, as results could be affected by the listener and not speech performance. For example, if the more generous listener analyzed a pre-intervention edit and the more critical listener analyzed the post-intervention edit, any positive change in PCC scores may not be reflected in the data, when a real change had taken place. Conversely, if the more critical listener analyzed a pre-intervention edit and the more generous listener analyzed the post-intervention edit, the positive change in PCC scores is at risk of being over-estimated. This had major implications for the original plan of dividing the video archive equally between the two listeners for analysis. Thus, one listener only (listener A) analyzed the remaining 164 video edits.
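The two scenarios can be worked through with simple arithmetic. In the sketch below every number is hypothetical: a child's PCC is assumed to rise from 50 to 60, the more generous listener is assumed to score 5 points above that value and the more critical listener 5 points below it.

```python
def observed_change(true_pre, true_post, pre_bias, post_bias):
    """Change in PCC as recorded when each time point carries a listener offset."""
    return (true_post + post_bias) - (true_pre + pre_bias)


TRUE_PRE, TRUE_POST = 50.0, 60.0   # hypothetical PCC before and after therapy
GENEROUS, CRITICAL = +5.0, -5.0    # hypothetical listener offsets

# Generous listener scores pre-intervention, critical listener scores post-intervention.
masked = observed_change(TRUE_PRE, TRUE_POST, GENEROUS, CRITICAL)
# Critical listener scores pre-intervention, generous listener scores post-intervention.
inflated = observed_change(TRUE_PRE, TRUE_POST, CRITICAL, GENEROUS)
# The same listener scores both time points, so the offset cancels.
same_listener = observed_change(TRUE_PRE, TRUE_POST, GENEROUS, GENEROUS)

print(masked, inflated, same_listener)
```

When one listener scores both time points the offset cancels and the recorded change equals the assumed real change, which is consistent with the decision to have a single listener analyze the remaining edits.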

This study showed that despite poor inter-rater reliability for targets elicited (ICC = 0.07 in words and 0.42 in sentences/phrases), intriguingly there was very good and good inter-rater reliability for the modified PCC score (0.80 in words and 0.79 in sentences/phrases). The extent to which others have also had this problem is unknown as usually reliability data on the number of targets elicited is not reported. Furthermore, to what extent is a lack of agreement on the number of targets elicited a concern? Our interpretation is that if the number of targets elicited varies considerably between transcribers, the PCC score will be calculated on a different number of consonants which potentially may give very misleading results. This was well illustrated by comparing the listeners’ findings on the word sample of four children, as described above. It does not make intuitive sense for the number of targets correct to be judged as fairly similar for both listeners, and yet the number of targets elicited to differ considerably with large differences in the derived PCC score. In contrast, where the number of targets elicited is the same or very similar for both listeners but the number of targets correct differs, this does make sense, and suggests a difference in the perception and/or transcriptions of the targets between the listeners. Where there are large differences between listeners in both the number of targets elicited and the number of targets correct, with the impact of very different derived PCC scores, it becomes harder to have confidence in the scores, and underlines the importance of ensuring the denominator of the PCC calculation is reliable.

This study has also illustrated the challenges of distinguishing between speech characteristics which are related to the cleft condition and developmental speech immaturities, often required in cleft surgical outcome studies in order that the impact of the structural anomaly is reported accurately, and indeed speech is managed appropriately. In this study, there were differences in the perception and transcription of glottal stops, a CSC in the non-oral category of errors [8, 9, 23, 46], and consonant deletion, which when occurring in word final position is interpreted as a developmental immaturity [47, 48] and when in word initial position is interpreted as a phonological disorder. In the Scandcleft study, Willadsen et al. [23] reported poor point-by-point agreement in a word naming test for phonetic transcription of several active and passive CSCs. To increase the rater agreement, they identified “minor disagreements” and accepted them as agreements, and one of these was glottal stops and consonant deletion. Whereas this may not have a major impact in speech outcome studies informing the timing and nature of primary surgery where there is a large sample size, it has very significant implications in a cleft speech intervention study. For example, glottal stops are a CSC and are considered serious errors of articulation which can be much more challenging to eradicate compared to consonant deletion. Furthermore, the therapy approach for glottal realizations differs considerably from that of a developmental speech disorder. Therefore, data categorization and reduction may need to be adjusted according to the nature of the study.

Although the modified PCC score was reliable in our study, it appears to have significant limitations in an intervention study. Indeed, it was originally developed as a measure of severity and not of change, so arguably this limitation should not be surprising. It does not reflect improvements where the target sound has changed in a positive direction but is still not classifiable as “correct” [49]. In our study, this was illustrated by one child who had an active nasal fricative for /ʃ, tʃ, ʤ/ pre-intervention, and used an oral /s, tj, dj/ post-intervention. Although this child had unquestionably made good progress towards resolving her speech difficulty, her consonants were not correct and the change was therefore not captured in the modified PCC outcome. Analysis using a Probe score [49, 50] may help to capture the way in which speech errors change with therapy. The Probe score works by comparing the child’s production of a sound to the adult target and applying a scoring system to each feature (place, manner and voice) that is incorrect. This could be adapted for cleft speech to include nasal release. So, for example, if a child uses an alveolar active nasal fricative for /s/, this would be given a score of -2, reflecting the error in manner and oral release with place and voice correct. If following therapy, the active nasal fricative is realized as (t, d), the score would improve to -1, reflecting an error in manner only. Thus, the probe score would capture how the child changes his speech in therapy, providing much needed insight about this process.
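The feature-based scoring described above might be sketched as follows. This is an illustration of the idea in this paragraph only, not the published Probe scoring system [49, 50]: the feature labels, the added "release" feature for the proposed cleft adaptation, and the rule of one point lost per incorrect feature are all assumptions.

```python
FEATURES = ("place", "manner", "voice", "release")


def probe_score(target, production):
    """Return minus the number of features on which the production differs from the target."""
    return -sum(target[f] != production[f] for f in FEATURES)


# Adult target /s/, described with hypothetical feature labels.
target_s = {"place": "alveolar", "manner": "fricative",
            "voice": "voiceless", "release": "oral"}

# Pre-intervention: alveolar active nasal fricative (manner and release differ).
pre = {"place": "alveolar", "manner": "nasal fricative",
       "voice": "voiceless", "release": "nasal"}

# Post-intervention: realized as [t] (manner differs, release is now oral).
post = {"place": "alveolar", "manner": "plosive",
        "voice": "voiceless", "release": "oral"}

print(probe_score(target_s, pre), probe_score(target_s, post))
```

Both productions would count as incorrect in a PCC calculation, whereas the feature-level score moves closer to zero after therapy and so registers the change.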

Despite previous literature on adaptations of PCC, the PCC score for the present study was only adjusted to include audible nasal emission/turbulence, weak or nasalized consonants and dentalization as correct. This is in keeping with other cleft intervention studies, where percent passive CSCs have been categorized as correct [51, 52]. Although Scherer et al. [31] used the PCC-R due to the young age of the children, no adjustments were made for age-appropriate articulatory and phonological simplification processes in our study. It was considered that including other adaptations of PCC (e.g., PCC-A, PCC for place, manner, oral, non-oral) would have been too time consuming for the listeners, due to the high number of speech samples that required phonetic transcription and rating in this randomized controlled trial. Furthermore, for the purposes of the research question this was not required. However, it would be interesting to explore the extent to which POCC, percent oral errors, and percent non-oral errors may reflect change in therapy and perhaps compare this approach with Probe data.

Limitations of Our Study

The speech sample was limited to words and sentence or phrase repetition, and not conversational speech as recommended by Shriberg and Kwiatkowski [25]. However, previous studies have adapted the speech sample to similar speech samples of word lists and sentence tasks and used the PCC reliably [30, 31]. Indeed, Klintö et al. [24] reported that the transcription of single words was the most reliable, with sentence repetition valid and reliable with similar articulation accuracy as in retelling and conversational speech. Therefore, there would appear to be some justification for the use of these speech samples. In the current study, the authors had to use word and sentence speech samples to enable repeated analyses of articulation on the same material at different time points. The transcription of the targets in word initial and word final position in the words and sentences/phrases, together with the stimulability task, formed the basis for planning the speech intervention program. The sample had to be time limited and efficient so that the task could be completed in one session, limiting the number of times targets could be sampled and the opportunity to detect inconsistency [53]. However, transcription of targets in both single words and sentences/phrases allowed for two exemplars of each target, as recommended by Stoel-Gammon [54], albeit in different word positions in the two different speech samples.

In the intra-rater reliability study, the listeners rated 20 video edits a second time within a 4-week period. Although the order of the duplicated edits was re-randomized and controlled for its place in the series of video edits, it is possible that the listener recognized the child and recalled details of their speech.

This study has demonstrated the importance of establishing the reliability of the number of targets elicited in a PCC calculation in cleft speech intervention studies. This parameter is usually not reported. Care should be taken in intervention studies where listeners are reliable but consistently perceive or categorize in a different way. In particular, listeners need to distinguish glottal articulation from consonant deletion, and active nasal fricatives from accompanying nasal emission/turbulence. The modified PCC score was reliable in this study, and further work is required to examine how best to report the resolution of articulation difficulties in cleft intervention studies.
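The situation in which listeners are reliable yet consistently differ can be illustrated with a short sketch. The scores below are invented for illustration, and a simple Pearson correlation is used as a stand-in for the ICC reported in the study; the point is only that a high association between two listeners can coexist with a constant offset between their scores.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_difference(x, y):
    """Average of (y - x): a positive value means y scores higher overall."""
    return mean([b - a for a, b in zip(x, y)])


# Hypothetical modified PCC scores from two listeners on the same recordings.
listener_a = [40.0, 55.0, 70.0, 85.0]
listener_b = [45.0, 60.0, 75.0, 90.0]  # always five points higher

print(pearson_r(listener_a, listener_b))        # association between listeners
print(mean_difference(listener_a, listener_b))  # systematic offset
```

Reporting a measure of systematic difference alongside the reliability coefficient makes such a pattern visible, whereas the coefficient alone could hide it.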

We wish to acknowledge Myra O’Regan, Professor, School of Computer Science and Statistics, Trinity College, Dublin, for advice and statistical data analysis; Liane Deasy (RIP), PLAT Research SLT, Dublin, for her work in the preparation of the data for this reliability study; and Katie Powell, PLAT Research SLT, London.

Research at Great Ormond Street Hospital NHS Foundation Trust is supported by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

All parents of the children whose data were used in this study gave their written informed consent. The study protocol was approved by the Great Ormond Street Hospital/UCL ICH Joint Research and Development Office (ethical approval reference 15/LO/1909). In Dublin, ethical approval was received from Our Lady’s Hospital for Sick Children (Gen/446/15), Temple Street Children’s University Hospital (15.054), and Trinity College Dublin (Parent Led Articulation Therapy, Sweeney 15/16).

The authors have no conflicts of interest to declare.

We wish to thank the funders of the randomized controlled trial including the Cleft Lip and Palate Association Ireland, National Children’s Research Centre, Temple Street Foundation, and Cleft – Bridging the Gap. Data from the trial were used in the preparation of the manuscript.

Each author made equal contributions to the conception and design of the work, the acquisition, analysis and interpretation of data and the drafting and revising of this manuscript.

1. Sell D, Grunwell P, Mildinhall S, Murphy T, Cornish TA, Bearn D, et al. Cleft lip and palate care in the United Kingdom—the Clinical Standards Advisory Group (CSAG) Study. Part 3: speech outcomes. Cleft Palate Craniofac J. 2001 Jan;38(1):30–7.
2. Chapman KL. The relationship between early reading skills and speech and language performance in young children with cleft lip and palate. Cleft Palate Craniofac J. 2011 May;48(3):301–11.
3. McCormack J, McLeod S, McAllister L, Harrison LJ. A systematic review of the association between childhood speech impairment and participation across the lifespan. Int J Speech Lang Pathol. 2009 Jan;11(2):155–70.
4. Sell D, Mildinhall S, Albery L, Wills AK, Sandy JR, Ness AR. The Cleft Care UK study. Part 4: perceptual speech outcomes. Orthod Craniofac Res. 2015 Nov;18 Suppl 2:36–46.
5. Lohmander A. Surgical intervention and speech outcomes in cleft lip and palate. In: Howard S, Lohmander A, editors. Cleft Palate Speech: Assessment and Intervention. Chichester: Wiley-Blackwell; 2011. pp. 55–85.
6. Bessell A, Sell D, Whiting P, Roulstone S, Albery L, Persson M, et al. Speech and language therapy interventions for children with cleft palate: a systematic review. Cleft Palate Craniofac J. 2013 Jan;50(1):e1–17.
7. Meinusch M, Neumann S. Speech and language therapy interventions for children with cleft palate: evidence not proven. Evid Based Commun Assess Interv. 2016 Oct;10(3-4):155–61.
8. Sell D, Harding A, Grunwell P. GOS.SP.ASS.’98: an assessment for speech disorders associated with cleft palate and/or velopharyngeal dysfunction (revised). Int J Lang Commun Disord. 1999 Jan-Mar;34(1):17–33.
9. John A, Sell D, Sweeney T, Harding-Bell A, Williams A. The cleft audit protocol for speech-augmented: A validated and reliable measure for auditing cleft speech. Cleft Palate Craniofac J. 2006 May;43(3):272–88.
10. Sell D. Issues in perceptual speech analysis in cleft palate and related disorders: a review. Int J Lang Commun Disord. 2005 Apr-Jun;40(2):103–21.
11. Lohmander A, Willadsen E, Persson C, Henningsson G, Bowden M, Hutters B. Methodology for speech assessment in the Scandcleft project—an international randomized clinical trial on palatal surgery: experiences from a pilot study. Cleft Palate Craniofac J. 2009 Jul;46(4):347–62.
12. Howard S. Phonetic transcription for speech related to cleft palate. In: Howard S, Lohmander A, editors. Cleft Palate Speech: Assessment and Intervention. Chichester: Wiley-Blackwell; 2011. pp. 127–42.
13. Gooch JL, Hardin-Jones M, Chapman KL, Trost-Cardamone JE, Sussman J. Reliability of listener transcriptions of compensatory articulations. Cleft Palate Craniofac J. 2001 Jan;38(1):59–67.
14. Chapman KL, Baylis A, Trost-Cardamone J, Cordero KN, Dixon A, Dobbelsteyn C, et al. The Americleft Speech Project: a training and reliability study. Cleft Palate Craniofac J. 2016 Jan;53(1):93–108.
15. Philips BJ, Bzoch KR. Reliability of judgments of articulation of cleft palate speakers. Cleft Palate J. 1969 Jan;6(1):24–34.
16. McWilliams BJ, Morris H, Shelton R. Cleft Palate Speech. 2nd ed. Philadelphia: BC Decker Inc; 1990.
17. Peterson-Falzone S, Hardin-Jones MA, Karnell MP. Cleft Palate Speech. Mosby Inc.; 2010.
18. Kuehn DP, Moller KT. Speech and language issues in the cleft palate population: the state of the art. Cleft Palate Craniofac J. 2000 Jul;37(4):1–35.
19. Brunnegård K, Lohmander A. A cross-sectional study of speech in 10-year-old children with cleft palate: results and issues of rater reliability. Cleft Palate Craniofac J. 2007 Jan;44(1):33–44.
20. Kent RD. Hearing and believing: some limits to the auditory-perceptual assessment of speech and voice disorders. 1996 Aug;5(3):7–23.
21. Shriberg LD, Lof GL. Reliability studies in broad and narrow phonetic transcription. Clin Linguist Phon. 1991 Jan;5(3):225–79.
22. Lohmander A, Persson C, Willadsen E, Lundeborg I, Alaluusua S, Aukner R, et al. Scandcleft randomised trials of primary surgery for unilateral cleft lip and palate: 4. Speech outcomes in 5-year-olds - velopharyngeal competency and hypernasality. J Plast Surg Hand Surg. 2017 Feb;51(1):27–37.
23. Willadsen E, Lohmander A, Persson C, Lundeborg I, Alaluusua S, Aukner R, et al. Scandcleft randomised trials of primary surgery for unilateral cleft lip and palate: 5. Speech outcomes in 5-year-olds - consonant proficiency and errors. J Plast Surg Hand Surg. 2017 Feb;51(1):38–51.
24. Klintö K, Salameh EK, Svensson H, Lohmander A. The impact of speech material on speech judgement in children with and without cleft palate.
25. Shriberg LD, Kwiatkowski J. Phonological disorders III: a procedure for assessing severity of involvement. J Speech Hear Disord. 1982 Aug;47(3):256–70.
26. Cucchiarini C. Assessing transcription agreement: methodological aspects. Clin Linguist Phon. 1996 Jan;10(2):131–55.
27. Shriberg LD, Austin D, Lewis BA, McSweeny JL, Wilson DL. The percentage of consonants correct (PCC) metric: extensions and reliability data. J Speech Lang Hear Res. 1997 Aug;40(4):708–22.
28. Morris H, Ozanne A. Phonetic, phonological, and language skills of children with a cleft palate. Cleft Palate Craniofac J. 2003 Sep;40(5):460–70.
29. Chapman KL, Hardin-Jones MA, Goldstein JA, Halter KA, Havlik RJ, Schulte J. Timing of palatal surgery and speech outcome. Cleft Palate Craniofac J. 2008 May;45(3):297–308.
30. Lohmander A, Persson C. A longitudinal study of speech production in Swedish children with unilateral cleft lip and palate and two-stage palatal repair. Cleft Palate Craniofac J. 2008 Jan;45(1):32–41.
31. Scherer NJ, D’Antonio LL, McGahey H. Early intervention for speech impairment in children with cleft palate. Cleft Palate Craniofac J. 2008 Jan;45(1):18–31.
32. Willadsen E, Poulsen M. A restricted test of single-word intelligibility in 3-year-old children with and without cleft palate. Cleft Palate Craniofac J. 2012 May;49(3):e6–16.
33. Klintö K, Svensson H, Elander A, Lohmander A. Speech and phonology in Swedish-speaking 3-year-olds with unilateral complete cleft lip and palate following different methods for primary palatal surgery. Cleft Palate Craniofac J. 2014 May;51(3):274–82.
34. Malmborn JO, Becker M, Klintö K. Problems With Reliability of Speech Variables for Use in Quality Registries for Cleft Lip and Palate-Experiences From the Swedish Cleft Lip and Palate Registry. Cleft Palate Craniofac J. 2018 Jan;55(8):1055665618765777.
35. Klintö K, Falk E, Wilhelmsson S, Schönmeyr B, Becker M. Speech in 5-Year-Olds With Cleft Palate With or Without Cleft Lip Treated With Primary Palatal Surgery With Muscle Reconstruction According to Sommerlad.
36. Sweeney T, Sell D, Hegarty F. Parent led articulation therapy in cleft palate speech: a feasibility study. J Clinical Speech Lang Stud. 2016;23:21–41.
37. Sweeney T, Hegarty F, O’Regan M, Powell K, Deasy L (RIP), Sell D. A randomised controlled trial comparing Parent Led therapist supervised Articulation Therapy (PLAT) with standard intervention for children with cleft palate speech disorder. In preparation.
38. Sell D, John A, Harding-Bell A, Sweeney T, Hegarty F, Freeman J. Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis. 2009 Jul-Aug;44(4):529–48.
39. Britton L, Albery L, Bowden M, Harding-Bell A, Phippen G, Sell D. A cross-sectional cohort study of speech in five-year-olds with cleft palate ± lip to support development of national audit standards: benchmarking speech standards in the United Kingdom. Cleft Palate Craniofac J. 2014 Jul;51(4):431–51.
40. Cleft Registry and Audit Network (CRANE) annual report. 2018. www.crane-database.org.uk
41. Dodd B, Zhu H, Crosbie S, Holm A, Ozanne A. Diagnostic evaluation of articulation and phonology (DEAP). Psychology Corporation; 2002.
42. Renfrew C. The Action Picture Test. 4th ed. Oxford: Speechmark; 2003.
43. Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall/CRC Press; 1991. p. 404.
44. Shriberg LD, Kwiatkowski J, Hoffmann K. A procedure for phonetic transcription by consensus. J Speech Hear Res. 1984 Sep;27(3):456–65.
45. Ahl R, Harding-Bell A. Comparing methodologies in a series of speech outcome studies: challenges and lessons learned. Cleft Palate Craniofac J. 2018 Jan;55(1):35–44.
46. Lohmander A, Lundeborg I, Persson C. SVANTE - The Swedish Articulation and Nasality Test - Normative data and a minimum standard set for cross-linguistic comparison. Clin Linguist Phon. 2017;31(2):137–54.
47. Grunwell P. Clinical Phonology. London: Croom Helm; 1982.
48. Dodd B, Holm A, Hua Z, Crosbie S. Phonological development: a normative study of British English-speaking children. Clin Linguist Phon. 2003 Dec;17(8):617–43.
49. Hall R, Adams C, Hesketh A, Nightingale K. The measurement of intervention effects in developmental phonological disorders. Int J Lang Commun Disord. 1998;33(S1 Suppl):445–50.
50. Hesketh A, Adams C, Nightingale C, Hall R. Phonological awareness therapy and articulatory training approaches for children with phonological disorders: a comparative outcome study. Int J Lang Commun Disord. 2000 Jul-Sep;35(3):337–54.
51. Dobbelsteyn C, Bird EK, Parker J, Griffiths C, Budden A, Flood K, et al. Effectiveness of the corrective babbling speech treatment program for children with a history of cleft palate or velopharyngeal dysfunction. Cleft Palate Craniofac J. 2014 Mar;51(2):129–44.
52. Derakhshandeh F, Nikmaram M, Hosseinabad HH, Memarzadeh M, Taheri M, Omrani M, et al. Speech characteristics after articulation therapy in children with cleft palate and velopharyngeal dysfunction - A single case experimental design. Int J Pediatr Otorhinolaryngol. 2016 Jul;86:104–13.
53. Holm A, Crosbie S, Dodd B. Differentiating normal variability from inconsistency in children’s speech: normative data. Int J Lang Commun Disord. 2007 Jul-Aug;42(4):467–86.
54. Stoel-Gammon C. Phonological skills of 2-year-olds. Lang Speech Hear Serv Sch. 1987 Oct;18(4):323–9.