Abstract
Background/Aims: Percent consonant correct (PCC) was originally described by Shriberg and Kwiatkowski [J Speech Hear Disord. 1982 Aug;47(3):256–70] as a severity metric for phonological speech disorders, and has been adapted and used in many studies on speech sound disorders. It is well-recognized that cleft speech is complex, consisting of several interacting parameters assessed simultaneously, with error sounds not in the listener’s own language. In speech outcome studies, narrow phonetic transcription and the reporting of intra- and inter-rater reliability are acknowledged as the gold standard. However, cleft speech brings special challenges to this task, as complex speech disorders are known to be associated with low transcriber agreement. Recent studies informed the decision to use PCC as the primary outcome measure in a cleft speech intervention study, given its common usage and familiarity. The aim was to specifically evaluate the intra- and inter-rater reliability of PCC in an intervention study, in contrast to other types of speech outcome studies. Methods: Two trained and experienced listeners analyzed 119 recordings, randomly selected from five data points before, during, and following intervention. The PCC score was calculated separately for words and sentences/phrases. Results: Using intraclass correlations (ICCs), Phase 1 results showed poor reliability for targets elicited for words (ICC = 0.07) and sentences/phrases (ICC = 0.42). Differences in the classification of errors as glottal stops or consonant deletions accounted for this. Following further training, a second reliability study was undertaken showing improvement in the number of targets elicited in words (ICC = 0.85) and sentences/phrases (ICC = 0.94). There was very good inter-rater reliability for the PCC score on the word dataset (ICC = 0.9) and the sentence dataset (ICC = 0.88). Very good intra-rater reliability (ICC = 1.0) was found for the PCC score in both words and sentences/phrases for each listener. One listener consistently gave higher modified PCC scores. Conclusions: In cleft speech intervention studies, reliability of the number of targets elicited should be reported. Listeners need to distinguish between glottal articulation and consonant deletion, in order that the PCC score is meaningful. Attention should also be paid to situations where listeners are individually reliable but their scores differ in a consistent way. More research is needed on measuring the resolution of articulation difficulties in cleft intervention studies.
Introduction
Children born with cleft palate ± lip are known to be at high risk of speech difficulties, following palate repair in infancy, with potential educational and psychosocial consequences [1-4]. The need to determine the optimum timing, staging, and technique of palate repair in terms of speech outcomes has been at the center of cleft research for decades [5]. As a consequence, much of the focus in the cleft literature on speech outcomes has been on reports of surgical outcomes, with much less attention to therapy intervention studies in cleft speech. A systematic review by Bessell et al. [6] recommended more randomized controlled trials and observational studies, together with the need to identify elements of speech therapy interventions that may be effective, including duration, age-range, setting, intensity, delivery, and theoretical perspective of the intervention method. Meinusch and Neumann [7] subsequently commented on the review and stated how it could be used as a starting point for developing well-designed and well-reported intervention studies.
It is well-recognized that the nature of cleft speech is complex and different from other speech disorders in a number of ways. Unlike developmental disorders, cleft speech disorders can be due to structural etiologies described as passive characteristics [8, 9] or due to active mislearning, sometimes associated with a history of velopharyngeal insufficiency (VPI), which frequently co-occur. Cleft speech consists not only of consonant sound errors, known as cleft speech characteristics (CSCs) [8, 9], but also resonance (hypernasality, hyponasality) and nasal airflow errors (audible nasal emission, nasal turbulence), all key aspects of an assessment. CSCs describe the typical types of errors heard in cleft speech, and include sounds not in the English language. They largely reflect errors of place of articulation, frequently articulated further back in the oral cavity, for example, alveolar targets produced as palatals or velars, or even further back in the larynx/pharynx/velopharynx as glottal stops, pharyngeal fricatives, or active nasal fricatives, respectively. One group of CSCs describes typical passive errors, usually of manner. The different cleft speech parameters have to be assessed simultaneously, sometimes interacting with each other, further complicating the perceptual assessment. Narrow phonetic transcription is acknowledged as the gold standard for this purpose [10-12]. It is also widely accepted that listeners require additional postgraduate specialist training to analyze these often complex speech presentations. In speech outcome studies, intra- and inter-rater reliability must be reported [9-11, 13-19]. This is now accepted as a requirement of speech outcome studies in cleft palate, irrespective of the nature of the study. However, cleft speech brings special challenges to this task. Complex speech disorders are known to be associated with low transcriber agreement [20, 21]. Gooch et al. [13] drew attention to the fact that there are two tasks involved: identifying the error and recalling its transcription. Cleft studies have described training in phonetic transcription and concluded that, with agreed conventions and rules and regular transcription tests and training [22, 23], it is possible to achieve good consistency in phonetic transcription and that ratings can be performed reliably [9, 11, 14, 22, 23]. Chapman et al. [14] listed the factors that contribute to low intra- and inter-rater reliability in cleft speech studies as the framework of analysis, the nature of the speech sample, procedures used for data collection, recording quality, playback conditions, and listener characteristics such as level of experience and the training undertaken. Klintö et al. [24] added the severity of the speech disorder, the quantity of material re-transcribed, the time between original transcription and re-transcription, and the methods used for calculation and criteria of agreement as other contributing factors.
In this regard, different statistical approaches for calculating reliability have included point-to-point percentage agreement, correlations, kappas/weighted kappas, and single or average measure intraclass correlation (ICC) coefficients. There are advantages and disadvantages to these approaches. The advantage of point-to-point percentage of agreement is that this is an estimate of the reliability of the data at the level of each sound, word, or other unit of analysis [25]. When calculating percent agreement between listeners for percent consonant correct (PCC), it is usually easier to achieve high agreement since there are only two possible options for every consonant – correct or incorrect. However, the risk of chance agreement is high [26]. Reliability coefficients take chance agreements into account, while providing information about the ability of scores to distinguish between subjects, and reporting group outcomes. However, where there is a lack of variability in the range of severity in a dataset such that the whole scale is not used, these statistics can be misleading [11, 14, 19].
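To make the contrast between raw agreement and chance-corrected agreement concrete, the short Python sketch below compares point-to-point percent agreement with Cohen's kappa for two listeners' binary correct/incorrect judgments. The judgments are invented for illustration and are not data from the present study.

```python
# Illustrative only: invented correct/incorrect judgments for two listeners
# on the same 20 consonant targets (1 = correct, 0 = incorrect).
from sklearn.metrics import cohen_kappa_score

listener_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
listener_b = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1]

# Point-to-point percent agreement: proportion of targets given the same judgment.
agreement = sum(a == b for a, b in zip(listener_a, listener_b)) / len(listener_a) * 100

# Cohen's kappa corrects for the agreement expected by chance, which is high
# when most targets are judged correct by both listeners.
kappa = cohen_kappa_score(listener_a, listener_b)

print(f"Percent agreement: {agreement:.0f}%")  # 90% raw agreement
print(f"Cohen's kappa:     {kappa:.2f}")       # much lower once chance is removed
```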
Notwithstanding, the PCC is widely used as a clinical and research metric in non-cleft speech outcome studies. It was originally defined as a measure of the proportion of correctly articulated consonants in the transcription of conversational speech, representing the severity of involvement related to phonological disorders [25]. It is a measure of both phonetic/articulatory and phonological errors, in which all errors are given the same weight. Shriberg et al. [27] introduced the percentage of consonants correct – adapted (PCC-A) and revised (PCC-R), which allow common distortions (PCC-A), or common together with uncommon distortions (PCC-R), to be scored as correct; the latter include nasal emission on consonants. Since then, further modified measures of PCC have been reported in several cleft speech studies [28-33]. For example, authors have reported PCC together with additional analyses including percent correct manner and percent correct place [30], and percentage of compensatory articulation related to place and manner [29]. Klintö et al. [33] defined a PCC-A in which age-appropriate simplifications of /s/ and passive CSCs (audible nasal air emission, nasal realizations, and weak articulation) are scored as correct. Subsequently, percent oral consonants correct (POCC), percent oral errors, and percent non-oral errors have been used [23, 34, 35]. The POCC is a measure in which audible nasal air leakage/emission, weak articulation, and nasalization of consonants are scored as incorrect, together with phonological and articulatory errors. This would appear to be closest to the original description of PCC.
Given the widespread and continued use of PCC in cleft palate speech studies, especially in the evaluation of speech outcomes related to surgery, and considering its limitations, the purpose of the present study was to evaluate the reliability of PCC as an outcome measure for cleft speech in an intervention study.
Method
This study was undertaken as part of a randomized controlled trial which aimed to determine whether Parent-Led, therapist-supervised Articulation Therapy (PLAT) was comparable to traditional speech therapy intervention in children with cleft palate-related speech disorders [36, 37].
Participants
Speech and language therapists (SLTs), who had been trained on the Cleft Audit Protocol for Speech-Augmented (CAPS-A), were invited to serve as raters/listeners through an email invitation to the cleft centers in the UK and Ireland. The CAPS-A is a validated outcome tool, widely used in the UK and Ireland for audit and research outcome studies [9], with an associated two-day training program [38]; https://www.caps-a.com. SLTs sent an expression of interest, and two independent SLTs, highly experienced in the analysis and rating of cleft palate speech in research projects, were invited to participate. Both had participated in the Scandcleft [22, 23] and Timing of Primary Palatal Surgery trials (http://www.tops-trial.org.uk/ accessed November 12, 2018). These are two large, international randomized controlled trials in centers in the UK, Scandinavia, and Brazil investigating the timing, staging, and techniques of palate surgery, in which speech is one of the primary outcome measures. Reliability data for the two listeners were accessed from the Timing of Primary Palatal Surgery trial with the listeners’ permission. In addition, each listener had regularly participated in consensus listening activities for mandatory national audit in their UK workplace [11, 39, 40].
The two listeners attended a one-day training course. The age range of the study cohort, the types of speech characteristics, and the speech sample were all discussed. Guidelines were given on the identification of correct versus incorrect targets, the analysis of single words and sentences/phrases, the identification of phonological processes, the analysis of resonance, audible nasal emission and nasal turbulence, and the completion of the independent listener analysis form (ILAF; online suppl. material; for all online suppl. material, see www.karger.com/doi/10.1159/000501095). Following this, four cases, not included in the study, were each analyzed and rated by the listeners and authors (D.S., T.S.), and consensus judgments were agreed.
Data Collection
Forty-four children (35 boys, 9 girls; mean age 4.78 years, SD 0.89, age range 2.9–7.6 years) had been included in the intervention study. High-quality audio-video recordings had been undertaken on five occasions over a 7-month period: at baseline (time point 1), immediately pre-intervention (time point 2), midway through intervention (time point 3), immediately post-intervention (time point 4), and 2 months post-intervention (time point 5). The duration of the intervention was 12 weeks, that is, between time points 2 and 4. Recordings were made using either the Zoom Q3 or Q4 HD camera mounted on a desktop stand [38].
The recorded speech sample consisted of pictures of 30 single words from the articulation section of the Diagnostic Evaluation of Articulation and Phonology test [41], and pictures of 13 sentences/phrases from the CAPS-A recommended speech sample [8, 38]. In addition, a sound stimulability task and a spontaneous speech sample based on the Renfrew Action Picture Test [42] were collected but not recorded.
Data Preparation
Video data were collected at time points 1, 2, 4, and 5 for all 44 cases. At time point 3 video data were missing on six cases due to examiner error. Hence, 214 video recordings were anonymized and randomized using the Excel Rand function. Live assessments were initially carried out by the research SLTs and co-authors during all assessment time points. PCC was calculated for these assessments to ascertain the severity of speech disorder and to plan ongoing therapy. In preparation for randomization, all cases were ordered according to their live PCC ratings. Systematic sampling was undertaken, ensuring the range of severity of speech was represented. Seventy videos were randomly selected; however, during analyses it was noted that one recording had been lost and hence 69 videos were used. All video recordings were saved to an encrypted external hard drive. In total, 119 video edits were included in the two-phase reliability study.
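As a minimal sketch of the severity-stratified systematic sampling described above, recordings can be ordered by their live PCC score and evenly spaced items selected. The study itself used Excel; the function and variable names below are illustrative assumptions, not the procedure or code actually used.

```python
# Minimal sketch of severity-stratified systematic sampling (illustrative only).
# Recordings are ordered by their live PCC score and evenly spaced items are
# taken from the ordered list, so the full severity range is represented.
import random

def systematic_sample(recordings, n_sample):
    """recordings: list of (recording_id, live_pcc) tuples."""
    ordered = sorted(recordings, key=lambda r: r[1])          # order by live PCC
    step = len(ordered) / n_sample                            # sampling interval
    return [ordered[int(i * step)] for i in range(n_sample)]  # evenly spaced picks

# Hypothetical archive: 214 recordings with live PCC between 10 and 95.
random.seed(1)
archive = [(f"edit_{i:03d}", random.uniform(10, 95)) for i in range(214)]
selected = systematic_sample(archive, 70)
print(len(selected), "recordings selected across the severity range")
```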
Perceptual Assessments
The two listeners were each given a pair of Sennheiser 90 high-quality headphones and an encrypted hard drive containing 69 anonymized video recordings, with time point and trial arm unknown to the listeners. The recordings included all the CSCs [8, 9] and different time points, with the range of PCC severity in words and sentences/phrases represented.
The word sample consisted of 40 consonant targets and included a single target for each English consonant in word initial and word final position. The sentence/phrase sample targeted 13 high-pressure consonants (plosives, fricatives, and affricates) in word initial and word final position. The listeners transcribed the targets using narrow phonetic transcription.
Data Analysis
A modified PCC was adopted as the primary outcome measure [27]. Consonants produced with correct place, manner, and voice but with accompanying nasal emission/turbulence, weak/nasalized realization, or dental/interdental quality were categorized as correct. PCC was calculated as the number of targets correct divided by the number of targets elicited, multiplied by 100. Developmental errors were included in the calculation. PCC scores were calculated separately for the word and sentence/phrase samples, and intraclass correlations were used to evaluate the inter- and intra-rater reliability of the two listeners for each dataset in both Phases 1 and 2. The strength of agreement for the ICCs was based on Altman [43]: 0.00–0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good.
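For clarity, the PCC calculation and the Altman bands can be expressed as in the short sketch below. The example numbers are invented; they illustrate how a different denominator (targets elicited) changes the derived score even when the number of targets correct is similar, the issue that emerged in Phase 1.

```python
# Modified PCC: number of targets correct divided by number of targets elicited, x 100.
def pcc(targets_correct, targets_elicited):
    return targets_correct / targets_elicited * 100

# Altman (1991) strength-of-agreement bands used for the ICCs in this study.
def altman_band(icc):
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"

# Hypothetical example of the Phase 1 problem: similar numbers of targets correct,
# but different numbers of targets elicited, give very different PCC scores.
print(pcc(targets_correct=28, targets_elicited=40))  # 70.0
print(pcc(targets_correct=27, targets_elicited=30))  # 90.0
print(altman_band(0.85))                             # very good
```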
Phase 1 Method
Analysis was carried out over a 3-week period. For each video edit, each listener analyzed the speech sample, completed the ILAF, and entered the data into an Excel spreadsheet. They recorded the video edit ID, the number of targets elicited, and the number of targets correct, and coded the speech errors as CSCs as per the CAPS-A protocol. In classifying speech errors, the listener recorded the number of consonants affected and listed the consonants affected for each of the following CSCs: lateral, palatal, double articulation, backed to velar/uvular, pharyngeal, glottal, active nasal fricative, nasal realizations, and gliding. There was also an option to mark “other” for unusual cleft speech errors not listed above. Phonological processes were rated as present or absent, independent of the age of the child. Listeners also rated hypernasality, audible nasal emission, and nasal turbulence using the CAPS-A scales shown in the online supplementary material. The PCC scores for the word and sentence/phrase samples were automatically calculated in Excel for each speech sample.
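A minimal sketch of the information captured for each video edit is shown below. The field names are our own illustrative assumptions; the study itself recorded these data on the ILAF and in an Excel spreadsheet rather than in code.

```python
# Illustrative sketch of one listener's record for a single video edit.
from dataclasses import dataclass, field

@dataclass
class IlafEntry:
    edit_id: str
    targets_elicited: int
    targets_correct: int
    # CAPS-A cleft speech characteristic (CSC) codes mapped to the consonants affected,
    # e.g. {"glottal": ["t", "k"], "active nasal fricative": ["s"]}.
    cscs: dict = field(default_factory=dict)
    phonological_processes_present: bool = False
    hypernasality: int = 0           # CAPS-A scale point
    audible_nasal_emission: int = 0  # CAPS-A scale point
    nasal_turbulence: int = 0        # CAPS-A scale point

entry = IlafEntry("edit_014", targets_elicited=40, targets_correct=31,
                  cscs={"glottal": ["k", "g"]}, phonological_processes_present=True)
```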
Phase 1 Results
Table 1 shows unacceptably poor inter-rater reliability on the number of targets elicited at both word (ICC = 0.07; 95% CI –0.17 to 0.3) and sentence level (ICC = 0.42; 95% CI 0.2–0.59) in Phase 1. In contrast, there was very good inter-rater reliability for the number of targets correct at both word (ICC = 0.84; 95% CI 0.75–0.89) and sentence/phrase level (ICC = 0.82; 95% CI 0.72–0.88), and good reliability for PCC (word ICC = 0.80; 95% CI 0.70–0.87; sentence/phrase ICC = 0.79; 95% CI 0.69–0.87).
Table 2 shows the variability in the number of targets elicited and the effect on the derived PCC for the word data analysis of each listener for four children. In video edits 1 and 2, although the number of targets correct is very similar for both listeners, the number of targets elicited differs by at least 10, with resultant large differences in the derived PCC score. This contrasts with video edit 4, in which the number of targets elicited is the same for both listeners (n = 40); the number of targets correct differs between the listeners and is reflected in the differing modified PCC scores. In video edit 8, the listeners differ not only in the number of targets elicited but also in the number of targets judged correct, by over 10, with a large difference of 17% between the two PCC scores.
Phase 2 Method
The raw data on the ILAF (online suppl. material), phonetic transcriptions, and Excel spread sheets were examined by the authors to identify the cause of the discrepancies in the number of targets elicited between the two listeners. One striking issue was in the transcription of glottal stops and consonant deletion. A pattern emerged of one listener transcribing a target as a glottal stop and the other transcribing it as a consonant deletion. The glottal stop was counted as a “target elicited” in contrast to a consonant deletion, accounting for the differences in number of targets elicited between the listeners.
Although not affecting the number of targets elicited, other sources of disagreement between the listeners were noted which could have significant implications for the results. These involved transcription differences where, for example, active nasal fricatives were sometimes transcribed as nasal realizations or as correct targets with accompanying nasal emission/turbulence. Interestingly, Willadsen et al. [23] reported a similar issue; in their study, these were accepted as agreements. In a cleft intervention study, where targets for therapy are often active nasal fricatives, it was critical that the listeners were accurate in the perception and transcription of these errors.
Given these findings, a further training course was undertaken. A half-day session was completed over Skype, reviewing the disagreements outlined above. It was agreed that if the child attempted to say the word, the target should be included as elicited even if the transcription was a consonant deletion. Further listening practice focused on contrasting glottal stops and consonant deletion errors. Other issues regarding the perception and transcription of active nasal fricatives (i.e., active mislearning) versus consonants with accompanying nasal emission/turbulence, and weak versus unreleased targets, nasal realizations, and nasalization of plosives (i.e., passive structurally related errors) were also discussed and agreed. Following the second training session, a different set of 70 video edits, including 20 duplicated edits for the intra-rater study, was analyzed. The 20 duplicated edits were randomly selected and inserted into the series of video edits, ensuring that there were at least five videos between an edit and its duplicate. As discussed previously, the PCC scores derived from the face-to-face assessments were used to ensure that the full range of CSCs and speech severity was included in Phase 2. Phase 2 analysis took place over the course of 4 weeks.
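The constraint on where duplicated edits could appear might be implemented along the following lines; this is an illustrative sketch under simple assumptions, not the randomization procedure actually used in the study.

```python
import random

# Illustrative sketch (not the procedure actually used): randomize 50 video edits and
# insert 20 duplicates so that at least 5 other videos separate each edit from its repeat.
def build_series(edits, n_duplicates=20, min_gap=5, seed=0):
    rng = random.Random(seed)
    series = edits[:]
    rng.shuffle(series)
    # Sample the edits to duplicate from early enough in the series that a valid
    # insertion point (at least min_gap items later) always exists.
    duplicates = rng.sample(series[: len(series) - min_gap - 1], n_duplicates)
    for dup in duplicates:
        original_pos = series.index(dup)  # first occurrence is the original
        insert_at = rng.randint(original_pos + min_gap + 1, len(series))
        series.insert(insert_at, dup)
    return series

series = build_series([f"edit_{i:02d}" for i in range(50)])
assert len(series) == 70  # 50 edits plus 20 duplicates = 70 analyses
```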
Phase 2 Results
Inter-rater reliability for the number of targets elicited at both word (ICC = 0.85; 95% CI 0.74–0.91) and sentence/phrase level (ICC = 0.94; 95% CI 0.9–0.97) and the number of targets correct at both word (ICC = 0.91; 95% CI 0.85–0.95) and sentence/phrase level (ICC = 0.89; 95% CI 0.81–0.94) were very good. Inter-rater reliability for the modified PCC scores was very good on the word dataset (ICC = 0.9; 95% CI 0.84–0.94) and the sentence/phrase dataset (ICC = 0.88; 95% CI 0.8–0.93). ICCs indicated very good intra-rater reliability (ICC = 1.0; 95% CI 0.98–1.00) for modified PCC scores for targets in both words and sentences/phrases for each listener.
Discussion
The aim of this study was to evaluate the intra- and inter-rater reliability of the modified PCC scores in an intervention study of children with a cleft palate-related speech disorder. The reliability study was conducted using a sample of 119 video recordings, collected longitudinally at baseline, immediately pre-intervention, midway through the intervention, immediately post-intervention, and two months post-intervention.
The study formed part of an intervention study in which there were 214 video recordings requiring analysis. It is known that phonetic analysis and rating are time-consuming processes [44, 45]. The task involved the transcription of 40 targets in a corpus of 26 words, and 26 targets based on 13 sentences/phrases. The listener was then asked to sum the total targets elicited and the total targets correct for each dataset, and the PCC was automatically calculated. This was followed by categorization of the errors into their CSC categories, with a count of the number of targets affected. Finally, the listener noted the presence of any phonological processes, and rated resonance and nasal airflow. On average, the analysis took between 20 and 40 min per recording, depending on the complexity of the speech. The original plan had been for each listener to analyze 107 videos, to ensure an equal distribution of the workload and reduce the possible introduction of errors from listener fatigue. Inter-rater reliability was to have been based on 70 video edits. The same 70 edits (i.e., approximately 30% of the whole recording archive) would be re-randomized and analyzed a second time to evaluate intra-rater reliability.
However, initial results indicated unacceptably poor agreement on the number of targets elicited both in words and sentences/phrases, and so further training and reliability testing were necessary. It was difficult to ascertain from the data alone whether this was a problem of perception or classification. In Phase 2, 70 video edits, including 20 duplicated edits, were analyzed by the two listeners. Results of this phase showed very good inter-rater agreement on the number of targets elicited, the number of targets correct at word and sentence/phrase level, and modified PCC scores. Intra-rater reliability was very good for both listeners.
Although reliability was very good, one listener consistently gave higher PCC scores than the other. Whilst this may not be as important in some types of cleft studies and may not be an issue for speech disorders with other etiologies, this difference is critical in cleft intervention outcome studies, as results could be affected by the listener and not by speech performance. For example, if the more generous listener analyzed a pre-intervention edit and the more critical listener analyzed the post-intervention edit, a positive change in PCC scores may not be reflected in the data when a real change had taken place. Conversely, if the more critical listener analyzed a pre-intervention edit and the more generous listener analyzed the post-intervention edit, the positive change in PCC scores is at risk of being over-estimated. This had major implications for the original plan of dividing the video archive equally between the two listeners for analysis. Thus, only one listener (listener A) analyzed the remaining 164 video edits.
This study showed that despite poor inter-rater reliability for targets elicited (ICC = 0.07 in words and 0.42 in sentences/phrases), intriguingly there was good inter-rater reliability for the modified PCC score (ICC = 0.80 in words and 0.79 in sentences/phrases). The extent to which others have also had this problem is unknown, as reliability data on the number of targets elicited are not usually reported. Furthermore, to what extent is a lack of agreement on the number of targets elicited a concern? Our interpretation is that if the number of targets elicited varies considerably between transcribers, the PCC score will be calculated on a different number of consonants, which potentially may give very misleading results. This was well illustrated by comparing the listeners’ findings on the word sample of four children, as described above. It does not make intuitive sense for the number of targets correct to be judged as fairly similar for both listeners, and yet the number of targets elicited to differ considerably, with large differences in the derived PCC score. In contrast, where the number of targets elicited is the same or very similar for both listeners but the number of targets correct differs, this does make sense, and suggests a difference in the perception and/or transcription of the targets between the listeners. Where there are large differences between listeners in both the number of targets elicited and the number of targets correct, with the impact of very different derived PCC scores, it becomes harder to have confidence in the scores, which underlines the importance of ensuring that the denominator of the PCC calculation is reliable.
This study has also illustrated the challenges of distinguishing between speech characteristics which are related to the cleft condition and developmental speech immaturities, a distinction often required in cleft surgical outcome studies in order that the impact of the structural anomaly is reported accurately, and indeed that speech is managed appropriately. In this study, there were differences in the perception and transcription of glottal stops, a CSC in the non-oral category of errors [8, 9, 23, 46], and consonant deletion, which when occurring in word final position is interpreted as a developmental immaturity [47, 48] and when in word initial position is interpreted as a phonological disorder. In the Scandcleft study, Willadsen et al. [23] reported poor point-by-point agreement in a word naming test for the phonetic transcription of several active and passive CSCs. To increase rater agreement, they identified “minor disagreements” and accepted them as agreements, one of which was glottal stops versus consonant deletion. Whereas this may not have a major impact in speech outcome studies informing the timing and nature of primary surgery, where there is a large sample size, it has very significant implications in a cleft speech intervention study. For example, glottal stops are a CSC and are considered serious errors of articulation which can be much more challenging to eradicate compared with consonant deletion. Furthermore, the therapy approach for glottal realizations differs considerably from that for a developmental speech disorder. Therefore, data categorization and reduction may need to be adjusted according to the nature of the study.
Although the modified PCC score was reliable in our study, it appears to have significant limitations in an intervention study. Indeed, it was originally developed as a measure of severity and not of change, so arguably this limitation should not be surprising. It does not reflect improvements where the target sound has changed in a positive direction but is still not classifiable as “correct” [49]. In our study, this was illustrated by one child who had an active nasal fricative for /ʃ, tʃ, ʤ/ pre-intervention, and used an oral /s, tj, dj/ post-intervention. Although this child had unquestionably made good progress towards resolving her speech difficulty, her consonants were not correct and the change was therefore not captured in the modified PCC outcome. Analysis using a Probe score [49, 50] may help to capture the way in which speech errors change with therapy. The Probe score works by comparing the child’s production of a sound to the adult target and applying a scoring system to each feature (place, manner, and voice) that is incorrect. This could be adapted for cleft speech to include nasal release. So, for example, if a child uses an alveolar active nasal fricative for /s/, this would be given a score of -2, reflecting the error in manner and oral release, with place and voice correct. If, following therapy, the active nasal fricative is realized as /t, d/, the score would improve to -1, reflecting an error in manner only. Thus, the Probe score would capture how the child changes their speech in therapy, providing much needed insight into this process.
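A feature-based scoring of this kind, extended with an oral/nasal release feature as suggested above, could be sketched as follows. The -2 and -1 values follow the worked example in the text, but the feature representation is an illustrative assumption rather than the published Probe procedure.

```python
# Illustrative sketch of a feature-based probe score adapted for cleft speech.
# Each production is described by place, manner, voice and (added here) oral vs
# nasal release; one point is deducted for every feature that differs from the target.
FEATURES = ("place", "manner", "voice", "release")

def probe_score(target, production):
    """target, production: dicts with keys 'place', 'manner', 'voice', 'release'."""
    return -sum(target[f] != production[f] for f in FEATURES)

# Worked example from the text: /s/ = voiceless alveolar fricative with oral release.
target_s = {"place": "alveolar", "manner": "fricative", "voice": "voiceless", "release": "oral"}

# Pre-therapy: alveolar active nasal fricative -> manner and release wrong -> score -2.
pre = {"place": "alveolar", "manner": "nasal fricative", "voice": "voiceless", "release": "nasal"}

# Post-therapy: realized as an oral plosive [t] -> manner wrong only -> score -1.
post = {"place": "alveolar", "manner": "plosive", "voice": "voiceless", "release": "oral"}

print(probe_score(target_s, pre))   # -2
print(probe_score(target_s, post))  # -1
```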
Despite previous literature on adaptations of PCC, the PCC score for the present study was adjusted only to score audible nasal emission/turbulence, weak or nasalized consonants, and dentalization as correct. This is in keeping with other cleft intervention studies, where passive CSCs have been categorized as correct [51, 52]. Although Scherer et al. [31] used the PCC-R due to the young age of the children, no adjustments were made for age-appropriate articulatory and phonological simplification processes in our study. It was considered that including other adaptations of PCC (e.g., PCC-A, PCC for place, manner, oral, non-oral) would have been too time-consuming for the listeners, due to the high number of speech samples that required phonetic transcription and rating in this randomized controlled trial. Furthermore, for the purposes of the research question this was not required. However, it would be interesting to explore the extent to which POCC, percent oral errors, and percent non-oral errors may reflect change in therapy, and perhaps to compare this approach with Probe data.
Limitations of Our Study
The speech sample was limited to words and sentence or phrase repetition, and not conversational speech as recommended by Shriberg and Kwiatkowski [25]. However, previous studies have adapted the speech sample to similar word lists and sentence tasks and used the PCC reliably [30, 31]. Indeed, Klintö et al. [24] reported that the transcription of single words was the most reliable, with sentence repetition being valid and reliable and showing similar articulation accuracy to retelling and conversational speech. There would therefore appear to be some justification for the use of these speech samples. In the current study, the authors had to use word and sentence speech samples to enable repeated analyses of articulation on the same material at different time points. The transcription of the targets in word initial and word final position in the words and sentences/phrases, together with the stimulability task, formed the basis for planning the speech intervention program. The sample had to be time limited and efficient so that the task could be completed in one session, limiting the number of times targets could be sampled and the opportunity to detect inconsistency [53]. However, transcription of targets in both single words and sentences/phrases allowed for two exemplars of each target, as recommended by Stoel-Gammon [54], albeit in different word positions in the two speech samples.
In the intra-rater reliability study, the listeners rated 20 video edits a second time within a 4-week period. Although the order of the duplicated edits was re-randomized and controlled for its place in the series of video edits, it is possible that the listener recognized the child and recalled details of their speech.
Conclusion
This study has demonstrated the importance of ensuring that the number of targets elicited in a PCC calculation is reliable in cleft speech intervention studies. This parameter is not usually reported. Care should be taken in intervention studies where listeners are individually reliable but consistently perceive or categorize errors differently, ensuring that the distinction between glottal articulation and consonant deletion, and between active nasal fricatives and consonants with accompanying nasal emission/turbulence, is made. The modified PCC score was reliable in this study, and further work is required to examine how best to report the resolution of articulation difficulties in cleft intervention studies.
Acknowledgements
We wish to acknowledge Myra O’Regan, Professor, School of Computer Science and Statistics, Trinity College, Dublin for advice and statistical data analysis; Liane Deasy (RIP) PLAT Research SLT, Dublin, for her work in the preparation of the data for this reliability study and Katie Powell, PLAT Research SLT, London.
Research at Great Ormond Street Hospital NHS Foundation Trust is supported by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Statement of Ethics
All parents of the children whose data were used in this study gave their written informed consent. The study protocol was approved by the Great Ormond Street Hospital/UCL ICH Joint Research and Development Office. Ethical approval reference 15/LO/1909. In Dublin, ethical approval was received from Our Lady’s Hospital for Sick Children (Gen/446/15), Temple Street Children’s University Hospital (15.054), and Trinity College Dublin, (Parent Led Articulation Therapy, Sweeney 15/16).
Disclosure Statement
The authors have no conflicts of interest to declare.
Funding Sources
We wish to thank the funders of the randomized controlled trial including the Cleft Lip and Palate Association Ireland, National Children’s Research Centre, Temple Street Foundation, and Cleft – Bridging the Gap. Data from the trial were used in the preparation of the manuscript.
Author Contributions
Each author made equal contributions to the conception and design of the work, the acquisition, analysis and interpretation of data and the drafting and revising of this manuscript.