Abstract
Objective: To examine English-speaking speech-language pathologists’ (SLPs) transcription of consonants in Vietnamese words and their identification of correct/incorrect productions in Vietnamese children’s speech. Participants and Methods: Twenty English-speaking SLPs completed three tasks. Task 1: transcription of 22 English words using the International Phonetic Alphabet. Task 2: transcription of 47 words spoken by Vietnamese adults. Task 3: transcription of 94 Vietnamese words spoken by Vietnamese children and identification of correct/incorrect productions. Participants completed questionnaires exploring language proficiency, transcription skill, musicality and confidence with multilingual clients. Results: Task 1: participants demonstrated good accuracy transcribing English words (M = 97.2%). Task 2: on average, 52.9% of consonants were transcribed correctly (89.4% when Vietnamese-English common transcription errors were considered). Common transcription errors involved plosive voicing, place of articulation and syllable-final omission. Accuracy was higher on consonantal articulations shared by English and Vietnamese (e.g., /b/ and /m/). Task 3: on average, SLPs correctly identified the accuracy of 73.8% of Vietnamese children’s productions and correctly transcribed 69.2% of consonants (83.8% when Vietnamese-English common transcription errors were considered). Self-reported musicality was correlated with SLPs’ accuracy transcribing Vietnamese adults’ speech. Conclusion: English-speaking SLPs have some skill in transcribing Vietnamese adults’ speech and in transcribing and identifying correct/incorrect productions in children’s speech. SLPs may use knowledge of common transcription errors to support understanding of their transcription of speech.
Introduction
The diversity of the world’s population and migration across and between continents have resulted in multilingual societies all over the world. For example, in predominantly English-speaking countries, people who speak languages other than English make up a sizable portion of the population. In Australia, 22% of households speak a language other than English at home [1]; in the United States, 21% of households speak a language other than English [2]; in Canada, 22% of resident Canadians report speaking a language other than French or English [3]; and in the United Kingdom, 21% of children in primary school reportedly speak a language other than English [4]. This cultural and linguistic diversity can affect individuals’ access to education and employment if they do not speak the dominant language of the country in which they live [5]. In health and education, speech-language pathologists (SLPs) are often the first point of contact for individuals with communication or swallowing difficulties, regardless of the language those individuals speak. However, the linguistic diversity of communication professionals, and of SLPs specifically, often does not reflect that of the populations they serve. In Australia, 20.2% of 2,849 paediatric services offered by Speech Pathology Australia members were available in a language other than English [6], whereas in the United States, only 6.6% of members of the American Speech-Language-Hearing Association meet the criteria to be considered a “bilingual service provider” [7]. Moreover, in many cases, the languages that SLPs speak and use professionally do not align with the distribution of communities speaking those languages [6]. As a result, many SLPs may be providing services to children and adults who do not share a language with them.
To date, some promising research has demonstrated that SLPs are able to identify stuttering in languages that they do not speak [8, 9], but assessment accuracy for other diagnoses remains underexplored.
Children with speech sound disorders are one of the largest groups seen by paediatric SLPs [10, 11]. These are children who “have any combination of difficulties with perception, articulation/motor production, and/or phonological representation of speech segments (consonants and vowels), phonotactics (syllable and word shapes), and prosody (lexical and grammatical tones, rhythm, stress, and intonation) that may impact speech intelligibility and acceptability” [12, IEPMCS, p 1]. When assessing children suspected of having a speech sound disorder, SLPs are required to investigate children’s speech adequately. The assessment of children’s speech production requires accurate transcription of children’s speech output, typically using the International Phonetic Alphabet (IPA), as well as clinical judgements about whether children’s productions are correct or incorrect when compared to the adult target [13, 14]. Using the IPA to adequately capture children’s speech production is an essential component of any assessment of speech [15]. Transcription skills are routinely taught to SLPs around the world and frequently used in clinical practice [16]. Professional guidelines released by the International Expert Panel on Multilingual Children’s Speech [12, 13] recommend that children’s speech should be assessed in all of the languages the child speaks. For SLPs who do not share all of a child’s languages, the challenge begins with transcribing the child’s speech, and judging whether productions are correct or incorrect, in a language the SLP speaks only as a second language or does not speak at all.
One study has investigated the support student SLPs require to transcribe a language they do not speak [17]. In this study, 33 English-speaking student SLPs who had been trained in phonetic transcription and phonology and who did not speak Cantonese were asked to transcribe single words produced by Cantonese-speaking adults and children. They transcribed Cantonese adults’ consonants with 59.1% accuracy (72.9% when Cantonese-English transfer patterns were considered). Next, the participants transcribed children’s consonants (comprising correct and error-based productions) in four conditions: (1) transcription from an audio-visual recording when provided with the target transcription, (2) transcription with the addition of an audio recording of an adult saying the target word, (3) transcription with the addition of information about the phonological system of Cantonese and (4) transcription with the addition of both an audio recording of an adult and information about Cantonese phonology. Transcription accuracy generally improved as more support was provided: participants transcribed a mean of 58.3% of consonants correctly in condition 1, 66.1% in condition 2, 66.1% in condition 3 and 71.0% in condition 4. This study also highlighted that English-speaking student SLPs were more accurate at identifying correct/incorrect productions than at transcribing the exact consonant errors produced by Cantonese-speaking children. The student SLPs correctly identified correct/incorrect productions of consonants with mean accuracy of 63.8% (condition 1), 72.6% (condition 2), 69.2% (condition 3) and 73.0% (condition 4). While transcribing an unfamiliar language is challenging, transcription and judgements of correct/incorrect productions may be improved with an audio model of the adult target and information about the phonology of the target language [17]. Further exploration of SLPs’ skills in transcribing other languages is warranted.
Vietnamese migration has occurred around the world for many years. Consequently, Vietnamese is one of the most commonly spoken languages other than English in many English-speaking countries, including Australia [18], the US [2] and Canada [3]. As with Cantonese, the language used in the Lockart and McLeod [17] study, Vietnamese differs from English in a number of ways: (1) timing, as English is a stress-timed language and Vietnamese is a syllable-timed language; (2) consonant inventories, as Vietnamese and English share 16 consonantal articulations, but each includes phonemes that are not present in the other (e.g., Vietnamese includes /x/ and /ɣ/, whereas English includes /ɡ/, /ʃ/ and /ʒ/); (3) tone, as Vietnamese is a tonal language in which tones act as distinctive features to mark phonological contrast, whereas English is not a tonal language; (4) typical word length, as Vietnamese is primarily monosyllabic, whereas almost 30% of English words are polysyllabic (including >3 syllables); and (5) phonotactic complexity, as Vietnamese does not allow consonant clusters, whereas English allows many 2-element and 3-element consonant clusters. The distance between two languages can be calculated in a number of ways, depending on the linguistic features used; for example, phonological distance [cf. 19] or complexity of adult language learning [20]. By way of example, English and German have a relatively small linguistic distance between them, as both originate from a Germanic language heritage. In contrast, the linguistic distance between English and Vietnamese is much larger; across each of these metrics, Vietnamese and English are largely dissimilar.
One possible influence on an SLP’s ability to transcribe a language they do not speak is their knowledge of how speech is perceived in the languages they do speak (e.g., anticipating the effect of coarticulation between adjacent sounds). Investigating SLPs’ skill in transcribing a language that has a large linguistic distance from their own will inform understanding of the possible impact of language knowledge on SLPs’ transcriptions.
Research Questions
1. How accurately can English-speaking SLPs:
(a) transcribe the consonants produced by Vietnamese-speaking adults?
(b) transcribe the consonants produced by Vietnamese-speaking children with typical and atypical speech sound development?
(c) identify when Vietnamese-speaking children’s productions are correct/incorrect when compared to a Vietnamese adult target?
2. What are common transcription errors that occur when English-speaking SLPs transcribe the speech of Vietnamese-speaking children and adults and do these errors follow a predictable pattern?
3. Is the accuracy of transcription influenced by participants’ accuracy of transcription in English and other participant characteristics (e.g., self-reported musicality or second language use)?
Method
Participants
The participants were 20 SLPs recruited from Sydney and Brisbane, Australia. All of the participants were female, completed their speech-language pathology training at an Australian university and indicated that they did not have any hearing difficulties. Participants worked across a variety of work settings including community health, hospitals, education, private practice, disability services, non-government organisations and out-of-home care. No participants worked in a university. Most participants (n = 16, 80.0%) worked full-time and all worked with children. Two participants (10.0%) indicated that they worked with both children and adults. Participants indicated different areas of expertise across the range of diagnoses that may be seen by an SLP. They had expertise in working with children with speech sound disorder (n = 15, 75.0%), childhood apraxia of speech (n = 4, 20.0%) and developmental language disorder (n = 15, 75.0%). No participants indicated that they had expertise working with children who had dysarthria.
All of the participants spoke English; most were monolingual (n = 15) and the remainder spoke one (n = 3) or two (n = 2) additional languages that included Portuguese, Spanish, Italian, Persian, Hindi, French and Tetum. Most participants indicated that they had a considerable proportion of their caseload from culturally and linguistically diverse backgrounds: 75–100% (n = 2, 10.0%), 50–75% (n = 6, 30.0%), 25–50% (n = 7, 35.0%), 10–25% (n = 4, 20.0%) and <10% (n = 1, 5.0%). Additional languages spoken by children on participants’ caseloads included Aboriginal English, Arabic, Assyrian, Australian Indigenous languages (not specified), Bengali, Cantonese, Gujarati, Hindi, Konkani, Mandarin, Marathi, Russian, Serbian, Spanish, Urdu and Vietnamese. Thirteen (65.0%) participants indicated that they did not think their university training prepared them for working with families from culturally and linguistically diverse backgrounds, 6 (30.0%) were unsure and only one (5.0%) participant indicated that their university training prepared them sufficiently.
Procedure
The study protocol was approved by the Charles Sturt University Human Research Ethics Committee (approval number 2011/172), and all participants provided informed consent for participation in this study.
Demographic Questionnaire
All participants completed a demographic questionnaire before completing any experimental tasks and a feedback questionnaire after completing the experimental tasks. The demographic questionnaire allowed participants to document their years of clinical experience, transcription training and skill, languages used, musicality, caseload and opinions on the value of SLPs being fluent in more than one language and using languages in which they are not fluent for assessment and intervention. The demographic questionnaire contained 30 items with a combination of open and closed questions and 10-point Likert scales and was similar to the questionnaire used by Lockart and McLeod [17] to allow comparison between the earlier study and the current investigation.
Following the completion of the demographic questionnaire, all participants completed three experimental tasks designed to explore their accuracy of consonant transcription in English (Task 1) and Vietnamese (Tasks 2 and 3). All participants completed Task 1, Task 2 and Task 3 in order. Participants were randomly allocated to one of two stimulus presentation orders for Task 3: (1) Task 3a followed by Task 3b or (2) Task 3b followed by Task 3a. All tasks were completed in a quiet location, and participants were able to listen to the stimuli for each task at their own pace. For all tasks, the participants were given a consonant chart of IPA symbols used for Australian English and Standard Vietnamese (adapted from [21]). The third author scored all participant transcriptions against the transcriptions made by an Australian English-speaking expert (Task 1) and a Vietnamese-speaking expert (Tasks 2 and 3).
Task 1. English Transcription (Typical Adult Production)
Task 1 provided a baseline of the participants’ English transcription skills. Participants were asked to transcribe the consonants within 22 English words produced by one adult female speaker of Australian English. Stimuli in Task 1 sampled all Australian English consonants across word positions and were presented as audio within a PowerPoint presentation. The slide for each stimulus included the item number and English orthography. A total of 44 consonants were sampled. No English transcription targets were provided.
Task 2. Vietnamese Transcription (Typical Adult Production)
Task 2 was an experimental task designed to determine the participants’ skill transcribing typical adult Vietnamese speech. Participants were asked to transcribe the consonants and semivowels (i.e., /w/ and /j/) of 48 Vietnamese words from the Vietnamese Speech Assessment (VSA) [22] that were produced by one of two adult Vietnamese speakers (1 male, 1 female) who each used a Northern Vietnamese dialect. Intra- and inter-judge reliability for transcription of the VSA has been established above 95% [23]. The 48 Vietnamese words used in Task 2 were chosen so that each Vietnamese syllable-initial and syllable-final consonant, medial and syllable-final semivowel, vowel, diphthong and tone as described in Phạm and McLeod [21] was represented at least twice. One word, pate /pɑte/, was included in Task 2 but was removed from analysis because it was a loan-word from English and was the only bi-syllabic word. Thus, the total number of target words analysed for each participant was 47.
Participants were provided with a Vietnamese IPA chart for consonants and were not required to transcribe vowels or tones. Stimuli in Task 2 were presented as a video file of the speaker’s face within a PowerPoint presentation. Included on the PowerPoint slide for each stimulus were the item number, Vietnamese orthography and English translation. No Vietnamese transcription targets were provided.
Task 3. Vietnamese Transcription (Typical and Atypical Children’s Productions)
Task 3 was an experimental task designed to explore the accuracy of participants’ transcription of typical and atypical Vietnamese children’s speech and had two similar versions profiling different children (Task 3a and 3b). In each version of Task 3, participants were required to give two responses to each audio-visual stimulus: (1) identify whether the child’s production was correct or incorrect compared to the adult target and (2) transcribe the child’s production using the IPA. Participants were provided with a video file of an adult speaker producing the target word and a video file of the child producing the same target word. Both video files were presented on the same PowerPoint slide alongside the item number, Vietnamese orthography and English translation. Vietnamese adult targets were provided using IPA transcription. One slide was used for each stimulus. Participants were able to watch either video file as many times as required to complete the task.
The productions of five Vietnamese children (aged [years;months] 3;6–5;11) from Northern Vietnam were used as stimuli in Task 3. Participants were asked to transcribe the consonants and semivowels of Vietnamese children’s productions of 48 words from the VSA [22] on two occasions. Task 3a presented 48 words of the VSA produced by two Vietnamese children (aged 3;9 and 5;11); 15 productions in Task 3a did not match the adult target. Task 3b presented 48 words of the VSA produced by three Vietnamese children (aged 3;6, 3;7 and 5;6); 16 productions in Task 3b did not match the adult target. As in Task 2, participants were provided with a Vietnamese IPA chart for consonants and were not required to transcribe vowels or tones. One word (pate /pɑte/) was removed from analysis in each version. Thus, the total number of target words analysed for each participant was 94.
Feedback Questionnaire
The feedback questionnaire allowed participants to document the difficulty of each task, identify which components made the tasks harder or easier and provide comments about their experiences with the transcription and identification of speech sound errors in a language other than English. The feedback questionnaire contained 5 open-ended questions and 5 questions that could be answered using a 10-point Likert scale.
Reliability
Inter-judge and intra-judge reliability of participants’ transcription accuracy were calculated for Tasks 1, 2 and 3. Initial coding of participants’ correct/incorrect consonant transcriptions for Tasks 2 and 3 was completed by the third author (e.g., /t/ correctly transcribed as /t/ was coded as 1; /t/ incorrectly transcribed as /p/ was coded as 0). To determine inter-judge reliability of transcription accuracy coding, the first author re-coded two participants’ transcriptions; to determine intra-judge reliability, the third author also re-coded two participants’ transcriptions. Inter-judge reliability was calculated on 594 data points and was 97.5%; intra-judge reliability was 95.1%.
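The reliability figures above are percentages of agreement between two codings of the same data points. A minimal Python sketch of that calculation, assuming simple point-by-point agreement on the binary 1/0 accuracy codes (the function name `percent_agreement` is hypothetical, and the original analysis may have computed agreement differently):

```python
def percent_agreement(coding_a: list[int], coding_b: list[int]) -> float:
    """Return the percentage of data points coded identically by two codings.

    Each list holds binary accuracy codes (1 = consonant transcribed
    correctly, 0 = transcribed incorrectly) for the same data points.
    """
    if len(coding_a) != len(coding_b) or not coding_a:
        raise ValueError("codings must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(coding_a, coding_b))
    return 100 * matches / len(coding_a)


# Hypothetical example: the two codings agree on 3 of 4 data points.
print(percent_agreement([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```

Agreement of this kind does not correct for chance; a chance-corrected index such as Cohen's kappa would be a different calculation.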
Analysis
Target transcriptions for all experimental tasks were established by a native speaker of each dialect and language who was trained in phonetic transcription. Participants’ transcriptions for each item across each of the three tasks were compared to the target transcriptions and coded for accuracy of each consonant transcribed and whole word consonant accuracy (i.e., all consonants in the word transcribed correctly). Participant accuracy of each syllable-initial consonant, syllable-final consonant and medial and syllable-final semivowel was calculated by comparing participants’ transcriptions to the target transcriptions. All participants’ responses, ratings of the accuracy of transcription and whole word consonant accuracy were entered into IBM SPSS Statistics 24 [24].
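The two accuracy measures described above can be sketched as follows, assuming each word is stored as a list of (target, transcribed) consonant pairs and that `None` marks a consonant the participant did not transcribe. The data structure, function names and example words are hypothetical illustrations, not the study's coding scheme or data:

```python
from typing import Optional

# Hypothetical representation: one word = list of (target, transcribed) pairs;
# None means the participant wrote no consonant for that target.
Word = list[tuple[str, Optional[str]]]


def consonant_accuracy(words: list[Word]) -> float:
    """Percentage of individual consonants transcribed as the target."""
    pairs = [pair for word in words for pair in word]
    correct = sum(target == transcribed for target, transcribed in pairs)
    return 100 * correct / len(pairs)


def whole_word_accuracy(words: list[Word]) -> float:
    """Percentage of words in which every consonant matches the target."""
    correct_words = sum(all(t == r for t, r in word) for word in words)
    return 100 * correct_words / len(words)


# Made-up example: three words, five consonants, one place error (/c/ -> [k]).
example = [
    [("b", "b"), ("t", "t")],
    [("c", "k"), ("p", "p")],
    [("m", "m")],
]
print(consonant_accuracy(example))   # 80.0
print(whole_word_accuracy(example))  # 66.66666666666667
```

A single consonant error lowers consonant accuracy only slightly but makes the whole word incorrect, which is why the two measures are reported separately.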
Descriptive statistics were extracted to describe the accuracy of participants’ transcriptions when compared to the target. The relationship between participants’ Vietnamese transcription accuracy (Tasks 2 and 3), percentage of correct speech accuracy judgements (Task 3) and characteristics (based on responses to the demographic questionnaire) were investigated individually using Pearson correlation.
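Each characteristic was correlated individually with each outcome, so every analysis reduces to a bivariate Pearson r. The sketch below computes r from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations); it is an illustration, not the SPSS procedure used in the study, and the function name is hypothetical:

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
```

In Python 3.10 and later, `statistics.correlation(x, y)` returns the same coefficient.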
Results
Task 1. English Transcription (Typical Adult Production)
The participants’ accuracy in transcribing 44 Australian English consonants in 22 words was calculated. Of the 44 consonants sampled, participants transcribed an average of 97.2% correctly (SD 1.48, range 88.6–100%). Errors typically occurred on /ɹ/ (replaced with /r/) and with voicing (e.g., /θ/ was transcribed as /ð/). Participants correctly transcribed most consonantal articulations shared in English and Vietnamese.
Task 2. Vietnamese Transcription (Typical Adult Production)
A total of 47 transcribed words were analysed for each participant (940 words in total), including 85 consonants and semivowels per participant (1,700 consonants and semivowels in total). Initial, medial and final consonants and semivowels of Vietnamese are indicated in Appendix A. Participants correctly transcribed a mean of 52.9% (SD 6.75, range 40.0–64.7%) of the 85 consonants and semivowels spoken by Vietnamese adults in Task 2 (Table 1). Accuracy was higher on consonantal articulations shared by English and Vietnamese (e.g., /p, b, k, m, f, v, z/). Transcription errors for each consonant and semivowel were also calculated for each syllable position (Appendix B). Mean accuracy was 64.6% (SD 9.34, range 48.9–78.7%) for the 47 syllable-initial consonants, 67.5% (SD 46.7, range 0–100%) for the 2 medial semivowels and 36.9% (SD 7.71, range 25.0–50.0%) for the 36 syllable-final consonants.
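Because every participant transcribed the same 85 items, weighting each syllable-position mean by the number of items in that position recovers the overall Task 2 mean. A short Python sketch using the values reported above (the helper name `pooled_accuracy` is hypothetical):

```python
def pooled_accuracy(position_stats: list[tuple[int, float]]) -> float:
    """Item-weighted mean of per-position accuracies.

    position_stats holds (number of items, mean % correct) for each position.
    """
    total_items = sum(n for n, _ in position_stats)
    return sum(n * accuracy for n, accuracy in position_stats) / total_items


# Task 2 values reported above: syllable-initial, medial, syllable-final.
task2_positions = [(47, 64.6), (2, 67.5), (36, 36.9)]
print(round(pooled_accuracy(task2_positions), 1))  # 52.9
```

The low syllable-final accuracy (36.9% across 36 items) pulls the overall mean well below the syllable-initial mean.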
Common Vietnamese-English Transcription Errors Demonstrated by Participants
Common Vietnamese-English transcription errors were defined as transcription errors and patterns shared frequently across participants (Appendix A). Three common transcription error patterns were described: (1) plosive voicing errors, where the transcription matched the target consonant’s place of articulation but not its voicing (e.g., /p/ transcribed as [b]); (2) place of articulation errors, where the transcription matched the target consonant’s manner and voicing but differed in place by one position (e.g., /c/ transcribed as [k]), although this pattern did not emerge in all cases; and (3) syllable-final omission errors, where a syllable-final consonant produced by the Vietnamese speaker was not transcribed (e.g., /t/ transcribed as no consonant, Ø). Non-transcription of syllable-final plosives was particularly common. The frequency of these common Vietnamese-English transcription errors is presented in Appendix B.
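The three patterns can be expressed as simple rules over phonetic features. The Python sketch below is a hypothetical illustration only: the place ordering, the small feature table (covering just a few consonants) and the reading of "differed by one position" as adjacent entries in that ordering are assumptions, not the study's coding procedure, which identified patterns from errors frequently shared across participants:

```python
from typing import Optional

# Assumed front-to-back ordering of places of articulation.
PLACES = ["bilabial", "labiodental", "dental", "alveolar", "postalveolar",
          "retroflex", "palatal", "velar", "glottal"]

# Illustrative (place, manner, voiced) features for a few consonants only.
FEATURES = {
    "p": ("bilabial", "plosive", False),
    "b": ("bilabial", "plosive", True),
    "t": ("alveolar", "plosive", False),
    "d": ("alveolar", "plosive", True),
    "c": ("palatal", "plosive", False),
    "k": ("velar", "plosive", False),
    "ɡ": ("velar", "plosive", True),
    "m": ("bilabial", "nasal", True),
    "ɲ": ("palatal", "nasal", True),
    "ŋ": ("velar", "nasal", True),
}


def classify_error(target: str, transcribed: Optional[str], syllable_final: bool) -> str:
    """Assign a (target, transcription) pair to one of the three patterns."""
    if transcribed == target:
        return "correct"
    if transcribed is None:
        # Pattern 3: a produced syllable-final consonant left untranscribed.
        return "final omission" if syllable_final else "other"
    if target not in FEATURES or transcribed not in FEATURES:
        return "other"
    t_place, t_manner, t_voiced = FEATURES[target]
    r_place, r_manner, r_voiced = FEATURES[transcribed]
    # Pattern 1: plosive at the target place with the wrong voicing.
    if t_manner == r_manner == "plosive" and t_place == r_place and t_voiced != r_voiced:
        return "plosive voicing"
    # Pattern 2: same manner and voicing, place shifted by one position.
    if (t_manner == r_manner and t_voiced == r_voiced
            and abs(PLACES.index(t_place) - PLACES.index(r_place)) == 1):
        return "adjacent place"
    return "other"


print(classify_error("p", "b", syllable_final=False))  # plosive voicing
print(classify_error("c", "k", syllable_final=False))  # adjacent place
print(classify_error("t", None, syllable_final=True))  # final omission
```

A fixed rule set like this would also flag pairs that participants rarely produced, so it approximates, rather than reproduces, a frequency-based definition of "common" errors.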
Of the transcription errors, 77.5% were accounted for by common transcription error patterns demonstrated across participants: 57.4% of syllable-initial consonant errors, 100% of medial semivowel errors and 91.6% of syllable-final consonant errors (Table 2). Therefore, if these common errors were counted as acceptable transcriptions (in combination with consonants transcribed correctly), SLPs’ transcription accuracy increased to 89.4%: 84.9% of syllable-initial consonants, 100% of medial semivowels and 94.7% of syllable-final consonants.
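The "acceptable" figures follow from adding the share of errors accounted for by common patterns to the percentage transcribed correctly: acceptable = correct + (100 − correct) × common-error share. A minimal Python sketch (the function name `acceptable_accuracy` is hypothetical) reproduces the Task 2 figures reported above:

```python
def acceptable_accuracy(percent_correct: float, percent_errors_common: float) -> float:
    """Correct transcriptions plus errors explained by common patterns, in %."""
    percent_errors = 100 - percent_correct
    return percent_correct + percent_errors * percent_errors_common / 100


# Task 2: overall, syllable-initial and syllable-final values from the text.
for correct, common in [(52.9, 77.5), (64.6, 57.4), (36.9, 91.6)]:
    print(round(acceptable_accuracy(correct, common), 1))
# prints 89.4, 84.9 and 94.7 on separate lines
```

The syllable-final gain is largest because most errors in that position were omissions that fit a common pattern.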
Task 3. Vietnamese Transcription (Typical and Atypical Children’s Productions)
All participants commenced Task 3. All participants completed Task 3a (47 words, 85 consonants), and 19 completed Task 3b (47 words, 85 consonants); one participant was unable to complete Task 3b due to work/time commitments. A total of 1,880 words were included in the analysis. Participants’ judgements of children’s productions were more accurate when children produced words correctly (n = 64; M = 75.9%; range 37.5–90.6%) than when they produced words incorrectly (n = 30; M = 66.5%; range 10.0–86.7%). Overall judgement accuracy across both correct and incorrect productions was 73.8% (range 28.7–87.2%).
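Judgement accuracy was reported separately for productions that did and did not match the adult target. A minimal Python sketch of that split for one participant, with a hypothetical function name and made-up responses:

```python
import math


def judgement_accuracy(child_correct: list[bool],
                       judged_correct: list[bool]) -> tuple[float, float, float]:
    """Return % accurate judgements for (correct productions, incorrect productions, all)."""
    pairs = list(zip(child_correct, judged_correct))

    def percent_agree(subset: list[tuple[bool, bool]]) -> float:
        if not subset:
            return math.nan  # no items of this type
        return 100 * sum(actual == judged for actual, judged in subset) / len(subset)

    on_correct = [p for p in pairs if p[0]]
    on_incorrect = [p for p in pairs if not p[0]]
    return percent_agree(on_correct), percent_agree(on_incorrect), percent_agree(pairs)


# Made-up responses for five items from one participant.
print(judgement_accuracy([True, True, True, False, False],
                         [True, True, False, False, True]))
# (66.66666666666667, 50.0, 60.0)
```

Reporting the two subsets separately shows whether a participant tends to accept incorrect productions as correct, which a single overall percentage would hide.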
Transcription accuracy by syllable position for Task 3 is reported in Table 1. Participants correctly transcribed a mean of 69.2% (range 48.2–83.5%) of consonants spoken by Vietnamese children (Task 3). As with the adult transcriptions in Task 2, participants were more accurate transcribing consonantal articulations shared by English and Vietnamese (Appendix C). Mean transcription accuracy was 63.3% for syllable-initial consonants, 91.6% for medial semivowels and 75.8% for syllable-final consonants. Of the transcription errors, 52.2% were accounted for by the Vietnamese-English common transcription error patterns identified in the adult transcriptions (Task 2; Table 2); the remaining 47.8% were not. Therefore, if these common transcription errors were counted as acceptable transcriptions (in combination with consonants transcribed correctly), SLPs’ transcription accuracy increased to 83.8%: 77.9% of syllable-initial consonants, 97.5% of medial semivowels and 90.8% of syllable-final consonants. Error patterns were more consistent with those identified in Task 2 when the child’s production matched the adult target (but the consonant was transcribed incorrectly) than when it did not match the adult target; 65.4% of errors were accounted for by common errors when the child matched the adult target; 37.9% were due to other errors, not routinely identified in transcriptions of the adult productions (see Table 2 for details of transcription errors across each syllable position).
Transcription Skill
Participants’ Self-Reported Transcription Skill
Twelve (60.0%) participants indicated that they had received training in broad transcription of English, six (30.0%) had received training in both broad and narrow transcription of English and two (10.0%) had not completed any transcription training in English. Only one (5.0%) participant indicated that they had received transcription training in a language other than English. Most participants (n = 15, 75.0%) used broad transcription frequently: five (25.0%) used it every day, nine (45.0%) a few times a week and one (5.0%) once a week. The remaining five (25.0%) used broad transcription less frequently: three (15.0%) once a month and two (10.0%) rarely. Narrow transcription was rarely used: most participants indicated that they rarely (n = 8, 40.0%) or never (n = 8, 40.0%) used it, two (10.0%) used it once a month and only two (10.0%) used it a few times a week.
Association between Participant Characteristics and Transcription Accuracy
Pearson correlations were conducted to investigate the relationships between Vietnamese transcription accuracy (Tasks 2 and 3), accuracy of correct/incorrect Vietnamese speech judgements (Task 3) and five participant characteristics (language use [mono- or multilingual], years of experience and self-reported measures of musicality, IPA transcription skill and confidence working with culturally and linguistically diverse families).
Fifteen (75.0%) participants indicated that they were monolingual English speakers, three (15.0%) indicated that they spoke English and another language and two (10.0%) indicated that they spoke English and two additional languages. Participants had a mean of 7.9 years of work experience (SD 7.35, range 1–25 years). Self-reported measures were gathered on a 10-point scale (where 1 = not musical/skilled/confident and 10 = highly musical/skilled/confident). Participants rated themselves with a mean musicality of 4.65 (SD 2.91), transcription skill of 7.10 (SD 1.21) and confidence working with culturally and linguistically diverse families of 6.05 (SD 1.70).
The only significant correlation identified was that participants who rated themselves as more musical had higher Task 2 consonant transcription accuracy (r = 0.46, p = 0.04). This correlation indicates a medium effect based on guidelines for interpreting correlation effect sizes, in which r = 0.10–0.29 is small, r = 0.30–0.49 is medium and r = 0.50–1.0 is large [25]. No other correlations were statistically significant; however, several showed small to medium effect sizes. For example, participants who spoke a language or languages other than English scored higher on all three outcome variables, with medium effect sizes: Task 2 transcription (r = 0.35, p = 0.13), Task 3 transcription (r = 0.36, p = 0.13) and Task 3 whole word correct/incorrect judgements (r = 0.35, p = 0.13). Furthermore, participants with higher self-rated IPA transcription skill scored higher in their Task 2 consonant transcription accuracy, with a medium effect size (r = 0.32, p = 0.17). Interestingly, SLPs with more years of work experience scored lower on the whole word correct/incorrect judgement in Task 3, with a medium effect size (r = –0.40, p = 0.08).
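The effect-size bands and the significance of each r can be checked by hand. The Python sketch below labels |r| using the bands above (the "negligible" label below 0.10 is an added assumption) and converts r to a t statistic, t = r√(n − 2)/√(1 − r²), comparing it with the two-tailed critical value t(18) = 2.101 at α = .05; it assumes all 20 participants contributed to each correlation. Function names are hypothetical; `scipy.stats.pearsonr` would return an exact p-value instead:

```python
import math

# Two-tailed critical t at alpha = .05 with df = 18 (n = 20), from standard t tables.
T_CRIT_DF18 = 2.101


def effect_size_label(r: float) -> str:
    """Label |r| as small (0.10-0.29), medium (0.30-0.49) or large (0.50-1.0)."""
    magnitude = abs(r)
    if magnitude < 0.10:
        return "negligible"  # assumption: band not defined in the text
    if magnitude < 0.30:
        return "small"
    if magnitude < 0.50:
        return "medium"
    return "large"


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def significant_n20(r: float) -> bool:
    """Two-tailed test at alpha = .05, assuming n = 20 participants."""
    return abs(t_statistic(r, 20)) > T_CRIT_DF18


for r in (0.46, 0.35, 0.32, -0.40):
    print(r, effect_size_label(r), round(t_statistic(r, 20), 2), significant_n20(r))
```

With n = 20, r must reach roughly 0.44 in magnitude to be significant at α = .05, which is why medium correlations such as r = –0.40 did not reach significance.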
Discussion
The results of this research indicate that the SLPs who participated in this study were highly accurate on the English phonemic transcription task but less accurate transcribing a language that they do not speak, in this case, Vietnamese. Common transcription error patterns accounted for many of the differences between transcriptions made by the English-speaking SLPs and the target Vietnamese transcriptions. SLPs were more accurate identifying whether children’s productions were correct/incorrect compared to an adult target than transcribing the consonants. Only one factor, self-reported musicality, was significantly correlated with SLPs’ accuracy transcribing Vietnamese-speaking adults; however, the small sample size made it difficult to examine associated skills and experiences (e.g., musicality and years of experience). The results of this research are discussed according to each of the three research questions.
Accuracy of English-Speaking SLPs’ Transcription of Vietnamese
The transcription of the speech sounds of an unfamiliar language is challenging, even for SLPs who are trained in phonetic transcription using the IPA. SLPs’ mean transcription accuracy was 97.2% for English (Task 1). In Task 2, these non-Vietnamese-speaking SLPs correctly transcribed 52.9% of consonants spoken by Vietnamese adults. A large proportion (77.5%) of incorrectly transcribed consonants were accounted for by transcription error patterns common across participants. “Acceptable” transcriptions were calculated from correct transcriptions plus Vietnamese-English common transcription errors. When these common errors were considered “acceptable” for the purposes of interpreting speech, SLPs transcribed 89.4% of consonants adequately. These results are comparable to the findings of Lockart and McLeod [17], whose non-Cantonese-speaking student SLPs correctly transcribed 59.1% of Cantonese consonants spoken by an adult (72.9% when Cantonese-English transfer patterns were allowed). In Task 3, SLPs correctly transcribed a mean of 69.2% of consonants (83.8% when Vietnamese-English common transcription errors were considered), and 52.2% of incorrectly transcribed consonants were accounted for by common transcription error patterns. Three findings of this research advance our knowledge about SLPs’ skills in this area: (1) SLPs transcribe shared Vietnamese-English consonantal articulations more accurately than non-shared ones, (2) SLPs demonstrate moderate accuracy identifying children’s correct/incorrect productions and (3) SLPs’ transcription errors follow common Vietnamese-English error patterns (Appendix A).
SLPs who do not speak Vietnamese transcribed shared Vietnamese-English consonantal articulations more accurately than non-shared ones. Consistent with the findings of Lockart and McLeod [17] for Cantonese, participants in this study were more accurate when transcribing the shared consonantal articulations /b/ (97.5% accurate), /d/ (80.0%), /m/ (97.5%), /f/ (100%), /v/ (95.0%), /z/ (82.5%), /h/ (100%) and /l/ (100%) than the non-shared consonantal articulations /ts/ (32.5%), /ɣ/ (32.5%), /x/ (50.0%), /ɲ/ (52.5%) and /ʈ/ (20.0%; Appendix B).
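To summarise this contrast in a single figure per group, the per-consonant accuracies listed above can be averaged. The sketch below uses only the values reported in this paragraph and computes an unweighted mean across consonants (each consonant counts equally, regardless of how many target words contained it), which is a simplification rather than the study’s own analysis.

```python
from statistics import mean

# Per-consonant transcription accuracy (%) as reported in the text (Appendix B).
SHARED = {"b": 97.5, "d": 80.0, "m": 97.5, "f": 100.0,
          "v": 95.0, "z": 82.5, "h": 100.0, "l": 100.0}
NON_SHARED = {"ts": 32.5, "ɣ": 32.5, "x": 50.0, "ɲ": 52.5, "ʈ": 20.0}

# Unweighted means: every consonant contributes equally to its group mean.
shared_mean = mean(SHARED.values())
non_shared_mean = mean(NON_SHARED.values())
print(f"shared: {shared_mean:.1f}%  non-shared: {non_shared_mean:.1f}%")
```

On these figures, the shared consonants average roughly 94% and the listed non-shared consonants 37.5%.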
The non-Vietnamese-speaking SLP participants demonstrated moderate accuracy in identifying children’s correct/incorrect productions when provided with an adult model of the target word. In Task 3, participants correctly identified whether children’s productions were correct or incorrect an average of 73.8% of the time and were more accurate identifying correct productions than incorrect productions. Participants correctly transcribed 69.2% of Vietnamese consonants spoken by children, and 52.2% of their transcription errors were consistent with the error patterns identified during the transcription of adult speech in Task 2. When common transcription errors were considered, 83.8% of consonants in children’s speech were transcribed acceptably. Similarly, Lockart and McLeod [17] found that participants were better able to identify the accuracy of a production (with the support of an adult target and IPA target) than to transcribe the phonemes correctly. This finding suggests that English-speaking SLPs may be able to use adult productions of assessment words (e.g., produced by an interpreter, a family member or an audio-visual recording) as a model of the target production and identify whether a child’s production matches it. Transcription of children’s productions may then inform further understanding of children’s phonology. SLPs’ interpretation of their own transcriptions may be further informed by another outcome of this research: the identification of common transcription errors.
Vietnamese-English Common Transcription Errors
English-speaking SLPs demonstrated reasonably consistent patterns of errors when transcribing Vietnamese speech, suggesting that SLPs may be able to anticipate which Vietnamese consonants are challenging to transcribe and the likely errors in their own transcription (Appendix A). Three common transcription errors were identified in English-speaking SLPs’ transcriptions of Vietnamese. The first, voicing errors when transcribing plosives, was particularly prevalent. The SLPs in this study frequently transcribed the correct place and manner of articulation of a plosive but transcribed the voiced counterpart of the voiceless target consonant. For example, the most common transcription errors were [b] for /p/ and [d] for /t/. The work of Munson et al. [26] on voice onset time across languages may partly explain the difficulty non-Vietnamese-speaking SLPs have in perceiving Vietnamese plosives, because the voice onset time boundaries that separate plosive categories differ between languages.
The second common transcription error occurred when SLPs correctly transcribed the manner and voicing of the target consonant but the place of articulation differed by one articulatory position. For example, the most common error for /s/ was the transcription of [ʂ], and the most common error for /ʂ/ was the transcription of [s]. These errors may be due to unfamiliarity with non-English consonant symbols or to coarticulation with the vowels in the words, which may influence perception of the consonant’s place of articulation. For example, one of the two /s/ target words was /sɯɤŋ/ (bone), in which the diphthong moves from a close to a close-mid back unrounded vowel, a context that may bias listeners toward the more retracted [ʂ]. Conversely, one of the target words for /ʂ/ was /ʂɛn/ (lotus flower), in which the vowel and syllable-final consonant are in front articulatory positions, which may bias listeners toward [s]. Thus, the phonetic context of consonants in speech samples may influence the perception of each consonant. Anticipation of coarticulatory effects is known to influence speech perception in a listener’s known language/s, but the same compensation for coarticulation may not occur when perceiving an unfamiliar language [27]. Indeed, listeners may perceive the segments of unfamiliar languages by referring to their own language-specific knowledge [28]. Thus, when transcribing unfamiliar languages, listeners work against a bias from knowledge of their own language/s that may influence their ability to segment speech and anticipate coarticulatory effects.
The third common transcription error involved syllable-final consonants: most commonly, SLPs did not transcribe a syllable-final consonant that was, in fact, present in the target. This occurred for both adult and child speakers of the target words (even when child productions were correct and target transcriptions were provided). This error may reflect the high frequency of unreleased syllable-final consonants in the Northern Vietnamese dialect. Previous research has highlighted how a listener’s known language/s can influence the perception of word-final stops [29]. Thus, it is not surprising that SLPs find phonemic transcription of an unfamiliar language challenging. Additionally, the greater accuracy in transcribing syllable-initial consonants was likely due to their greater saliency compared with syllable-final consonants (i.e., syllable-initial consonants provide more contrasts and are less likely to be omitted) [30-32].
Over three quarters of the transcription errors made by the SLP participants when transcribing the speech of Vietnamese adults were accounted for by these three common transcription error patterns. However, nearly half of the transcription errors were not accounted for by these patterns when transcribing children’s speech. When transcription errors are considered alongside whether children’s productions were correct or incorrect (i.e., whether they matched the adult target), the lack of consistency when transcribing children’s speech becomes apparent. Whether children’s productions were correct clearly affected the predictability of SLPs’ transcription errors. When a child’s production matched the adult target and the transcription was incorrect, 65.4% of errors followed the patterns identified in the adult transcription task (Table 2). Conversely, when a child’s production did not match the adult target, only 37.9% of errors followed these patterns. Transcription of an unfamiliar language may improve when SLPs are able to evaluate their own performance against common transcription errors.
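The 65.4% versus 37.9% split is a conditional proportion: among incorrectly transcribed consonants, the share that follows a common error pattern, computed separately for child productions that did and did not match the adult target. A minimal Python sketch of that calculation, using invented records and a hypothetical function name (`pattern_rate_by_match`):

```python
from collections import defaultdict

def pattern_rate_by_match(errors):
    """Percentage of transcription errors that follow a common pattern,
    split by whether the child's production matched the adult target.

    errors: iterable of (child_matched_target, follows_common_pattern) pairs,
    one pair per incorrectly transcribed consonant.
    """
    totals = defaultdict(int)  # number of errors in each group
    hits = defaultdict(int)    # errors following a common pattern in each group
    for matched, follows in errors:
        totals[matched] += 1
        hits[matched] += int(follows)
    return {group: 100 * hits[group] / totals[group] for group in totals}

# Invented records for illustration only (not study data).
records = [(True, True), (True, True), (True, False),
           (False, True), (False, False), (False, False), (False, False)]
print(pattern_rate_by_match(records))
```

Keeping the two groups separate matters because pooling them would hide the drop in predictability when the child’s own production is in error.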
Other Factors That May Influence Transcription Accuracy
The final aim of the current research was to determine whether any SLP factors affected participants’ transcription accuracy. The only factor that appeared to influence the accuracy of SLPs’ transcription was their self-reported musicality. This result contrasts with the findings of Lockart and McLeod [17], who did not find a relationship between transcription accuracy and self-reported musicality, but supports previous research suggesting that musicality may assist with the perception of a new language during language learning [33, 34]. Given the small sample size, analyses of other factors may have lacked sufficient power to detect statistically significant correlations.
Clinical Implications
This research provides guidance for SLPs who do not speak Vietnamese when assessing Vietnamese speakers. English-speaking SLPs are likely to be more accurate transcribing shared than non-shared Vietnamese-English consonantal articulations. For example, the participant SLPs were consistently accurate (≥95%) when transcribing Vietnamese adult speakers’ productions of /b, m, f, v, h, l/ in the syllable-initial position and /m/ in the syllable-final position (Appendix B). English-speaking SLPs are also likely to have some skill in identifying whether Vietnamese children’s productions are correct or incorrect compared with an adult target, and are likely to demonstrate Vietnamese-English common transcription error patterns. Knowledge of these common transcription errors should caution English-speaking SLPs against making diagnostic decisions about voicing errors or final consonant deletion in Vietnamese children without consulting a Vietnamese speaker, preferably one with linguistic or SLP training. In contrast, English-speaking SLPs may be more confident in identifying whether Vietnamese speakers have difficulty with nasality or stopping of fricatives.
Limitations
The primary limitations of this study were the small sample size and the concentration of most participants in a similar geographical area. The lack of statistical significance may be linked to the small sample size (n = 20), which yielded low power to detect effects. A power analysis (2-tailed, α = 0.05, power (1 − β) = 0.80) indicated that minimum sample sizes of n = 26, n = 82 and n = 779 were needed to detect a large, medium and small effect, respectively. Although the results replicated, in many ways, a previous study with similar research questions [17], further replication is required to determine whether these findings generalise to a larger population of SLPs.
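The logic of the power analysis can be sketched with the Fisher z-transformation commonly used for correlation power calculations. This Python sketch assumes that the tests were two-tailed Pearson correlations and that large, medium and small effects correspond to Cohen’s conventional r = 0.5, 0.3 and 0.1; neither assumption is stated in the text. Software packages differ in whether they use this approximation or an exact method, so the output may differ by a few participants from the values reported above, but it shows the same pattern: a sample of n = 20 falls short even for a large effect.

```python
import math
from statistics import NormalDist

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate minimum n to detect a correlation r with a two-tailed test."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)          # quantile for power = 1 - beta
    c = math.atanh(r)                   # Fisher z-transform of r
    # The +3 reflects the sampling variance of Fisher z, 1 / (n - 3).
    return math.ceil(((z_alpha + z_power) / c) ** 2 + 3)

# Assumed Cohen benchmarks for large, medium and small correlations.
for label, r in [("large", 0.5), ("medium", 0.3), ("small", 0.1)]:
    print(label, r, n_for_correlation(r))
```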
Conclusion
The transcription of an unfamiliar language is complex, and accurate transcription is challenging, particularly without training. However, two factors may increase the accuracy of SLPs’ transcription of a language they do not speak: (1) judging whether a production sounds similar to or different from an adult production of the target word and (2) acknowledging the common transcription errors made by SLPs who do not speak the language. Accounting for these two factors may help SLPs interpret their perception and transcription of speech, even if they do not speak the language.
Acknowledgements
The authors acknowledge support from Dr. Cen (Audrey) Wang, a Charles Sturt University Faculty of Arts and Education research assistant grant and an Australian Research Council Discovery Grant (DP180102848).
Disclosure Statement
The authors have no conflicts of interest to declare.
Author Contributions
S.M.: contributed to this work by designing the data collection protocol and analysis, collecting data, completing data analysis and drafting the manuscript. S.Mc.: contributed to this work by research conception, design of data collection protocol and analysis, supporting data analysis, and drafting/editing the manuscript. A.C.: contributed to this work by designing the data collection protocol and analysis, collecting data, completing data entry and analysis and drafting/editing the manuscript. B.P.: contributed to this work by designing the data collection protocol and analysis, supporting analysis of Vietnamese transcription and drafting the manuscript.