The Advanced Bionics® (AB)-York crescent of sound is a new test setup that comprises speech intelligibility in noise and localization tests that represent everyday listening situations. One of its tests is the Sentence Test with Adaptive Randomized Roving levels (STARR) with sentences and noise both presented from straight ahead. For the Dutch population, we adopted the AB-York setup and replaced the English sentences with a validated set of Dutch sentences. The Dutch version of the STARR is called the Utrecht-STARR (U-STARR). This study primarily assesses the validity and reliability of the U-STARR compared to the Plomp test, which is the current Dutch gold standard for speech-in-noise testing. The outcome of both tests is a speech reception threshold in noise (SRTn). Secondary outcomes are the SRTn measured with sounds from spatially separated sources (SISSS) as well as sound localization capability. We tested 29 normal-hearing adults and 18 postlingually deafened adult patients with unilateral cochlear implants (CI). This study shows that the U-STARR is adequate and reliable and seems better suited for severely hearing-impaired persons than the conventional Plomp test. Further, CI patients have poor spatial listening skills, as demonstrated with the AB-York test.
Cochlear implantation is a successful way to restore auditory communication in severely hearing-impaired persons. Although cochlear implant (CI) patients generally hear well in a quiet setting, hearing with background noise, as is normal in daily practice, remains challenging [Crathorne et al., 2012; Gaylor et al., 2013; van Schoonhoven et al., 2013]. The evaluation of spatial hearing and hearing-in-noise capabilities becomes increasingly important in this era of improving sound processing strategies, implantation techniques, and a growing interest in bilateral implantation. Traditional speech tests comprise words or sentences presented at fixed levels, and cochlear implantees are often allowed to adjust their processor volumes. These tests are not representative of everyday listening situations, in which levels of speech and background noise change constantly.
In 1979, Plomp and Mimpen developed a Dutch hearing-in-noise test for people with difficulties understanding speech in background noise but with relatively good pure tone audiometry (PTA) thresholds. In this test, sentences are presented at a level at which a person can understand the words in silence, after which noise is added in an adaptive manner. A sentence is scored as correct when repeated 100% correctly. The outcome is a speech reception threshold in noise (SRTn), defined as the signal-to-noise ratio (SNR) at which the person is able to repeat 50% of the sentences correctly [Plomp and Mimpen, 1979].
Although this test is useful for people with relatively good hearing, it is too difficult for CI patients [Van Wieringen and Wouters, 2008]. Even in silence, it is difficult for CI patients to reproduce sentences 100% correctly, which would result in poor Plomp test results. These patients are, however, usually good at understanding speech by using the context of words.
Recently, the clinical research department of Advanced Bionics® developed the Sentence Test with Adaptive Randomized Roving levels (STARR). In this test, sentences are presented in noise, and the number of key words correctly repeated per sentence is scored instead of whole sentences correctly repeated [Joffo et al., 2010; Kitterick et al., 2011; Boyle et al., 2013]. This seems more suitable for CI patients than the original, difficult Plomp test. In the STARR, CI patients are allowed to make small mistakes while they can still show that they have understood the sentence.
In collaboration with Advanced Bionics, Prof. Q. Summerfield's research group in York developed a new test setup that enables the presentation of the STARR sentences in noise, at roving levels and from different directions [Kitterick et al., 2011].
We have adopted this Advanced Bionics (AB)-York crescent of sound test setup for the Dutch population and replaced the English STARR material with a validated set of Dutch sentences (the VU98 list of sentences [Versfeld et al., 2000], recorded by a female speaker). This new Dutch speech-in-noise test is called the Utrecht-STARR (U-STARR).
The main goals of this study were (1) to validate the U-STARR by measuring a group of normal-hearing persons, (2) to test the reliability of the U-STARR compared to the conventional Dutch Plomp test in normal-hearing persons and CI patients, and (3) to test our hypothesis that the U-STARR is better suited for CI patients than the Plomp test.
Secondary outcomes were speech intelligibility in noise with sounds coming from spatially separated sources (SISSS) as well as sound localization capabilities, both evaluated with this new setup.
Subjects and Methods
This cross-sectional study was conducted according to the principles expressed in the Declaration of Helsinki and was approved by the Human Ethics Committee of the University of Utrecht (NL2499001808).
Twenty-nine normal-hearing adults were recruited by means of advertisements posted at the otolaryngology outpatient clinic of the University Medical Center Utrecht, and 18 CI patients were selected through the hospital CI database. They all met the inclusion criteria outlined in table 1 and were enrolled in the study after they gave written informed consent. In order to obtain a homogeneous group of CI patients, we selected participants in whom the auditory cortex had developed in early life (i.e. postlingually deafened). Since it is often difficult to determine accurately at which age a severe hearing loss started, we used the criterion that all participants had attended mainstream education. Even if the patients used hearing aids in class, their auditory cortex would have developed well enough to consider them postlingually deafened. Furthermore, in the Netherlands, it is very unlikely that a deaf or severely hearing-impaired child would be placed in mainstream education. All participants knew exactly which type of education they had followed. Further details on the participants are presented in table 2.
The Dutch AB-York Crescent of Sound Test Setup
Speech intelligibility in noise and sound localization tests were conducted in a soundproof room with 9 audiovisual stands in the frontal hemifield. Seven of these stands were positioned at 30-degree intervals, and 2 additional stands were positioned at 15-degree intervals on either side of 0°. The audiovisual stands were positioned in a crescent shape with a radius of 1.45 m and extended to a height of 1.1 m (fig. 1) [Kitterick et al., 2011]. The original AB-York test setup contains English sentences. We replaced these sentences by the Dutch VU98 sentences, a set of 39 lists, each comprising 13 sentences. This large and validated set, recorded by a female speaker, is not being used for other hearing evaluation purposes in our department. The sentences were therefore new to all patients [Versfeld et al., 2000].
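As a rough illustration of the geometry described above, the following Python sketch computes loudspeaker positions on the crescent. It assumes that the seven 30-degree stands span -90° to +90° (the text gives only the spacing and the frontal hemifield), that the listener sits at the origin facing 0°, and that negative azimuths lie to the listener's left. All names are illustrative and not part of the AB-York software.

```python
import math

RADIUS_M = 1.45  # crescent radius, from the text

# ASSUMPTION: seven stands at 30-degree intervals cover -90..+90 degrees;
# the two additional stands sit at -15 and +15 degrees.
AZIMUTHS_DEG = sorted(list(range(-90, 91, 30)) + [-15, 15])


def stand_position(azimuth_deg, radius_m=RADIUS_M):
    """Return (x, y) in metres: x points to the listener's right, y straight ahead."""
    theta = math.radians(azimuth_deg)
    return radius_m * math.sin(theta), radius_m * math.cos(theta)


for az in AZIMUTHS_DEG:
    x, y = stand_position(az)
    print(f"{az:+4d} deg -> x = {x:+.2f} m, y = {y:.2f} m")
```

Under these assumptions, the loudspeakers used for the ±60° conditions sit about 1.26 m to either side of the midline and 0.73 m in front of the listener.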
Baseline Hearing Tests
The hearing of normal-hearing persons was tested with a standard PTA and a Dutch phoneme test (consonant-vowel-consonant or CVC test). In the CI group, the phoneme test was conducted in three listening conditions: monaurally, with either the CI or the hearing aid switched on, and bimodally, with both the CI and the hearing aid switched on.
Dutch AB-York Crescent of Sound
The test battery conducted with the Dutch AB-York crescent of sound consisted of a Plomp test, the U-STARR, an SISSS, and a sound localization test. In the Plomp test, sentences and noise were both presented from straight ahead. A sentence was scored as correctly repeated when all words were repeated correctly. The outcome was the SNR necessary to repeat 50% of the sentences correctly; this is the SRTn (in decibels) [Plomp and Mimpen, 1979]. In the U-STARR, sentences and noise were also presented from straight ahead, but the number of key words repeated correctly was scored instead of whole sentences.
Two researchers and a speech therapist independently selected the key words per sentence and debated on their differences to make a final selection. Five key words were selected in long sentences and 3 key words in shorter sentences. In the U-STARR, a sentence was scored as correctly repeated when a subject repeated at least 3 out of 5 or 2 out of 3 key words correctly. As in the Plomp test, the U-STARR result was the SRTn.
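The key word criterion can be written down as a small Python sketch. The thresholds (3 of 5, 2 of 3) come from the text; the exact, case-insensitive word matching is a simplifying assumption, since in the clinic the tester judges each repetition. The function names and the example sentence are hypothetical.

```python
def required_key_words(n_key_words):
    """Minimum number of key words that must be repeated correctly."""
    if n_key_words == 5:
        return 3
    if n_key_words == 3:
        return 2
    # The text defines the criterion only for 5 and 3 key words.
    raise ValueError("criterion is defined only for 3 or 5 key words")


def sentence_correct(key_words, response_words):
    """Score one sentence: True if enough key words appear in the response.

    ASSUMPTION: a key word counts as correct when the same word
    (ignoring case) occurs anywhere in the response.
    """
    repeated = {word.lower() for word in response_words}
    hits = sum(1 for word in key_words if word.lower() in repeated)
    return hits >= required_key_words(len(key_words))


# Hypothetical English example with 3 key words, 2 of them repeated.
print(sentence_correct(["dog", "ran", "garden"], "the dog ran home".split()))
```

The same response would be scored as incorrect under the Plomp criterion, because not every word of the sentence was repeated.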
In both the Plomp test and the U-STARR, sentences were presented at 65, 70, or 75 dB SPL (randomly selected) with an initial SNR of +20 dB (sentence 20 dB louder than noise). The noise started 500 ms before and continued 500 ms after the sentence. The SNR was varied with an adaptive procedure: if a sentence was scored as correct, the SNR for the next sentence was decreased by raising the noise level relative to the sentence, which made the task more difficult. If the sentence was scored as incorrect, the SNR for the next sentence was increased by lowering the noise level, which made the task easier. In the first phase, the SNR was changed in 10-dB steps. In phases 2 and 3, steps of 5 and 2.5 dB were used, respectively, and the 2.5-dB step was retained for the remaining sentences. The SRTn was calculated as the mean SNR of the last 10 sentences in the list.
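The adaptive track can be sketched in Python as follows. The starting SNR, the roving speech levels, the three step sizes, and the averaging over the last 10 sentences are taken from the text; the default of 13 sentences reflects the length of a VU98 list. The rule for moving from one phase to the next is not stated in the text, so the sketch assumes that the step size shrinks at each reversal of the response (correct to incorrect or vice versa). The `respond` callback is a hypothetical stand-in for the listener and the tester's scoring.

```python
import random
from statistics import mean

STEP_SIZES_DB = (10.0, 5.0, 2.5)   # phases 1-3, from the text
SPEECH_LEVELS_DB = (65, 70, 75)    # roving sentence levels in dB SPL


def run_adaptive_track(respond, n_sentences=13, start_snr_db=20.0, rng=None):
    """Simulate one list; return (SRTn, list of presented SNRs).

    respond(speech_level_db, noise_level_db) must return True when the
    sentence is scored as correct.
    """
    rng = rng or random.Random()
    snr = start_snr_db
    phase = 0
    previous_correct = None
    presented_snrs = []
    for _ in range(n_sentences):
        speech_level = rng.choice(SPEECH_LEVELS_DB)
        noise_level = speech_level - snr  # noise is set relative to the sentence
        correct = respond(speech_level, noise_level)
        presented_snrs.append(snr)
        # ASSUMPTION: advance to the next (smaller) step at each reversal.
        if previous_correct is not None and correct != previous_correct:
            phase = min(phase + 1, len(STEP_SIZES_DB) - 1)
        step = STEP_SIZES_DB[phase]
        snr += -step if correct else step  # harder after correct, easier after error
        previous_correct = correct
    return mean(presented_snrs[-10:]), presented_snrs


# Hypothetical idealized listener who is correct whenever the SNR is -5 dB or better.
srtn, track = run_adaptive_track(lambda speech, noise: speech - noise >= -5.0)
print(track)
print(f"SRTn = {srtn} dB")
```

A real listener responds probabilistically near threshold, so the track fluctuates more than in this idealized example; the sketch only shows how the step sizes and the final average interact.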
For testing SISSS, the same procedure was used as for the U-STARR. The only difference was that the sentences were presented from 60° to the left (-60° azimuth) or to the right (+60° azimuth) of the subject, and the noise was presented from 60° on the opposite side (fig. 1).
In the sound localization test, numbers appeared on the screens under the loudspeakers at 0-, ±15-, ±30-, and ±60-degree angles. The phrase ‘Hello what's this?’ was randomly presented from one of the loudspeakers above the screens, 30 times in total, at 60, 65, or 70 dB SPL (roving levels). First, the sentence was presented from -60, 0, and +60°. The result was calculated as the percentage of correct responses with a 60-degree angle between loudspeakers. Second, the test was performed with loudspeakers at -60, -30, 0, +30, and +60° to determine the percentage correct with a 30-degree angle between loudspeakers. Lastly, the sentence was presented from loudspeakers at -30, -15, 0, +15, and +30° to determine the percentage correct with a 15-degree angle between loudspeakers.
Again, in the CI group, all tests were performed in three listening conditions: monaurally, with either the CI or the hearing aid switched on, and bimodally, with both the CI and the hearing aid switched on. The participants were instructed to face the loudspeaker positioned straight ahead of them and not to turn their head during the tests. The tests were conducted by 3 individuals according to the protocol.
In order to compare the reliability of the Plomp test and the U-STARR, we repeated these tests on separate days in 12 normal-hearing persons. The VU98 set of sentences is large enough to prevent presenting the same sentence twice.
The data were gathered in Microsoft Excel, and SPSS 20 was used for the statistical analysis. Measures of the ability to understand speech in noise were compared within and between groups with a paired t test or with Student's t test, respectively. The differences were calculated with 95% confidence intervals.
For SISSS testing, subjects usually had one presentation condition in which they performed better than in the other. We compared the results for the best performance condition in the CI group with those for the best performance condition in the normal-hearing group. We also compared the worst performance condition in the same manner.
In the localization test, it was possible to choose the correct source by chance without actually hearing it well, because subjects were asked to choose from a fixed set of options. In order to examine whether subjects performed better (or worse) than at chance level, we used the Wilcoxon signed-rank test.
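To make the comparison against chance concrete, the sketch below implements an exact two-sided Wilcoxon signed-rank test of per-subject scores against a fixed chance level, using only the Python standard library. It is an illustration rather than a reproduction of the SPSS analysis: SPSS may apply a normal approximation, and the handling of zero differences (dropped here) is an assumption. The scores in the example are hypothetical.

```python
from collections import Counter


def signed_rank_vs_chance(scores, chance):
    """Exact two-sided Wilcoxon signed-rank test of scores against `chance`.

    Returns (W_plus, p_value). Scores equal to chance are dropped.
    """
    diffs = [s - chance for s in scores if s != chance]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Midranks of |difference|, stored doubled so that they stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    doubled_ranks = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            doubled_ranks[order[k]] = i + j + 2  # 2 * mean of ranks i+1 .. j+1
        i = j + 1
    w_plus2 = sum(r for r, d in zip(doubled_ranks, diffs) if d > 0)
    # Null distribution: every difference is positive or negative with p = 0.5.
    counts = Counter({0: 1})
    for r in doubled_ranks:
        updated = Counter()
        for total, c in counts.items():
            updated[total] += c        # this rank gets a negative sign
            updated[total + r] += c    # this rank gets a positive sign
        counts = updated
    n_patterns = 2 ** n
    p_low = sum(c for t, c in counts.items() if t <= w_plus2) / n_patterns
    p_high = sum(c for t, c in counts.items() if t >= w_plus2) / n_patterns
    return w_plus2 / 2, min(1.0, 2 * min(p_low, p_high))


# Hypothetical percentage-correct scores compared with a 20% chance level.
w, p = signed_rank_vs_chance([21, 22, 23, 24, 25], chance=20)
print(f"W+ = {w}, p = {p}")
```

With only five subjects who all score above chance, the smallest attainable two-sided p value is 2/32, which illustrates why small groups limit what such a test can show.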
Sample Size Analysis
The primary outcome was the SRTn measured with the U-STARR as compared to that measured with the Plomp test. For the power analysis, we used test results of normal-hearing subjects on the conventional Plomp test and the English STARR, as described in the literature. On the conventional Plomp test, a mean of -7.3 dB (SD 1.0) was found by Plomp and Mimpen [1979] when 10 normal-hearing listeners were tested. Boyle et al. [2013] performed the STARR in 25 normal-hearing adults and found a mean SRTn of -5.9 dB (SD 1.3).
To detect a clinically relevant difference of 1.4 dB in SNR between the two tests, with an α of 0.05, a power of 80%, and an SD of 1.2 dB, ≥6 subjects per group would be sufficient.
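The text does not state which formula was used for this calculation. As a plausibility check, the Python sketch below applies the standard normal-approximation formula for detecting a mean difference within subjects, n = ((z_{1-α/2} + z_{power}) · SD / Δ)², which is one form that yields the reported figure. The choice of this formula and the function name are assumptions.

```python
import math
from statistics import NormalDist


def n_for_mean_difference(delta, sd, alpha=0.05, power=0.80):
    """Subjects needed to detect a mean difference `delta` (normal approximation).

    ASSUMPTION: one-sample / paired form; `sd` is the SD of the measure
    as quoted in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd / delta) ** 2)


print(n_for_mean_difference(delta=1.4, sd=1.2))  # prints 6
```

A formula for two independent groups would contain an additional factor of 2 and would therefore give a larger number per group.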
Results
Twenty-nine normal-hearing subjects participated in this study. Their mean age was 37 years (range 20-66). Their average PTA was 4.8 dB (range -2 to 18) on the right side and 4.3 dB (range -2 to 17) on the left. They all reached 100% speech intelligibility, on average at 51 ± 12.1 dB HL (range 40-70).
Eighteen unilaterally implanted patients participated. Their mean age was 56 years (range 31-70). On average, their hearing impairment started at the age of 26 years (range 2-55), and they were implanted at 50 years of age (range 27-67). The average speech intelligibility at 65 dB HL was 70% (range 50-87) (table 2).
Speech Intelligibility in Noise
In the normal-hearing group, the mean SRTn values of the U-STARR and the Plomp test were -5.6 dB (SD 1.2) and -3.7 dB (SD 1.5), respectively (table 3). The difference was statistically significant (p < 0.01) due to small variances. In the CI group, the mean SRTn of the U-STARR, when only the CI was switched on, was 9.9 dB (SD 4.2) and differed significantly from the mean SRTn of the Plomp test, which was 15.1 dB (SD 7.0; p < 0.01).
Twelve out of the 18 implanted subjects used a contralateral hearing aid. When both the CI and the hearing aid were switched on, the mean SRTn values of the U-STARR and the Plomp test were 10.0 dB (SD 4.7) and 14.0 dB (SD 6.3), respectively (table 3). The difference was statistically significant (p < 0.01). Wearing a contralateral hearing aid did not have an effect on the Plomp or U-STARR test results (p > 0.05). When we tested with only the hearing aid switched on, none of the 12 patients were able to repeat the (key words of the) sentences correctly in silence, let alone in noise. An SRTn could therefore not be measured in this condition (floor effect), and the results for listening with a hearing aid only are not reported in table 3. The normal-hearing subjects performed significantly better on the Plomp test and the U-STARR than the CI patients (p < 0.01) (table 3).
Twelve subjects underwent the Plomp test and the U-STARR twice on separate days. Although different sentences were presented to them on these occasions, a slight learning effect did occur in both the U-STARR and the Plomp test (table 4).
Speech Intelligibility in Noise with Spatially Separated Sources
Seven normal-hearing subjects performed slightly better when sound came from the right and noise from the left (S +60 N -60). Two subjects performed equally well in both conditions, and 20 performed slightly better when sound came from the left and noise from the right (S -60 N +60). The mean SRTn for the best performance condition (S +60 N -60 or S -60 N +60) was -16.3 dB (SD 1.8), and for the worst performance condition it was -14.3 dB (SD 1.8). There was a statistically significant difference in performance between the subjects' best and worst listening conditions (p < 0.01). Again, the variance in the normal-hearing group was small.
For the CI group, when they were wearing only the CI and speech was presented to that side, a mean SRTn of 3.7 dB (SD 4.5) was found. When speech was presented to the contralateral side, a mean SRTn of 18.7 dB (SD 10.4) was found. The results for the best performance condition were clearly better than for the worst performance condition in CI patients (p < 0.01) (table 3).
In the subgroup of 12 contralateral hearing aid users, a mean SRTn of 4.2 dB (SD 3.9) was found when both devices were worn and sound was presented to the CI side. A mean SRTn of 17.3 dB (SD 7.2) was found when sound was presented to the hearing aid side (table 3). Wearing a contralateral hearing aid did not have any effect on the SISSS test results (p > 0.05). Normal-hearing persons performed significantly better on the SISSS test than CI users, irrespective of whether the cochlear implantees used a contralateral hearing aid (p < 0.01).
Sound Localization
When sound was presented from different angles, with either 30 or 60° separation, 100% of the normal-hearing subjects were able to distinguish the sound sources perfectly. Only 3 out of the 29 subjects showed minimal mistakes in distinguishing sounds with a 15-degree angle between them (table 3).
The sound localization task was difficult for the CI group. With only the CI switched on and sound presented with inter-loudspeaker angles of 15, 30, and 60°, the mean scores were 18.0% (SD 7.3), 23.9% (SD 8.5), and 40.4% (SD 8.3), respectively. The chance levels in these tests were 20, 20, and 33.3%, respectively. CI users did not perform better than at chance level with angles of 15 and 30° between loudspeakers (p > 0.05). On the localization tests with 60-degree angles, CI patients performed a little better than at chance level (p < 0.01). With both the CI and the hearing aid switched on, in the subgroup of hearing aid users, the results were not better than chance in any of the localization tests (p > 0.01). The mean scores of the tests were 24.8% (SD 11.2), 28.2% (SD 10.7), and 45.8% (SD 14.2) with inter-loudspeaker angles of 15, 30, and 60°, respectively (table 3). Wearing a hearing aid on the contralateral side did not have a positive effect on performance in any of the localization tests of this subgroup (p > 0.05). Normal-hearing persons clearly performed better than CI patients, regardless of whether the CI patients used hearing aids or not.
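The chance levels quoted above follow directly from the number of loudspeakers in each condition (3 for the 60-degree test, 5 for the 30- and 15-degree tests). The short Python sketch below shows the arithmetic and sets the reported CI-only group means next to it; the dictionary and function names are illustrative.

```python
# Loudspeaker azimuths per test condition, keyed by inter-loudspeaker angle.
LOUDSPEAKER_SETS_DEG = {
    60: (-60, 0, 60),
    30: (-60, -30, 0, 30, 60),
    15: (-30, -15, 0, 15, 30),
}

# Group means with only the CI switched on, as reported in the text (%).
CI_ONLY_MEAN_PERCENT = {60: 40.4, 30: 23.9, 15: 18.0}


def chance_level_percent(azimuths):
    """Expected percentage correct when guessing uniformly among loudspeakers."""
    return 100.0 / len(azimuths)


for angle, azimuths in LOUDSPEAKER_SETS_DEG.items():
    chance = chance_level_percent(azimuths)
    margin = CI_ONLY_MEAN_PERCENT[angle] - chance
    print(f"{angle:2d} deg: chance {chance:.1f}%, CI-only mean {margin:+.1f} points vs chance")
```

The group mean lies about 7 percentage points above chance only in the 60-degree condition, in line with the significance pattern reported for the CI-only tests.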
Discussion
In a time in which cochlear implantation techniques keep improving and possibilities for sound processing strategies are growing, there is a need for sophisticated hearing tests that are representative of everyday listening situations. The AB-York crescent of sound provides a battery of hearing-in-noise and localization tests that mimic these everyday situations.
We translated the English STARR into Dutch (the U-STARR) for our population. In the present study, the U-STARR has been validated and compared to the conventional Plomp test.
Speech in Noise from Straight Ahead in Normal-Hearing Persons
We have shown that in normal-hearing adults, the U-STARR is adequate and reliable compared to the conventional Plomp test. First, these individuals performed better on the U-STARR because it allowed them to make small mistakes. Nevertheless, subjects who performed well on the Plomp test performed well on the U-STARR and vice versa. Second, the variance in U-STARR results was low; in fact, it was even lower than in the Plomp test results. Third, when repeatedly tested on different occasions, subjects showed similar results. There was a small learning effect, which was equal in the U-STARR and the Plomp test. A similar small learning effect was described for the English STARR and the original Plomp test [Plomp and Mimpen, 1979; Boyle et al., 2013]. Fourth, the Dutch test results were almost identical to the English test results. Boyle et al. [2013] applied the STARR to 25 normal-hearing persons and found a mean SRTn of -5.9 dB (SD 1.3). This is similar to the SRTn of -5.7 dB (SD 1.3) we found with the Dutch version of this test [Joffo et al., 2010].
Speech in Noise from Straight Ahead in CI Patients
This study also demonstrated that the U-STARR is suitable for measuring speech-in-noise performance in cochlear implantees. Hearing in noise is energy consuming for patients. If they have been deaf for a prolonged period of time, they may also lose the capability to articulate well. Both could result in a poorer Plomp test score. By scoring key words per sentence instead of full sentences, as was customary in the Plomp test, the test has become less demanding. It reduces the number of poor results caused by small mistakes that have little influence on actually understanding a sentence correctly. The CI patients included in this study had all been able to hear in the past. For this reason, we were able to obtain a Plomp test result for all patients when they were wearing their CI, without reaching a floor effect. However, 4 patients had a result of >20 dB SNR, which means that the sentences were presented in almost negligible noise [Boyle et al., 2013]. On the U-STARR, only 1 patient had a result of a little over 20 dB SNR. Because the U-STARR is more refined than the Plomp test and the variance within a group of CI patients is lower, it seems better suited for studies that investigate subtle differences (for instance, to compare effects of unilateral to bilateral cochlear implantation).
Boyle et al. [2013] applied the STARR to 25 CI users. Although the group was comparable to ours in terms of age, it is not clear whether the subjects had been able to hear in the past. This is very important, since prelingually deafened patients are much more likely to reach a floor effect, which significantly lowers a group outcome. The authors described that 3 patients performed so poorly that an SRTn could not be measured, and they were left out of the study. Another 12 patients reached an SRTn of >20 dB. The group mean of all 22 patients was therefore high: 28 dB (SD 20). The mean of the 10 best-performing patients, who all had SRTn results <20 dB, was 9.4 dB (SD 3). These latter results are comparable to the results for our CI patients (mean 9.9 dB, SD 4.2).
Spatial Listening in Normal-Hearing Persons
In SISSS testing, 20 out of the 29 normal-hearing subjects performed better when speech was presented to the right ear and noise to the left. Twenty-four of these persons were right-handed. This is in line with the idea that signals presented to the right ear have privileged access to language centers in the dominant left hemisphere in right-handed and most left-handed people when competing sounds are presented to both ears [Studdert-Kennedy and Shankweiler, 1981; Van der Haegen et al., 2013]. Five persons were left-handed, 2 of whom performed better with sound from the right and noise from the left. Two performed better with sound from the left and noise from the right, and 1 performed equally well in both situations.
Spatial Listening in CI Patients
With the AB-York crescent of sound, we were able to show that spatial listening was impossible for CI patients with only one implanted ear. The CI patients in our study performed close to chance level on the localization tests. This is comparable to the findings of Dunn et al. [2008, 2012], who performed an 8-loudspeaker sound localization test in unilaterally implanted, postlingually deafened adults. Furthermore, it was very difficult to understand speech in noise when speech was presented to the nonimplanted ear and background noise to the implanted ear in the SISSS test.
Finally, we aimed to test all CI users in three listening conditions, but we noticed that several patients chose not to wear a hearing aid on the contralateral ear because they did not experience any benefit from it. The subjects who did use a hearing aid on the contralateral side did not show any benefit from it in the speech-in-noise or localization tests.
Conclusion
We were able to adequately test speech-in-noise and spatial hearing capabilities in normal-hearing subjects and CI patients with the Dutch version of the AB-York crescent of sound. We validated the U-STARR by measuring a group of normal-hearing listeners and tested its reliability. We also demonstrated that the U-STARR is suitable for measuring speech in noise in severely hearing-impaired subjects and cochlear implantees. It mimics everyday listening situations better than the Plomp test by allowing subjects to use the context of words that are presented in sentences. For the hearing impaired, it is easier to undergo the U-STARR than the Plomp test, since, in the former, they are allowed to make small mistakes and the floor effect is not reached as fast as in the latter. The AB-York crescent of sound is now used in the UK and the Netherlands. Although the English test material was replaced by Dutch sentences, the test results in both languages were similar. The test setup could also be used in other countries if the sentence material were replaced by sentences in other languages. This would make it possible to compare results between studies more easily in the future.
Disclosure Statement
W. Grolman receives nonrestricted grants from Advanced Bionics, Cochlear, and MedEl.