Abstract
Introduction: Injection laryngoplasty (IL) in combination with short voice therapy (SVT) has been recommended in unilateral vocal fold immobility (UVFI). This pilot study investigated functional voice changes, age, and time-to-treatment effects in UVFI after transoral IL with hyaluronic acid and SVT. Methods: Seventeen adults with UVFI (mean age: 61 years) were retrospectively analyzed. Outcome measures were the Voice Handicap Index 9i (VHI-9i), perceptual Grading-Roughness-Breathiness-Asthenia-Strain (GRBAS) Scale, voice range profiles (VRP) of the speaking, calling, and singing voice, maximum phonation time, jitter, and the Dysphonia Severity Index (DSI). t tests and Wilcoxon tests evaluated treatment effects; age and time-to-treatment effects on the magnitude of change were assessed by Spearman’s correlation. Results: There were significant improvements in VHI-9i and GRBAS scale overall grade of dysphonia, roughness, breathiness, and asthenia. Mean speaking and mean calling sound pressure level (SPL), maximum singing SPL, and mean calling fundamental frequency (f0) increased, while the DSI and jitter improved. Time-to-treatment significantly affected the magnitude of change in mean speaking and maximum singing SPL, singing SPL range, jitter, and DSI; age influenced minimum speaking f0 only. Conclusion: Transoral IL with SVT significantly improves subjective, perceptual, and instrumental acoustic voice outcomes in UVFI. Improvement of speaking and calling VRP after IL has not been previously documented. Our findings suggest that early treatment is beneficial for mean speaking loudness, maximum singing SPL, singing SPL range, jitter, and the DSI. More research is needed to examine the influence of time-to-treatment and age, and also to what extent SVT contributes to treatment effects.
Introduction
Unilateral vocal fold immobility (UVFI) is a frequent diagnosis, affecting five to six percent of patients treated in specialized ear, nose, and throat clinics for dysphonia. The estimated incidence of UVFI is 5.1 cases per 100,000 per year [1]. In contrast to its epidemiology, the etiology of UVFI is well described and includes malignant and benign tumors, surgical and nonsurgical trauma, systemic diseases such as sarcoidosis or tuberculosis, neurological disorders, and idiopathic causes. The common pathophysiology of UVFI is damage to the recurrent laryngeal nerve (RLN), with consecutive paralysis of muscles innervated by the RLN on the ipsilateral side. This results in insufficient glottal closure, leading to typical symptoms, such as dysphonia with a mostly breathy voice, vocal fatigue, and speaking and activity-related dyspnea. Furthermore, dysphagia, along with decreases in general health- and voice-related quality of life, has been observed [2‒5].
Various terms are used to define conditions of impaired or absent vocal fold motion. Unilateral vocal fold paralysis (UVFP) is used in the context of neurogenic etiology, whereas vocal fold motion or mobility impairment is an umbrella term that mainly includes vocal fold paralysis, vocal fold paresis, vocal fold hypomobility, and vocal fold immobility. In contrast to vocal fold paralysis, in which the vocal fold is totally immobile, vocal fold paresis implies only partial impairment of motion. Nevertheless, both originate from neurogenic causes. Vocal fold hypomobility and immobility are more general terms that include mechanical and/or neurogenic disorders [6]. However, these terms are not consistently used in the literature, as a consensus for the definition and diagnosis of UVFP is still missing [5]. This study included patients with UVFI. Therefore, the term UVFI will be used to describe the loss of vocal fold mobility, excluding patients with hypomobility of the vocal folds.
Treatment Approaches in UVFI
Widely applied treatment options range from conservative methods including voice therapy (VT) to surgical techniques such as injection laryngoplasty (IL). Both aim to reduce glottic insufficiency and functional voice restrictions. However, guidelines to support the decision between conservative and surgical approaches are lacking. Consequently, no consistent approach is currently being followed for treating dysphonia caused by UVFI [5].
Surgical treatment can be divided into four types of interventions: medialization thyroplasty, IL, arytenoid adduction, and laryngeal reinnervation. To date, studies have not demonstrated the superiority of any one of these interventions in terms of laryngoscopic findings and subjective, perceptual, acoustic, and aerodynamic outcomes [7‒9]. Thus, the type of intervention should be chosen according to the patient’s functional deficits, needs, general health status, and prognosis [7]. An advantage of IL is that it can generally be performed in an outpatient setting under local anesthesia (LA) or general anesthesia (GA). Office-based procedures show an immediate effect and tend to result in fewer complications than more invasive procedures such as type 1 thyroplasty [10, 11]. IL can be performed through a transnasal, transoral, or transcervical approach. In the present study, the transoral approach was chosen based on the preferences and expertise of the operating surgeon (J.E.B.). The advantages of transoral IL compared to other injection routes include precise visualization by means of rigid videolaryngostroboscopy under LA while the patient retains physiologic muscle tone. This facilitates accurate placement of the injectable substance in a short, manageable time. This approach also allows intraoperative evaluation of the anticipated functional improvements via stroboscopy, favoring LA over GA. However, challenges such as maintaining glottal visualization and ensuring patient cooperation exist, making the approach under LA difficult for a beginner to learn. IL under LA can be particularly demanding in patients with a strong gag reflex or limited tolerance, so that GA may be required.
Additionally, IL is frequently supplemented with VT. Among others, Jeong et al. [12] have shown that combined IL and VT can help maintain improved voice quality for more than 6 months after IL, demonstrating functional differences after treatment that last longer than the anticipated effect of IL itself. VT included sustained phonation, gliding scale exercises, laryngeal massage, resonance VT, and low-pitched glottal fry phonation, delivered over an average of 6.3 sessions (range 3–14) [12]. VT is a long-practiced conservative treatment approach that is usually conducted by specialized therapists. It has been described as an effective first-line treatment in patients with UVFI [13], improving subjective vocal symptoms, perceptual voice quality, and objective acoustic voice characteristics. By training compensatory functions, VT can lead to complete glottal closure both in patients with recovering vocal fold motility and in those with persistent immobility [14]. Moreover, VT may prevent patients from developing potentially harmful hyperfunctional compensatory behaviors [15]. Previous studies using VT only in patients with UVFI applied an average of 12.6 therapy sessions [14], 15 sessions [13], or 16 sessions [15]. One study aimed at enhancing glottal closure and reducing hyperfunctional compensatory behaviors [14], another explored a program targeting vocal range, clarity, and coordination [13], while a third examined early-stage interventions incorporating the Smith Accent Method and modified glottal attack exercises [15]. These studies highlight several limitations in the literature: no standardized protocols exist, and variations in session frequency, duration, and techniques complicate comparisons. The combined effects of VT and IL are underexplored, and the optimal time from first diagnosis to treatment remains unclear. Furthermore, inconsistencies in outcome measures hinder cross-study comparisons.
In the present study, we address some of the discussed limitations by adding functional outcome data on combined IL and short voice therapy (SVT). The latter consisted of an individually selected set of indirect and direct VT techniques, equipping patients with essential exercises for an effective vocal rehabilitation. In contrast to a stand-alone VT, in the present study SVT aimed to improve the adaptation and voice function during the peri-interventional period. Moreover, the average number of sessions and the duration of each session were shorter compared to common VT protocols, supporting the use of the term “short” VT. Refer to the methods section for more details on the content of SVT.
Voice Diagnostics in Patients with UVFI
For a comprehensive voice examination, the European Laryngological Society (ELS) proposed a basic protocol with five dimensions. This includes a visual examination by videolaryngostroboscopy, perceptual voice assessments, instrumental acoustic voice measurements, standardized self-reported evaluation of symptoms and psychosocial impairment, and aerodynamic measurements [16‒18].
Videolaryngostroboscopy is mainly performed to visually examine the morphology, structural changes, and closure of the vocal folds. Moreover, the vibratory behavior of the vocal folds during phonation is investigated by applying stroboscopic light flashes at a frequency slightly lower than the vibration frequency of the vocal folds and recording the result. This produces a sequence of pictures resembling a slow-motion video, which is usually assessed for the criteria vocal fold regularity, amplitude, left/right phase symmetry, vertical level, glottal closure pattern, glottal closure duration, and mucosal wave [19].
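The slow-motion impression described above follows from the small difference between the vibration frequency and the flash frequency. The following minimal sketch illustrates this relationship under the simplifying assumption of a perfectly periodic vibration; the function name and the example frequencies are hypothetical and do not come from the devices or settings used in this study.

```python
def apparent_slow_motion_hz(f0_hz: float, flash_hz: float) -> float:
    """Apparent vibration rate (Hz) seen under stroboscopic illumination.

    Assumes a perfectly periodic vibration and a flash rate close to f0,
    so that each flash samples the cycle at a slightly shifted phase.
    """
    if f0_hz <= 0 or flash_hz <= 0:
        raise ValueError("frequencies must be positive")
    # The observed slow motion runs at the difference (beat) frequency.
    return abs(f0_hz - flash_hz)


# Hypothetical example: vocal folds vibrating at 200 Hz, flashes at 198.5 Hz.
print(apparent_slow_motion_hz(200.0, 198.5))  # 1.5 apparent cycles per second
```

In strongly aperiodic voices this assumption breaks down, which is one reason stroboscopic images can be difficult to interpret in severe dysphonia.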
Audible dysphonia is often assessed using the perceptual Grading-Roughness-Breathiness-Asthenia-Strain (GRBAS) scale. This scale offers a sensitive and reliable method for evaluating voice quality deviation from an anticipated “normal” level [16]. It is composed of five parameters: the overall grade of dysphonia (G) being the maximum of all following single parameters, roughness (R), breathiness (B), asthenia (A), and strained voice quality (S). For each parameter, the examiners assign a number from 0 to 3 depending on the degree of impairment. 0 corresponds to a normophonic voice sound, whereas 1, 2, and 3 refer to slightly, moderately, and severely impaired perceptual voice quality, respectively [20].
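As an illustration of the rating scheme described above, a GRBAS rating could be represented as follows. This is a sketch only: the class and field names are hypothetical, and deriving G as the maximum of the other four parameters follows the description in this text rather than any software used in the study.

```python
from dataclasses import dataclass

VALID_RATINGS = (0, 1, 2, 3)  # 0 = normophonic ... 3 = severely impaired


@dataclass(frozen=True)
class GRBASRating:
    """One perceptual rating; names are illustrative, not from the study."""
    roughness: int
    breathiness: int
    asthenia: int
    strain: int

    def __post_init__(self) -> None:
        for name in ("roughness", "breathiness", "asthenia", "strain"):
            value = getattr(self, name)
            if value not in VALID_RATINGS:
                raise ValueError(f"{name} must be 0, 1, 2, or 3, got {value!r}")

    @property
    def grade(self) -> int:
        # Overall grade G taken as the maximum of R, B, A, and S.
        return max(self.roughness, self.breathiness, self.asthenia, self.strain)


# Hypothetical rating: slightly rough, moderately breathy voice.
print(GRBASRating(roughness=1, breathiness=2, asthenia=0, strain=1).grade)  # 2
```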
Time-based acoustic measures such as jitter (%), shimmer (%), and harmonics-to-noise ratio (HNR) are recommended in voice diagnostics for their ability to provide objective insights into vocal function and thereby treatment effects. However, it is well-recognized that especially jitter and shimmer become less reliable in cases of severe dysphonia due to aperiodic vocal signals. To address this limitation, recent advances advocate for the integration of spectral-based acoustic measures, such as cepstral peak prominence (CPP), which are more robust in objectively analyzing aperiodic signals, while being effective in detecting subtle changes in voice quality. In addition to these measures, voice range profile (VRP)-based parameters such as the mean and range of fundamental frequency (f0) and sound pressure level (SPL) offer valuable objective insights into the dynamic capabilities and function of voice across different phonatory tasks [17, 19]. However, the recently published European update on voice diagnostics proposes to evaluate mean f0, jitter (%), shimmer (%), and noise-to-harmonic ratio only during sustained vowel phonation of /a/ [18]. Despite their limitations, these parameters have been recommended to complement the diagnosis and measure treatment effects.
Aerodynamic voice parameters include maximum phonation time (MPT), phonation quotient, and mean airflow rate during phonation. The MPT is defined as the longest time a vowel can be held after maximum inspiration [21]. Combined indices include several characteristics related to vocal function and are applied as an integrated measure to objectively describe functional restrictions [17]. The Dysphonia Severity Index (DSI) was created to establish an objective and quantitative correlation of perceived vocal quality. It combines MPT, the highest frequency, lowest intensity, and jitter (%) into one formula [22].
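The text above names the four components of the DSI but does not reproduce the formula. The sketch below uses the weighting commonly attributed to the original DSI publication (0.13 × MPT + 0.0053 × highest f0 − 0.26 × lowest intensity − 1.18 × jitter + 12.4); these coefficients are stated here as an assumption and should be checked against reference [22]. The function name, parameter names, and example values are hypothetical.

```python
def dysphonia_severity_index(
    mpt_s: float,        # maximum phonation time in seconds
    f0_high_hz: float,   # highest attainable fundamental frequency in Hz
    i_low_db: float,     # lowest attainable intensity in dB
    jitter_pct: float,   # jitter in percent
) -> float:
    """Weighted combination of the four DSI components.

    Coefficients are assumed from the commonly cited DSI formula;
    they are not given in the present text.
    """
    return (
        0.13 * mpt_s
        + 0.0053 * f0_high_hz
        - 0.26 * i_low_db
        - 1.18 * jitter_pct
        + 12.4
    )


# Hypothetical values, not patient data from this study.
print(round(dysphonia_severity_index(20.0, 800.0, 50.0, 0.5), 2))  # 5.65
```

Under this weighting, a longer MPT and a higher maximum f0 raise the index, whereas a higher minimum intensity and higher jitter lower it, so larger values indicate better vocal function.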
Moreover, it is crucial to understand the psychological, social, and communicative effects of voice disorders. Standardized questionnaires assess the different dimensions of subjective symptoms [23]. The most widely applied questionnaire is the Voice Handicap Index (VHI), which assesses voice-related limitations or disabilities in daily life. The original VHI comprises 30 questions (VHI-30), with functional (F), physical (P), and emotional (E) subscales consisting of 10 questions each. Patients answer each question on a scale from zero to four, where zero means never, one means almost never, two means sometimes, three means almost always, and four means always [24].
Several supplementary methods for diagnosing and quantifying voice-related symptoms in patients with UVFI have been described. These include laryngeal electromyography, which is considered the only method to objectively confirm or exclude the diagnosis. It further provides information about the severity of damage to the RLN and the time of injury and distinguishes between isolated injury to the RLN and injury to the entire vagal nerve [25].
Expected Treatment Effects in Patients with UVFI
Despite a growing body of evidence, it still has to be determined which voice outcome measures are most important for describing treatment effects in patients with UVFI. The most frequently applied outcome indicators include video-stroboscopic examination of glottal closure, a wide range of perceptual schemes including the GRBAS scale, the instrumental acoustic measures jitter, shimmer, noise-to-harmonics ratio, and harmonics-to-noise ratio (NHR/HNR), fundamental frequency f0, and the aerodynamic measures mean airflow, MPT, and mean subglottic pressure [26]. As summarized in Table 1, patients suffering from UVFI and being treated with IL using hyaluronic acid (HA) and VT often showed a statistically significant improvement in the glottal gap in videolaryngostroboscopy, the Voice Handicap Index (VHI), perceptual GRBAS scale, MPT, and acoustic voice parameters such as jitter, shimmer, singing voice f0, and intensity range.
Table 1. A review of recent (<10 years) literature on voice outcomes after IL with HA in patients with UVFI
Reference (year) | Treatment | n | Diagnostic parameter | Time from IL to follow-up | Outcomes (p < 0.05 was considered statistically significant) |
---|---|---|---|---|---|
Jeong et al. [12] (2022) | Transcervical IL with HA alone (IL) versus combined VT and IL with HA (IL + VT) | 40 | | 7 months (IL) versus 11 months (IL + VT) | Statistically significant differences in 1, 2 (IL + VT: G, R, B, A; IL: G), 3 (IL + VT), 5 (IL + VT) |
Kim et al. [27] (2018) | Transcervical IL with HA | 50 | | 1 month | Statistically significant differences in 1, 2, 3, 4 (glottal gap), 5 (AVQI, CPPS) |
Pei et al. [28] (2018) | Transcervical IL with HA | 68 | | 1 month | Statistically significant differences in 3, 5 (max/min f0, range of f0/semitones/SPL, max dB, f0 at max/min intensity), 6 |
Gotxi-Erezuma et al. [29] (2017) | Transcervical IL with HA | 28 | | 15 days versus 6 months | Statistically significant differences in 1, 2 (G, A, and B without R and S), and 3 (15 days and 6 months) |
The numbers in the column “diagnostic parameter” correspond to the following diagnostic dimensions: 1: subjective; 2: perceptual; 3: aerodynamic; 4: visual; 5: acoustic analysis; 6: others.
VT, voice therapy; IL, injection laryngoplasty; HA, hyaluronic acid; VHI, Voice Handicap Index; MPT, maximum phonation time; CAPE-V, Consensus Auditory-Perceptual Evaluation of Voice; AVQI, Acoustic Voice Quality Index; CPPS, cepstral peak prominence-smoothed; G, grading; R, roughness; B, breathiness; A, asthenia; S, Strain.
So far, singing VRP parameters, such as the f0 and SPL range, have never been used as outcome indicators following transoral IL in patients with UVFI. However, those parameters are sensitive to functional voice changes after VT [30], transcervical IL with HA [26, 28], as well as transoral IL with HA in patients with glottic insufficiency [31]. Similarly, speaking voice characteristics have not yet been investigated following transoral IL. However, two studies examining speaking voice characteristics of a Cantonese-speaking [32] and a Korean-speaking population [27] after transcervical IL with HA reported no significant improvement. Other studies have analyzed the speaking voice following other treatment procedures such as vocal fold medialization with hydroxyapatite or titanium implants. Both Storck et al. [33] and Schneider-Stickler et al. [34] showed an increase in speaking SPL without changes in f0 [35]. In contrast, another study by Schneider-Stickler et al. [34] did not observe any changes in speaking VRP characteristics after vocal fold medialization with a titanium implant [36]. However, an increased SPL during calling was reported after treatment with the same implant [34, 36]. Despite the scarce literature investigating treatment effects on the speaking voice, objectively measurable characteristics, especially mean speaking voice intensity, substantially impact voice function in daily life. In particular, many professions, such as teaching and sales, require prolonged use of elevated speaking voice levels [37].
Age and Time-to-Treatment Effects
A more in-depth analysis shows that the majority of studies do not consider further relevant factors, such as age or the time between initial diagnosis and start of treatment [37]. While vocally healthy men develop an increasing speaking fundamental frequency (f0) with age, f0 tends to decrease in women [38]. Similar aging effects are likely in patients with UVFI but have yet to be considered in outcome studies. Furthermore, men over 60 years of age with UVFI have been shown to exhibit worse perceptual breathiness and overall dysphonia scores than women, pointing toward underlying sex-related differences [39]. Therefore, the present work aimed to evaluate the influence of age and time-to-treatment on the anticipated treatment outcomes.
We hypothesized that younger patients may exhibit more favorable treatment outcomes, which can be attributed to vocal fold characteristics such as greater elasticity and healing capabilities and a higher potential for neuroplasticity. These factors, combined with a typically better general health status, may lead to a faster recovery after surgery in young patients.
Moreover, data on how soon after the initial diagnosis treatment should occur remain sparse. From a functional viewpoint, early treatment might be beneficial as it prevents patients from adopting harmful voice production patterns and from developing vocal fold muscle atrophy, thus facilitating more effective voice restoration. This has been suggested by a meta-analysis of the effects of IL, which demonstrated that early treatment improves outcomes and reduces the need for permanent medialization surgery [40]. Furthermore, one study showed that the length of hospital stay after injury to the RLN in aortic repair could be significantly reduced by early IL [41]. A systematic review evaluating the type and timing of therapy for vocal fold paresis/paralysis after thyroidectomy found no significant clinical heterogeneity between treatment at <6 months and at 6–12 months, but did find heterogeneity between injections performed before and after 12 months. This led to the recommendation to carry out IL in the first 12 months after thyroidectomy [42].
Aim of the Study
While previous research has focused on functional voice outcomes after transcervical IL with HA or other materials, our study extends this knowledge by examining outcomes after transoral IL with HA combined with SVT. Therefore, the present work aims to provide new evidence, in line with clinical assessment guidelines, such as those proposed by the ELS. This contributes to an evidence-based foundation for treatment decision making. Moreover, to the best of our knowledge, this study is the first to investigate the effects of time-to-treatment and age on the magnitude of change in functional voice outcomes following IL in patients with UVFI. In summary, the present study aims to answer the following questions: (a) How do clinical subjective, perceptual, aerodynamic, and instrumental acoustic voice outcomes change in patients with UVFI treated with combined transoral IL using HA and SVT and (b) Is the magnitude of change in these outcome measures affected by age and time-to-treatment?
Materials and Methods
Data Collection
Data were obtained from adult patients diagnosed and treated or followed up at the Department of Phoniatrics and Speech Pathology, University Hospital Zurich, between 1 January 2016 and 31 August 2018. All patients were examined during standard clinical assessment following the European Laryngological Society (ELS) 2001 assessment protocol, which includes visual, subjective, perceptual, aerodynamic, and instrumental acoustic examinations [17]. The responsible Ethics Committee approved this study under reference number KEK ZH 2017-00806.
Inclusion Criteria
Patients with UVFI, as described in the introduction section, were included. All patients underwent a baseline voice assessment and were treated by transoral IL with HA combined with SVT. A follow-up voice assessment had to be available within 1 to 8 months after therapy. Both before and after treatment, outcome measures had to include at least an instrumental acoustic analysis and a perceptual voice analysis using the GRBAS scale. The inclusion criteria were as follows:
Signed general consent for further use of data
Ability to fill in questionnaires by themselves
Age between 18 and 99 years
No moderate to high-grade hearing disorder
No systemic neurological disorders, such as multiple sclerosis, related to UVFI or dysphagia
No known dysphagia before UVFI
No invasive ear, nose, and throat tumors or neoplasia infiltrating laryngeal structures, except the laryngeal nerve
No congenital or infectious esophageal diseases, no esophageal tumors
No prior irradiation therapy of the neck
Patient Selection
Eligible patients who met the above-mentioned criteria were identified retrospectively using the clinical database of the Division of Phoniatrics and Speech Pathology at the University Hospital Zurich. The search was conducted using German keywords such as “stillst,” “Lähmung,” “Paralyse,” “unilateral,” “recurrens,” and “Augmentation.” With this process, 246 patients diagnosed with UVFI were initially identified. After applying the inclusion criteria, 229 patients (93.1%) were excluded, resulting in 17 patients (6.9%) eligible for the final analysis. The following patient data were then obtained from the clinical information system of the University Hospital Zurich: age, diagnosis, treatment modality, treatment dates (SVT and IL), number of SVT sessions, date of first diagnosis, time-to-treatment (time between initial diagnosis and start of SVT), and the voice parameters described below. This information was transferred to an Excel file (Microsoft Corporation, Redmond, WA, USA) for further processing.
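The keyword-based screening and the reported proportions can be sketched as follows. The matching rule (case-insensitive substring search) and all function names are assumptions made for illustration; the text does not specify how the clinical database search was implemented, and the example strings are invented.

```python
# German search keywords as listed in the text, lower-cased for matching.
KEYWORDS = ("stillst", "lähmung", "paralyse", "unilateral", "recurrens", "augmentation")


def matches_keyword(diagnosis_text: str) -> bool:
    """True if any keyword occurs in the text (assumed: case-insensitive substring)."""
    lowered = diagnosis_text.casefold()
    return any(keyword in lowered for keyword in KEYWORDS)


def share_percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Invented example strings, not records from the clinical database.
print(matches_keyword("Stimmlippenstillstand links"))  # True
print(matches_keyword("Heiserkeit"))                   # False

# Proportions reported in the text: 229 excluded and 17 included of 246.
print(share_percent(229, 246), share_percent(17, 246))  # 93.1 6.9
```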
Patient Characteristics
Thirteen men and 4 women between the ages of 37 and 77 years, with an overall mean age of 60 years (standard deviation [SD] 13.1; median 62.0) were included. The mean age of the female participants was 64 years (SD 10.8; median 61.5, range 55–77) and of the male participants 59 years (SD 14.0; median 62, range 37–77). There was no significant difference in age distribution between the sex groups (p = 0.5, exact Mann-Whitney U test).
A mean time-to-treatment of 13.7 months (SD 28.9; median 2, range 0–100) was observed. Outcome measures, including the VHI, GRBAS, and the VRP, were simultaneously collected on average 4.4 months after IL (SD: 1.6; median: 4; range: 2–8 months). The mean number of SVT sessions was 2 (SD 1.3; median 2, range 1–3). Refer to Table 2 for an overview of patient characteristics.
Table 2. Patient characteristics
Parameter | Mean | SD | Minimum/maximum | Median |
---|---|---|---|---|
Age, years | 60.0 | 13.1 | 37/77 | 62 |
Time from first diagnosis to treatment, months | 13.7 | 28.9 | 0/100 | 2 |
Time from IL to collection of outcome measures, months | 4.4 | 1.6 | 2/8 | 4 |
Number of SVT sessions | 2 | 1.3 | 1/3 | 2 |
Etiology
Of the 17 patients included, 12 (70.6%) had developed UVFI as a result of postoperative complications. Two patients (11.8%) had received anterior cervical spine surgery with discectomy and fusion at the C6-C7 level for radicular pain syndrome caused by a herniated disc. An additional 2 patients (11.8%) had required aortic arch replacement for type A aortic dissection. Seven patients (41.2%) had been treated for pulmonary pathologies. These had included thoracoscopic wedge resection of the lingula and left upper lobe with mediastinal lymphadenectomy for adenocarcinoma of the lingula (TNM stage: pT1b pN1 cM0), open left lower lobectomy with mediastinal lymphadenectomy for metastatic renal cell carcinoma (TNM stage: pT1b pN1 cM1), and open mediastinal lymphadenectomy and wedge resection of the left upper lobe for adenocarcinoma of the left upper lobe (TNM stage: pT4 pN2 cM1). Other treatments had involved open left-sided pleurectomy, pneumonectomy, and lymphadenectomy for metastatic thymic carcinoma (TNM stage: pT4 pN2 cM1), as well as open left pneumonectomy and mediastinal lymphadenectomy for squamous cell carcinoma (TNM stage: pT4 pN1 cM0). Two additional patients had undergone pleurectomy, pericardiectomy, and mediastinal lymphadenectomy for pleural mesothelioma (TNM stages: ypT3 ypN0 cM0 and pT2 pN0 cM0). Lastly, 1 patient had required total thyroidectomy with RLN resection and bilateral neck dissection involving the upper mediastinum for papillary thyroid carcinoma (TNM stage: cT4 cN1 cMx). The remaining 5 patients (29.4%) had been diagnosed with idiopathic UVFI, as no identifiable cause could be determined.
Medical Background
Of the 17 patients included, 8 (47.1%) had medical comorbidities, while 9 (52.9%) had no relevant medical history. The observed comorbidities included heart failure; myasthenia gravis combined with liver cirrhosis; peripheral artery disease, arterial hypertension, and alcoholism; COPD grade II with obstructive sleep apnea syndrome and a history of stenting in the right coronary artery; bicuspid aortic valve; arterial hypertension; metabolic syndrome; and atrial fibrillation combined with arterial hypertension.
Laryngoscopic Severity and Findings
All 17 patients presented with complete immobility of the affected vocal fold. Upon laryngeal examination, 13 patients (76.5%) were found to have incomplete glottic closure, whereas the remaining 4 patients (23.5%) showed complete glottic closure. UVFI was left-sided in 10 patients (58.8%) and right-sided in 7 patients (41.2%).
Treatment: IL plus SVT
Vocal fold augmentation was performed via a transoral approach under LA, or under GA when a strong gag reflex was present. The amount of HA injected was individually adjusted for each patient. All augmentations were performed by an experienced laryngologist, with the assistance of a medical doctor. In 15 patients, IL was carried out as an office-based laryngeal surgery procedure with patients sitting awake in the examination chair. After a rigid laryngostroboscopic examination using a XION EndoSTROB (XION GmbH, Berlin, Germany), surface anesthesia was applied to the oral cavity, the tongue base, the velum, and the posterior wall of the pharynx, using Xylocaine Spray 10% and Buprocaine 1%. Subsequently, the endolarynx and vocal folds were anesthetized. Once anesthesia was completed, HA (Restylane®; Galderma SA, Lausanne, Switzerland) was transorally injected laterally to the M. vocalis of the immobile vocal fold under a constant endoscopic view. The injected volume ranged from 0.4 mL to 1.6 mL, with a mean of 0.8 mL (median of 0.7 mL). The medialization effect was checked directly after injection and 30 min later by visual and perceptual testing of voice production.
Due to a strong gag reflex, GA was required in 2 patients. After jet ventilation and exposure of the vocal folds with a Kleinsasser laryngoscope (KARL STORZ, Tuttlingen, Germany), HA was injected laterally to the M. vocalis of the immobile vocal fold. Endoscopic and perceptual control of voice function were performed the following day.
The mean total number of SVT sessions was 2 (median 2, range: 1–3). SVT was performed in all patients before IL, with a mean duration from the first session of SVT to IL of 4.8 months (median 3 months, range: 1–15 months) and an average of 1.7 sessions (median: 2 sessions; range: 1–3 sessions). In 5 patients, SVT was continued for an average time of 2.6 months (median 2, range: 2–4 months) after IL, with an average of 1 session (median: 1; range: 1–1 sessions) occurring after a mean time of 1.8 months (median: 1; range: 1–3). Refer to Table 2 for an overview of patient characteristics. The specific set of SVT techniques was not standardized across all patients but was tailored on a case-by-case basis to the patients’ symptoms, functional status, and needs, and to the specific functional deficits identified during initial assessments. For this, subjective symptoms (such as vocal tract discomfort), as well as perceptual, acoustic, and aerodynamic findings, were discussed and prioritized with the patient. SVT consisted of sessions of approximately 30–45 min, conducted by a team of experienced speech therapists at the University Hospital Zurich. It included indirect techniques such as advice on vocal hygiene (e.g., avoidance of throat clearing or hydration to treat vocal tract discomfort) and management of voice behavior during vocally demanding situations (such as positioning in the room or the use of microphones). Direct VT techniques to facilitate efficient voice production [43, 44], in terms of improving perceptual voice quality, mean speaking voice intensity, and fundamental frequency variation, included, among others, vocal function exercises [45], semi-occluded vocal tract exercises, as well as manual techniques to relax the (para)laryngeal and cervical musculature.
Moreover, exercises to promote respiratory control and resonant equilibrium from classical VT approaches, as well as semi-occluded vocal tract exercises, were applied [46]. In the event that the patient and speech therapist considered follow-ups useful, further appointments were scheduled. Patients were discharged after a final comprehensive voice assessment [47].
Analyzed Parameters
Patient-Based Questionnaire
In the present study, a shortened version of the VHI, including nine selected questions, was used. The Voice Handicap Index International 9 (VHI-9i) is a good approximation of the original version [48]. In the VHI-9i, a total score of 36 points is possible, with 36 indicating maximal impairment and 0 indicating no impairment [23, 49].
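The sum score described above can be illustrated with a minimal sketch. It assumes that each of the nine items is rated on a 0–4 scale, which is inferred here from nine items and a maximum of 36 points; the function name `vhi9i_total` is hypothetical and not part of any published scoring tool.

```python
def vhi9i_total(item_scores):
    """Return the VHI-9i sum score (0 = no impairment, 36 = maximal impairment).

    Assumption: nine items, each rated 0-4 (inferred from the 36-point maximum).
    """
    if len(item_scores) != 9:
        raise ValueError("VHI-9i requires exactly nine item scores")
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item score must be an integer from 0 to 4")
    return sum(item_scores)


# Hypothetical answers of one patient, for illustration only.
example_answers = [3, 2, 4, 3, 2, 3, 3, 2, 3]
print(vhi9i_total(example_answers))
```

Rejecting incomplete questionnaires rather than imputing missing items is a design choice of this sketch, not a statement about how incomplete records were handled in the study.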
Perceptual Outcomes
Perceptual voice quality was analyzed by an experienced speech therapist or a phoniatrician during anamnesis, before and after therapy. The GRBAS was used to classify the degree of dysphonia. To improve inter-rater reliability, clinicians regularly underwent training using standardized voice recordings [47].
Instrumental Acoustic Outcomes
Instrumental acoustic assessments were performed using DIVAS (XION GmbH, Berlin, Germany) analysis hardware and software. For this, the XION microphone headset (XION GmbH, Berlin, Germany) and a personal computer equipped with DIVAS software were used [50]. All recordings were based on recommended protocols for instrumental assessment of voice [19] and guidelines for the measurement of SPL in voice and speech [51]. For SPL recordings, calibrated hardware and software as provided by the manufacturer were used (XION GmbH, Berlin, Germany). The microphone was head-mounted at a recording distance of 30 cm from the mouth. Acoustic samples were recorded at a sampling rate of 22 kHz. All recordings were made in normal room acoustics with the windows closed. No signal typing was performed.
For the speaking VRP, patients were asked the following question (translated from German): “Can you please tell me how you came here today?” For the counting VRP, participants were given the following prompt: “Please count from 20 to 30 with comfortable voice intensity.” For the calling VRP, patients were asked: “Please call as loudly as you can: Hello Anton, come here! Three times.” For the singing VRP, they were instructed as follows: “Please sing the syllable ‘la’ as softly as you can, from a medium pitch downwards, three times. Please sing the syllable ‘la’ as softly as you can upwards, three times.” These instructions were then repeated, adding “as loudly as you can.”
The analysis of these VRP recordings included the following parameters: mean voice f0 (Hz) and SPL (dBA) of the speaking, counting, and calling voice profiles; minimum, maximum, and ranges of SPL and f0 (semitones) of the speaking and singing VRP, jitter (%), and the Dysphonia Severity Index (DSI). For jitter analysis, a middle sequence of the sustained vowel /a/ was used to exclude the increased variability of the voice onset and offset phases [47].
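Two of the parameters listed above are derived quantities, and a short sketch may clarify how they are typically obtained. The f0 range in semitones follows from the ratio of the highest to the lowest f0. For the DSI, the sketch uses the weighted combination commonly attributed to Wuyts and colleagues; that DIVAS computes the DSI in exactly this way is an assumption, and all function and parameter names are hypothetical.

```python
import math


def f0_range_semitones(f0_min_hz, f0_max_hz):
    """f0 range in semitones: 12 semitones per doubling of frequency."""
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


def dysphonia_severity_index(mpt_s, f0_high_hz, spl_low_db, jitter_percent):
    """DSI as a weighted sum of four measures (commonly cited formulation).

    Assumption: this mirrors the published DSI weights; the software used in
    the study may differ in detail.
    """
    return (0.13 * mpt_s
            + 0.0053 * f0_high_hz
            - 0.26 * spl_low_db
            - 1.18 * jitter_percent
            + 12.4)


# Hypothetical values, for illustration only.
print(f0_range_semitones(110.0, 220.0))            # one octave
print(dysphonia_severity_index(20.0, 800.0, 50.0, 1.0))
```

Because jitter enters the DSI with a comparatively large negative weight, a marked reduction in jitter alone can shift the index substantially toward less severe values.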
Aerodynamic Measurements
To measure the MPT, patients were given the following instruction: “Please hold the vowel /a/ as long as possible at medium sound pressure and pitch after maximum inspiration.” This was repeated three times, and the longest MPT was considered for analysis [21].
Statistical Analysis
Statistical analyses were performed using SPSS Statistics (Version 29; IBM Corp., Armonk, NY, USA). First, the mean, median, SD, and minimum and maximum values were calculated for each parameter. Then, the normality of the data distribution was assessed by means of Q-Q plots and the Shapiro-Wilk test.
To investigate the statistical significance of treatment effects, the paired t test was applied where the metric data showed a normal distribution. If the conditions for the t test were not met, the Wilcoxon signed rank test was used. The effects of age and time-to-treatment on the magnitude of change in outcome measures were assessed using Spearman’s rank correlation, as the data were not normally distributed. Statistical significance was defined as a p value <0.05.
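The rank correlation used for the age and time-to-treatment analyses can be sketched in a few lines. This is a minimal standard-library illustration of Spearman's coefficient (Pearson correlation of average ranks, with ties sharing their mean rank), not the SPSS procedure used in the study; the data and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: months from diagnosis to IL and change in mean speaking SPL.
months_to_treatment = [2, 3, 3, 5, 8, 12, 15]
spl_change_dba = [5.1, 4.0, 3.2, 2.5, 1.0, 0.4, -0.3]
print(round(spearman_rho(months_to_treatment, spl_change_dba), 2))
```

A negative coefficient in such an analysis means that longer delays go along with smaller changes, which is the direction reported for the time-to-treatment correlations in this study.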
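The effect size of treatment effects was measured using Pearson’s correlation coefficient r. Depending on the data distribution, two different formulas were employed. For non-normally distributed data (Wilcoxon signed rank test), the effect size was calculated using the formula $r = Z/\sqrt{n}$, where Z is the standardized test statistic and n is the number of paired observations. Conversely, for normally distributed data (paired t test), the effect size was determined by applying $r = \sqrt{t^{2}/(t^{2} + df)}$, where t is the test statistic and df denotes the degrees of freedom.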
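A minimal sketch of the two standard conversions to r follows. It assumes r = Z/√n for the Wilcoxon signed rank test (n = number of pairs) and r = √(t²/(t² + df)) for the paired t test. The helper that recovers Z from a two-sided p value is an additional assumption for illustration, since SPSS reports Z directly; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def r_from_wilcoxon_z(z, n_pairs):
    """Effect size r from the standardized Wilcoxon statistic Z."""
    return z / math.sqrt(n_pairs)


def r_from_paired_t(t, df):
    """Effect size r from a t statistic and its degrees of freedom."""
    return math.sqrt(t * t / (t * t + df))


def z_from_two_sided_p(p):
    """Absolute Z value corresponding to a two-sided p value (normal approx.)."""
    return NormalDist().inv_cdf(1 - p / 2)


# Check against the GRBAS table: G with n = 17 and p = 0.004 is listed with
# r = -0.70; the sign is negative because scores decreased after therapy.
z_g = -z_from_two_sided_p(0.004)
print(round(r_from_wilcoxon_z(z_g, 17), 2))
```

Using the number of pairs rather than the total number of observations in the denominator is the convention that reproduces the effect sizes listed for G and R in the GRBAS table.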
Finally, a post hoc power analysis was performed. G*Power was used to determine both the achieved power and the required sample size [52]. We opted for a two-tailed test, set an alpha error probability of 0.05, aimed at a power of 0.8, and calculated the effect size dz from the study data. The chosen test was based on the normality of the data distribution: when the data did not follow a normal distribution, the analysis was based on the Wilcoxon signed rank test; for normally distributed data, it was based on a dependent t test. Refer to Table 3, as well as online supplementary Tables 1 and 2 (for all online suppl. material, see https://doi.org/10.1159/000544718), for a detailed breakdown of the normality of data distribution, the tests used, and the power analysis for each outcome measure.
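The logic of such a power analysis can be approximated with the normal distribution. The sketch below is not G*Power's algorithm: G*Power uses the noncentral t distribution for the dependent t test, so its results will differ somewhat, typically requiring a few more participants. Function names are hypothetical, and the defaults mirror the settings stated above (two-tailed, alpha = 0.05, target power = 0.8).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power_paired(dz, n_pairs, alpha=0.05):
    """Approximate power of a two-tailed paired test for effect size dz.

    Normal approximation; the negligible contribution of the opposite
    rejection region is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(dz) * math.sqrt(n_pairs) - z_crit)


def approx_required_pairs(dz, power=0.8, alpha=0.05):
    """Approximate number of pairs needed to reach the target power."""
    if dz == 0:
        raise ValueError("dz must be non-zero")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / abs(dz)) ** 2)


# Hypothetical effect size, for illustration only.
print(approx_required_pairs(0.5))
print(round(approx_power_paired(0.5, 17), 2))
```

The sketch shows why small effects lead to very large required samples: the required number of pairs grows with the inverse square of dz.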
Table 3. Overview of GRBAS scale scores before and after therapy
Parameter | n | Before therapy: mean (SD) | Before therapy: minimum/maximum | Before therapy: median | After therapy: mean (SD) | After therapy: minimum/maximum | After therapy: median | p value | Effect size, r | Required sample size (power) |
---|---|---|---|---|---|---|---|---|---|---|
G | 17 | 1.9 (0.9) | 0/3 | 2 | 1.1 (0.8) | 0/3 | 1 | 0.004* | −0.70 | 16 (0.83) |
R | 17 | 1.7 (0.9) | 0/3 | 2 | 1.1 (0.8) | 0/3 | 1 | 0.018* | −0.58 | 27 (0.59) |
B | 17 | 1.7 (0.8) | 0/3 | 2 | 0.8 (0.6) | 0/2 | 1 | 0.004* | −0.70 | 16 (0.83) |
A | 16 | 1.3 (1.1) | 0/3 | 1.5 | 0.4 (0.6) | 0/2 | 0 | 0.010* | −0.62 | 23 (0.66) |
S | 16 | 0.8 (0.8) | 0/3 | 1 | 0.6 (0.7) | 0/2 | 0 | 0.453 | −0.18 | 212 (0.12) |
Because the data were not normally distributed, the Wilcoxon signed rank test was used to determine statistical significance; p values less than 0.05 are marked with asterisks. The effect size according to Cohen [53] for G, R, B, and A corresponds to a strong effect (Pearson’s r > 0.4).
n, sample size; SD, standard deviation.
Results
Treatment Effects
Voice Handicap Index
A total of 12 patients had complete Voice Handicap Index (VHI) records. There was a highly significant reduction in the overall VHI-9i sum score from 25.1 (SD: 5.9) to 12.3 (SD: 8.0, p < 0.01). A large effect size (r = 0.94) was observed [53], indicating a robust treatment effect. Notably, all subscales showed a significant treatment effect. Please refer to online supplementary Table 1 for a detailed breakdown of results including the effect size, power analysis and normality of data distribution.
Perceptual GRBAS Scale
Sixteen patients had complete GRBAS scale records before and after therapy. For 1 patient, only RBH-scale results were available. Therefore, in this patient the parameters Asthenia and Strain were missing. There was a statistically significant treatment effect in the overall grade of dysphonia (G) and the sub-characteristics roughness (R), breathiness (B), and asthenia (A) (shown in Fig. 1). G improved from a mean of 1.9 (SD: 0.9) to 1.1 (SD: 0.8, p = 0.004), R improved from 1.7 (SD: 0.9) to 1.0 (SD: 0.5, p = 0.02), B improved from 1.7 (SD: 0.8) to 0.8 (SD: 0.6, p = 0.004), A improved from 1.3 (SD: 1.1) to 0.4 (SD: 0.6, p = 0.01), and S from 0.8 (SD: 0.8) to 0.6 (SD: 0.7, p = 0.5). Table 3 shows all GRBAS scale scores before and after therapy, including the effect size, power analysis, and normality of data distribution.
Overall dysphonia (G), roughness (R), breathiness (B), asthenia (A), and Strain (S) before and after treatment. Statistically significant results are marked with asterisks (p < 0.05) (source: own). CI, confidence interval.
Instrumental Acoustic and Aerodynamic Assessment
VRP, acoustic perturbation, and aerodynamic measurements before and after therapy were performed in 17 patients. However, in some patients, the VRP recordings were incomplete; given the retrospective design of the study, the reasons for this could not be determined. Refer to online supplementary Table 2 for a detailed breakdown.
For speaking voice characteristics, the analysis showed a higher mean speaking voice SPL, which increased from 61.9 to 64.1 dBA (p = 0.04). Mean calling voice SPL and f0 increased from 79.5 to 88.0 dBA (p = 0.001) and from 197.0 to 239.5 Hz (p = 0.03), respectively. There was a larger singing voice SPL range (change from 25.8 to 34.9 dBA, p = 0.003) and a higher maximum singing voice SPL (increase from 78.7 to 86.1 dBA, p = 0.012) (shown in Fig. 2). Jitter decreased from 7.3 to 1.4% (p = 0.028), and the Dysphonia Severity Index (DSI) increased from −7.6 to −0.1 (p = 0.033).
Improvement of mean speaking, mean calling, maximum singing voice SPL, and range after treatment; statistically significant results are marked with asterisks (p < 0.05) (source: own).
No significant changes were observed for any of the other instrumental acoustic parameters investigated. Refer to online supplementary Table 2 for a detailed breakdown of results including effect size, power analysis, and normality of data distribution.
Effects of Age and Time-To-Treatment
Age
Age significantly affected the magnitude of change in minimum speaking fundamental frequency (f0) (coefficient: −0.5; p = 0.03): Younger patients had a greater change toward a lower speaking f0. However, age did not significantly affect any of the other investigated parameters. Due to the non-normal distribution of data, Spearman’s rank correlation was used.
Time-to-Treatment
A shorter time-to-treatment correlated with a greater magnitude of change in mean speaking voice SPL (coefficient: −0.6; p = 0.022), maximum singing voice SPL (coefficient: −0.6; p = 0.007), and singing voice SPL range (coefficient: −0.5; p = 0.047). Also, the magnitude of change in jitter (coefficient: −0.5; p = 0.046) and the DSI (coefficient: −0.7; p = 0.006) was greater when patients were treated earlier. Due to the non-normal distribution of the data, Spearman’s rank correlation was used for all analyses.
Discussion
The present study in patients with UVFI shows significant improvements in perceptual dysphonia and subjective voice symptoms after treatment with transoral IL combined with SVT. Moreover, mean speaking voice loudness, mean calling fundamental frequency (f0), and calling SPL show functionally relevant voice changes after treatment. While age affected minimum f0 only, the present preliminary findings suggest that early treatment is especially beneficial for habitual mean speaking loudness and thus may be particularly important for professional voice users. Because the maximum singing SPL, singing SPL range, jitter, and the DSI are also affected by the time-to-treatment, these should be considered when anticipating the individual functional outcome. To the best of our knowledge, the present work is the first study focusing on functional outcomes in patients undergoing transoral IL combined with SVT that provides comprehensive data on subjective, perceptual, aerodynamic, and instrumental acoustic outcomes [12, 27‒29, 31, 32, 54‒57]. In particular, the present work provides new clinical insights into the treatment effects on speaking and calling VRP characteristics, which are key to efficient daily voice use.
Subjective Voice-Related Symptoms
In our study, we observed a statistically significant decrease in VHI-9i sum scores after therapy, with a large effect size [53]. Most notably, this shift in scores represents a transition from moderate to mild impairment [49], suggesting a clinically meaningful improvement. This emphasizes the importance of IL for improving subjective perception of voice complaints. While similar changes have been shown after transcervical IL [54], so far this has not been documented for the transoral technique. It is also worth noting that previous studies typically utilized a longer VHI-10 [54] or VHI-30 questionnaire [29, 57, 58]. The results of the present study are broadly consistent with these findings and suggest that the shorter VHI-9i is an adequate tool for assessing voice-related symptoms. The use of a shorter questionnaire version potentially simplifies the assessment process in this patient group.
Improvement of Perceptual Voice Characteristics
In the present study, a statistically significant improvement with a moderate to large effect size was observed in the perceptual characteristics overall grading (G), roughness (R), breathiness (B), and asthenia (A). The highly significant p values (e.g., p = 0.004 for GRBAS G) can be attributed to the consistency of changes across patients and the large effect sizes (e.g., r = 0.70 for GRBAS G). Nevertheless, the small sample size restricts the generalizability of these results and highlights the need for confirmation in larger cohorts. We considered the treatment effects to be clinically relevant and meaningful, as G, R, and B showed a median of 2 (moderate disturbance) before therapy, a median of 1 (mild disturbance) after therapy, and A a median of 1.5 before and 0 (no disturbance) after therapy. These results are consistent with several other studies showing an improved G, R, B, and A after IL with HA using a transcervical approach [56]. The missing effect of Strain in the present study could be due to the generally low preoperative Strain score, which resembled the values observed in healthy individuals. This partly agrees with the existing literature, where one study indicated an improved S, whereas another failed to demonstrate such an effect [56].
Changes in Objective Acoustic Voice Parameters
Speaking and Calling Voice Characteristics
In the present study of 17 individuals, we found a statistically significant change with a large effect size [53] in mean speaking loudness and in mean calling voice f0 and SPL. Similarly, vocal fold medialization with hydroxyapatite or titanium implants led to significantly improved mean speaking and calling loudness [33‒36]. In contrast, two studies examined the characteristics of speaking voice f0 after transcervical IL with HA and did not observe any changes in minimum, maximum, or mean f0 [27, 32]. To the best of our knowledge, a statistically significant increase in mean speaking loudness, calling voice f0, and calling voice SPL has not been previously demonstrated following IL with any of the available materials for patients with UVFI.
Singing Voice Characteristics
In the present cohort, a statistically significant improvement, with a large effect size, was observed for the maximum singing SPL and SPL range. However, minimum and maximum singing f0, f0 range, and minimum singing SPL did not change significantly. Recent research by Pei et al. [28] also evaluated the singing VRP in patients with UVFP. In contrast to our study, IL was performed transcervically and earlier after diagnosis. Moreover, it did not include SVT, and follow-up was conducted earlier. Their results were partially consistent with our findings; for instance, Pei et al. [28] also reported significant improvements in the maximum singing SPL and range. Furthermore, although our results did not show statistical significance for the maximum f0 and semitone ranges, the effect sizes observed for these parameters were comparable. Contrary to the results of Pei et al. [28], our study found a negligible effect size for the minimum singing f0, indicating a divergence in the outcomes. Jeong et al. [12] examined VRP after transcervical IL and compared patients who received adjuvant VT with those who only underwent IL. They observed an improvement in f0 and SPL range after IL with HA and adjuvant VT but not in patients receiving IL alone. In addition, VT helped maintain the results for a longer time [12]. This suggests that SVT may also provide complementary benefits.
DSI and Acoustic Perturbation
There was a significant improvement in DSI, with a large effect size. This is in line with previous studies that reported improved DSI after transoral IL with HA [31]. In agreement with previous studies, jitter improved significantly [12, 59]. To the best of our knowledge, a significant treatment effect in jitter using the transoral IL technique has not previously been reported.
Aerodynamic Measurements
Several studies have shown improved MPT after transoral or transcervical IL with HA [12, 27, 29, 54, 55, 59‒62]. Although no statistically significant change in MPT was detected by the t test in the present work, the observed strong effect size suggests that significant findings may be observed with a larger sample. Similarly, the Wilcoxon signed rank test result was significant (p = 0.038), which points to a meaningful effect that could be confirmed with a larger sample size.
Choice of Outcome Measures
The choice of outcome measures was guided by current recommendations in the European Laryngological Society’s guidelines for multidimensional voice assessment [17]. These measures are widely applied in voice research, as also highlighted by Carding et al. [16], who emphasized their role in objectively capturing changes in voice function. SPL, for instance, captures dynamic voice capabilities, which are critical for functional communication in patients with UVFI. Despite their relevance, these measures also have limitations. For example, while some acoustic measures provide robust and relevant quantitative data, they may not fully address the subjective and perceptual aspects of voice disorders. This limitation underscores the importance of combining them with subjective tools like the Voice Handicap Index (VHI) and perceptual evaluations such as the GRBAS scale to obtain a more complete view of treatment outcomes [17]. Moreover, several acoustic measures may not be usefully applicable in severely dysphonic voices, and almost all so-called traditional measures such as jitter, shimmer, HNR, and CPP are affected by speaking voice SPL, f0, and speech material type. However, to date there is no consensus on which voice outcome indicators should be used to compare the benefits of the various surgical treatments for UVFI [26]. Thus, in the present work current standards were adopted and reported to provide comprehensive assessment data.
Influence of Age and Time-To-Treatment
In the present study, a short time-to-treatment was associated with better results; it correlated with greater improvements in several instrumental acoustic outcomes, including mean speaking voice SPL, maximum singing voice SPL, singing voice SPL range, and jitter (%). This supports the view that early treatment might help prevent patients from adopting harmful voice production patterns [40, 63‒68].
Also, age affected the minimum speaking f0. Thus, younger patients may obtain better functional voice results than older patients. This might be due to several factors such as changes in vocal fold characteristics and neuroplasticity, which have been shown to lead to changes in speaking f0 during adulthood [38].
To the best of our knowledge, the effects of aging and time-to-treatment on functional voice outcomes after any of the four surgical approaches for patients with UVFI have not yet been investigated. Notably, a correlation between early treatment and improved mean speaking voice SPL has not been demonstrated so far. Since an inadequate speaking voice SPL is a primary concern for many patients, this finding should be considered when advising patients about the timing of an intervention. This might be an argument for early treatment by IL, particularly for professional voice users. However, this should be confirmed by a more extensive clinical study.
Feasibility, Indications, Contraindications, and Technical Challenges of IL
Transoral IL, either under local anesthesia (LA) or GA, is normally feasible in all patients after excluding contraindications. As confirmed in this study, functional outcomes are comparable to the other more common approaches. We generally favor an office-based approach under LA, as represented in the present study (n = 15, 88% of patients). Advantages include precise visualization by means of rigid videolaryngostroboscopy, the ability to directly assess functional outcomes, its minimally invasive nature, short surgery time, and cost-efficiency. Transoral IL under LA is suitable for cooperative patients without a strong gag reflex, which can already be ruled out during preassessment according to current standards [18, 19]. In our institution, a first laryngeal assessment includes primarily rigid and secondarily flexible endoscopic examinations. Thus, the suitability for IL under LA can be determined during the first patient contact. Indications for IL include mild-to-moderate glottic insufficiency, as in UVFI; IL can therefore be recommended in a substantial proportion of patients.
Procedures under LA are especially important for patients with impaired cervical spine reclination, or those with increased risks for GA. However, GA (in our study: n = 2, 12% of patients) may be indicated as an alternative in cases of a pronounced gag reflex or poor tolerance of awake procedures. Further contraindications for IL include active upper airway infections, allergies to the injected material, potential airway obstruction, or uncontrolled coagulation disorders that increase the risk of bleeding complications.
Technical challenges in office-based transoral IL include difficulties in maintaining glottal visualization and ensuring patient cooperation. These can be overcome through thorough examination at the first patient contact during preassessment. In rare cases, specific preparation and structured training against the gag reflex can be conducted by the speech and language pathologists, mainly consisting of muscle relaxation techniques such as manual laryngeal therapy. However, the most critical skills are mastery of rigid laryngostroboscopy combined with the technique of applying sufficient LA to the laryngeal region, and mastery of curved surgical instruments.
Contribution of SVT to Treatment Effects
SVT is believed to complement IL by helping patients adapt to the altered laryngeal structures after surgery. By avoiding harmful vocal behaviors and training vocal fold closure with minimal effort, treatment outcomes may be optimized. This approach aligns with findings by Jeong et al. [12], who reported significantly better voice outcomes when IL was combined with VT compared to IL alone. Their therapy protocol, spanning 3 to 14 sessions (average 6.3), systematically employed techniques such as sustained phonation, the accent method, and resonance therapy. In contrast, the SVT program in the present study was designed to be more condensed, with shorter session durations and a smaller number of sessions (average: 2). This allowed individual patient needs to be targeted efficiently based on a comprehensive assessment. Despite the growing evidence supporting the combination of IL and VT, there remains a lack of consensus regarding standardized protocols. By proposing an approach of individually tailored VT with indirect and direct VT techniques delivered over fewer sessions and a shorter duration, the present study seeks to address this gap. However, more research is needed to determine the optimal duration, frequency, and content of VT, as well as to examine the extent to which SVT contributes to treatment effects.
Weaknesses of the Present Study
Due to the exploratory nature of the present study with a comparatively small sample size, there are several limitations to consider in the interpretation of the presented results. Its retrospective design may have introduced biases related to data collection. Incomplete or missing data and selection bias may have been present because our sample included only patients from one institution. Furthermore, the subjective nature of some assessments, such as perceptual voice quality analysis, can introduce measurement and observer biases. Additionally, the study was underpowered for some parameters and did not show significant results for small (Pearson correlation coefficient r = 0.1) to moderate (Pearson correlation coefficient r = 0.3) effect sizes. Owing to this limitation, it was not possible to draw definitive conclusions regarding specific outcomes, including f0 mean speaking (power: 0.14, effect size: 0.26), SPL range speaking (power: 0.19, effect size: 0.24), f0 maximum speaking (power: 0.38, effect size: 0.45), f0 maximum singing (power: 0.11, effect size: 0.33), and semitone range singing (power: 0.25, effect size: 0.45). These findings clearly indicate the need for a larger sample size to adequately assess these treatment effects.
Moreover, due to the absence of established minimal clinically important differences for changes in VRP measures, statements on whether the effects hold clinical significance must be taken with caution. Furthermore, it is not possible to distinguish between the respective treatment effects of IL and SVT; therefore, it is necessary to analyze both treatments in isolation to attribute treatment effects to the treatment type. Another limitation lies in the absence of signal typing before conducting acoustic voice analysis. Although parameters from the VRP are relatively robust, time-based measures such as jitter and indices incorporating those such as the DSI may have been affected by signal analysis inaccuracies, potentially introducing bias. Moreover, the limited applicability of time-based acoustic parameters in severely dysphonic voices, as discussed in the introduction section, highlights the need to incorporate both signal typing and spectral-based measures, such as CPP in future research. While these steps would enhance the reliability and usefulness of acoustic voice assessments, there still needs to be a consensus on how to best capture functional voice changes.
Furthermore, the effects of age and time-to-treatment on the magnitude of change in treatment outcomes were analyzed using correlation coefficients. While providing valuable insights into potential associations, these findings are limited by their inability to account for confounding variables and their lack of causal interpretation. To address these limitations, we also performed post hoc linear mixed model (LMM) analyses where age and time-to-treatment were covariates. However, no significant results could be shown in these analyses. This could be attributed to the small sample size, which may have limited the statistical power and the ability to detect stable estimates and reliable conclusions [69]. These findings highlight the need for larger cohorts to obtain robust statistical results in future research.
Clinical Application of the Present Results and Outlook
Despite its limitations, the present work provides new clinical insights into the effects of treatment with IL and SVT on speaking and calling VRP characteristics, which are key characteristics of speaking in daily life. The improvements in the VRP in this study can be considered clinically relevant and meaningful to some extent, especially for the mean speaking SPL, which mirrors speaking volume as a prerequisite for unimpaired daily communication. Furthermore, the mean calling SPL, mean calling f0, maximum singing SPL, and singing voice SPL range reflect voice function principles, which indicate whether people can shout or sing at all. Additionally, further parameters, such as MPT (i.e., how long people can produce a sound), maximum speaking f0, singing semitone range, and maximum singing f0, which showed effect sizes (Pearson’s r) greater than 0.3, should be considered as specific vocal markers, despite not reaching statistical significance. Although there is no established minimal clinically important difference for parameters of the VRP, the improvement of subjective symptoms from moderate to mild impairment, as indicated by the VHI-9i, supports the clinical relevance of these changes to some extent.
So far, no previous study has examined the effects of age and time-to-treatment on functional voice treatment outcomes in patients with UVFI. As our data suggest a functional advantage of early IL combined with SVT, patients diagnosed with UVFI should be counseled on the importance of early treatment. However, further clinical studies including the effects of age and time-to-treatment are necessary to provide more data for the development of evidence-based treatment schemes to support individual clinical decisions. Nonetheless, early IL with accompanying SVT can be considered an effective treatment option, especially in patients with high demands for voice quality and endurance for work or other activities of daily living. Furthermore, by showing improvements in the perceptual voice as well as subjective and instrumental voice measures, this study suggests that a transoral IL with HA might be as efficient as a transcervical IL. This should be confirmed in a randomized controlled trial comparing both IL techniques.
While IL primarily aims to restore voice function, it also plays a significant role in improving the protective function of the vocal folds. Many patients with glottal insufficiency experience impaired cough reflexes and an increased risk of respiratory infections due to compromised protection of the deeper airways during swallowing. Although data on cough and respiratory status prior to IL were not analyzed in this study, the potential benefits of addressing these issues underscore the broader therapeutic impact of IL [70].
Based on our results and clinical experience, we conclude that in the absence of established methods to predict quick recovery from UVFI, early office-based IL (<6 months after first diagnosis) should be considered for every patient. However, the optimal timing for IL varies individually and depends on a number of factors. For instance, patients with a poor prognosis, such as those with a transected nerve, are typically advised to have early IL. Further factors favoring early IL include specific demands, such as professional or social needs requiring immediate voice improvement, noncompliance with VT, palliative situations, or a high aspiration risk. Therefore, an optimal decision-making process would require a standardized comprehensive assessment and follow-up scheme after initial diagnosis in this patient group, which still needs to be established.
Conclusions
This pilot study reports significant improvements in subjective, perceptual, and instrumental acoustic measurements of dysphonia following transoral IL with HA and SVT in patients with UVFI. Improvements were observed for mean GRBAS scale G (overall hoarseness), R, B, and A, mean speaking and calling SPL, mean calling fundamental frequency (f0), maximum singing SPL, singing SPL range, jitter (%), DSI, and VHI-9i. An improved GRBAS scale, singing VRP, jitter (%), and VHI have previously been demonstrated for transcervical IL with HA but have not yet been described after transoral IL with HA and SVT. Additionally, significant outcomes of speaking and calling VRP after IL have not previously been documented in studies using either transoral or transcervical techniques.
Moreover, this is one of the very few studies providing functional voice data derived from VRPs, showing which parameters are affected by time-to-treatment and age, which may influence prognosis. While age influenced the magnitude of change in minimum speaking f0 only, the timing of treatment had a broader impact on the magnitude of change in several parameters, including the mean speaking voice SPL, singing voice SPL maximum and range, jitter, and the DSI. These results suggest the advantage of early interventions, particularly for professional voice users. To confirm these findings, further studies considering the time-to-treatment and age effects, with larger sample sizes and more robust statistical methods, are needed. Additionally, future research should aim to assess to what extent SVT contributes to treatment effects.
Statement of Ethics
The study protocol was reviewed and approved by the Central Ethics Committee of Zurich (KEK ZH 2017-00806). All participants provided written and oral consent to participate in the study. Furthermore, all the participants agreed to publish the results obtained from the study.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
The Division of Phoniatrics and Speech Pathology and the Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, Switzerland supported the study and provided infrastructure to S. Marti, J.E. Bohlender, and M. Brockmann-Bauser. Furthermore, University Hospital Zurich provided the salaries of J.E. Bohlender and M. Brockmann-Bauser. The funder had no role in the study design, data collection, data analysis, or reporting.
Author Contributions
All authors (S.M., J.E.B., and M.B.-B.) contributed substantially to this study. The study design and concept were developed in sessions with all authors. S.M. was responsible for creating the database and its analysis as well as the first paper draft. M.B.-B. was responsible for overseeing the data collection, analysis and writing up process. All authors redrafted and agreed on the current paper.
Data Availability Statement
The examined dataset was retrieved from the clinical database of the Div. of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich. Due to ethical restrictions and local legislation, such as regarding the impeded anonymization of voice recordings, data are not made publicly available but can be obtained from the corresponding author upon reasonable request and only after agreement of the responsible ethical review and data governance board.