Introduction: Voice diagnostics, including voice range profile (VRP) measurement and acoustic voice analysis, are essential in laryngology and phoniatrics. Due to the COVID-19 pandemic, wearing class 2 or 3 filtering face piece (FFP2/3) masks is recommended when high-risk aerosol-generating procedures such as singing and speaking are performed. The goal of this study was to compare VRP parameters measured without and with FFP2/3 masks. In addition, formant analysis of sustained vowels, singer's formant analysis, and analysis of read standard text samples were performed without and with FFP2/3 masks. Methods: Twenty subjects (6 males and 14 females) with an average age of 36 ± 16 years (mean ± SD) were enrolled in this study. Fourteen subjects were rated as euphonic/not hoarse and 6 as mildly hoarse. All subjects underwent VRP measurements, vowel recordings, and text recordings without and with an FFP2/3 mask using the software DiVAS by XION medical (Berlin, Germany). The voice range of the singing voice, the equivalent of the vocal extent measure (eVEM), and the fundamental frequency (F0) and sound pressure level (SPL) of soft speaking and shouting were calculated and analyzed. Maximum phonation time (MPT) and jitter-% were included for Dysphonia Severity Index (DSI) measurement. The singer's formant was analyzed. Spectral analyses of the sustained vowels /a:/, /i:/, and /u:/ (first = F1 and second = F2 formants), the intensity of the long-term average spectrum, and the alpha-ratio were calculated using the freeware Praat. Results: For all subjects, the mean values of routine voice parameters without and with mask were analyzed. No significant differences were found in singing voice range, eVEM, SPL, or frequency of soft speaking/shouting, except for a significantly lower mean SPL of shouting with FFP2/3 mask, in particular in the female subjects (p = 0.002). MPT, jitter, and DSI showed no significant differences without and with FFP2/3 mask.
Further mean values analyzed without and with mask were the ratio of singer's formant to loud singing, which was lower with FFP2/3 mask (p = 0.001), and F1 and F2 of /a:/, /i:/, and /u:/, which showed no significant differences except for F2 of /i:/, which was lower with FFP2/3 mask (p = 0.005). Apart from these exceptions, the t test revealed no significant differences in any of the routine parameters between recordings without and with an FFP2/3 mask. Conclusion: VRP measurements including DSI performed with FFP2/3 masks provide reliable data on voice condition/constitution in clinical routine. Spectral analyses of sustained vowels, text, and singer's formant are affected by wearing FFP2/3 masks.
Introduction
Since 2001, the basic protocol for functional assessment of voice pathology of the Committee of Phoniatrics of the European Laryngological Society has become the standard for routine clinical voice diagnostics in most laryngology departments and voice centers. Its basic set, the "truncus communis" for the assessment of common dysphonia, comprises (a) perception, (b) videostroboscopy, (c) acoustics, (d) aerodynamics/efficiency, and (e) subjective rating by the patient.
The COVID-19 pandemic is still ongoing; it has a major impact on daily life and challenges routine clinical work. Airborne transmission by respiratory droplets or aerosols has been identified as the most important transmission pathway for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [2-4]. Aerosol-generating procedures, such as endoscopy of the nose, mouth, or larynx and voice diagnostics, are considered to carry a high risk of transmitting SARS-CoV-2. Among healthcare providers, laryngologists, phoniatricians, speech-language pathologists, and speech therapists are at elevated risk of COVID-19 infection because of the high potential for viral transmission by aerosols and airborne droplets produced during speaking, singing, sneezing, and coughing. Inhaling small airborne droplets is the most probable route of infection, in addition to the more widely recognized transmission via larger respiratory droplets and direct contact with infected people or contaminated surfaces. The infection-control pyramid comprises physical elimination of the pathogen, distancing to separate people from the pathogen, administrative controls, and personal protective equipment (PPE). Every preventive measure (such as wearing masks, ventilation, and social distancing) aims to reduce the inhaled particle concentration and thus the probability of infection and the spread of COVID-19. The WHO recommends wearing class 2 or 3 filtering face piece (FFP2/3) masks if high-risk aerosol-generating procedures are required and cannot be avoided [8, 9].
In 2020, the Union of the European Phoniatricians published a position paper with recommendations for phoniatricians and ENT surgeons on how to provide and/or run healthcare services for voice, swallowing, speech and language, and pediatric audiology, proposing the use of PPE by both medical staff and patients. Asadi et al. reported higher particle emission rates during loud vocalization, ranging from approximately 1 to 50 particles per second (0.06 to 3 particles per cm³) for low to high amplitudes.
Several studies have shown that singing, especially choir singing, poses a major risk during the COVID-19 pandemic because of high infection rates [12-15]. Because of the high risk of person-to-person SARS-CoV-2 infection during phonation (speaking and singing), the pandemic poses major challenges for healthcare workers involved in voice diagnostics. Unlike speech and language assessment and therapy, for which the Union of the European Phoniatricians (UEP) recommends telehealth, routine voice diagnostic measurements cannot be performed remotely, because instrument-based standards for doing so are lacking.
Standardized office-based instrumental measurements of voice acoustics remain mandatory for the correct diagnosis of dysphonia. Wearing face masks appears to affect speech intelligibility and acoustic voice measurements. In classroom communication, fabric masks reduced speech intelligibility significantly more than the other masks tested. Magee et al. tested several common types of face masks (N95, surgical, and cloth) on acoustic measures of timing, frequency, perturbation, power spectral density, speech intelligibility, and word and sentence accuracy. Their data indicated that face masks change the speech signal, but specific acoustic features (e.g., measures of voice quality) remain largely unaffected irrespective of mask type. Maryn et al. confirmed the influence of face masks on speech sound properties, although the impact differed between mask types. They included 26 speech sound properties in the analysis, related to voice production (fundamental frequency, sound intensity [I] level), voice quality (jitter percent, shimmer percent, harmonics-to-noise ratio, smoothed cepstral peak prominence, Acoustic Voice Quality Index), articulation, and resonance.
So far, studies comparing voice production without and with a face mask have focused mainly on perturbation measurement, formant analysis, speech intelligibility, maximum phonation time (MPT), and spectral analysis. With regard to personal protection, it remains unclear whether the FFP2/3 masks recommended in voice centers alter the outcome of routine voice range profile (VRP) measurements and voice acoustics in clinical diagnostics. One routine voice diagnostic measurement is the VRP, also called the phonetogram [19, 20], which displays sound pressure level (SPL) versus fundamental frequency for singing and speaking tasks. It has not yet been investigated whether VRP measurements performed without and with a face mask provide similar results.
The goal of this study was to compare VRP measurements of the speaking and singing voice without and with an FFP2/3 mask. Because several commercially available VRP devices offer Dysphonia Severity Index (DSI) measurement, which uses the highest frequency (F0-high, in Hz) and the lowest intensity (in dB) from the VRP, MPT (in seconds) and jitter (%) were additionally recorded for the DSI. Further, spectral analyses of sustained vowels (first and second formants, F1 and F2), analyses of the singer's formant, and assessment of the long-term average spectrum (LTAS) and the alpha-ratio (α-ratio) were included in this study.
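For reference, the DSI combines the four measures named above in the weighted linear formula published by Wuyts et al. (2000); scores run from about +5 for normal voices to about −5 for severely dysphonic voices. The sketch below simply evaluates that published formula. It is not DiVAS's implementation, and the input values in the usage line are hypothetical, not data from this study.

```python
def dysphonia_severity_index(mpt_s: float, f0_high_hz: float,
                             i_low_db: float, jitter_pct: float) -> float:
    """Dysphonia Severity Index (Wuyts et al., 2000).

    mpt_s      -- maximum phonation time in seconds
    f0_high_hz -- highest frequency reached in the VRP, in Hz
    i_low_db   -- lowest intensity reached in the VRP, in dB
    jitter_pct -- jitter in percent
    """
    return (0.13 * mpt_s
            + 0.0053 * f0_high_hz
            - 0.26 * i_low_db
            - 1.18 * jitter_pct
            + 12.4)


# Hypothetical input values for one subject (not data from this study)
print(round(dysphonia_severity_index(20.0, 880.0, 55.0, 0.5), 3))  # 4.774
```

Because the formula is linear, the effect of a mask on the DSI is just the weighted sum of its effects on the four inputs; a 1-dB change in the lowest intensity, for example, shifts the DSI by 0.26.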
Subjects and Methods
The study was approved by the Ethics Committee of the Medical University of Vienna (EK 1536/2021). A total of 20 subjects were enrolled at either the ENT outpatient department of the Medical University Hospital Vienna or the Medical Center of Communication Vienna. Ten of the 20 subjects were interested members of the department staff who had been recruited by flyers. The other 10 subjects were patients with functional voice disorders.
Owing to pandemic-related legal requirements, patients had to wear FFP2 or, in individual cases, FFP3 masks in hospitals and at private doctors' offices. During the study, all subjects wore their own CE-certified face masks.
All patients had tested negative for SARS-CoV-2 infection prior to the measurements, and the examiner checked that the face mask fitted correctly. Both speech therapists involved in the measurements (I.K. and K.K.) wore PPE including FFP3 masks during voice recordings and VRP measurements. Voice recordings were made with a XION headset with an incorporated microphone at a mouth-to-microphone distance of 30 cm. All recordings were made with the software DiVAS by XION medical (Berlin, Germany), the standard software for voice diagnostics at the Department of Phoniatrics and Logopedics and at the Medical Center of Communication. For all subjects, recordings were made in the same order: (1) the standard text "The North Wind and the Sun" in German, (2) the sustained vowels /a:/, /i:/, and /u:/, and (3) the singing and speaking (counting in German) voices. The voice recordings were saved automatically as separate wave files on the computer running DiVAS. To avoid bias, the order of recording without and with a mask was randomized (Figs. 1, 2). All recordings were performed in a recording room set up for routine voice diagnostics.
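The text does not state how the condition order was randomized. One simple option, sketched here as an assumption, is a balanced random assignment in which half of the subjects start without a mask and half with a mask, shuffled with a recorded seed so that the allocation can be reproduced. The subject IDs and the seed are hypothetical.

```python
import random

NO_MASK_FIRST = ("without mask", "with FFP2/3 mask")
MASK_FIRST = ("with FFP2/3 mask", "without mask")


def assign_condition_orders(subject_ids, seed=None):
    """Balanced random assignment of the two condition orders to subjects.

    Half of the subjects get NO_MASK_FIRST and half MASK_FIRST; with an odd
    number of subjects, the remaining one is assigned by a coin flip.
    """
    rng = random.Random(seed)  # seeded generator, so the allocation is reproducible
    n = len(subject_ids)
    pool = [NO_MASK_FIRST] * (n // 2) + [MASK_FIRST] * (n // 2)
    if n % 2:
        pool.append(rng.choice([NO_MASK_FIRST, MASK_FIRST]))
    rng.shuffle(pool)  # random pairing of orders with subjects
    return dict(zip(subject_ids, pool))


# Hypothetical subject IDs S01..S20
orders = assign_condition_orders([f"S{i:02d}" for i in range(1, 21)], seed=2021)
for sid in ("S01", "S02", "S03"):
    print(sid, "->", " then ".join(orders[sid]))
```

Balancing the two orders (rather than flipping a coin per subject) guarantees that order effects such as vocal fatigue or warm-up fall equally on both conditions.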
All subjects underwent VRP measurements without and with their FFP2/FFP3 masks using the software DiVAS. For the automated computation of the DSI, the MPT (in seconds) and jitter (in %) of a sustained /a:/ were also measured. For further analysis of the VRP parameters, each subject's values without and with mask, as registered by DiVAS, were exported to an Excel spreadsheet.
To measure the VRP, the voice ranges of soft and loud singing were recorded first. All subjects had to sing over their entire pitch range, first as softly and then as loudly as possible. For both soft and loud singing, each subject began at their middle pitch, went up to their highest possible pitch, and then went down to their lowest possible pitch.
To display the spectral energy components of the singer's formant, the SPL in the spectral range of 2–4 kHz was recorded simultaneously with the loudest sung tones. The ratio (in %) between the singer's formant level in dB and the loud singing level in dB was then calculated.
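DiVAS performs this calculation internally. A minimal numpy sketch of the same arithmetic is shown below, assuming a calibrated sound-pressure signal in pascals (so that levels are in dB SPL re 20 µPa) and using Parseval's theorem to obtain the level in the 2–4 kHz band from an FFT. The function names are hypothetical, and the band-level computation illustrates the idea rather than reproducing DiVAS's algorithm.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in Pa (0 dB SPL)


def band_level_db(pressure, fs, f_lo=None, f_hi=None):
    """Level in dB SPL of a calibrated pressure signal (Pa), optionally limited to [f_lo, f_hi] Hz.

    Parseval's theorem: the mean square of the signal equals the summed
    one-sided power spectrum, so summing only some bins gives a band level.
    """
    x = np.asarray(pressure, dtype=float)
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / n ** 2
    power[1:(n + 1) // 2] *= 2.0  # fold in negative frequencies (not DC or Nyquist)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = np.ones(freqs.shape, dtype=bool)
    if f_lo is not None:
        band &= freqs >= f_lo
    if f_hi is not None:
        band &= freqs <= f_hi
    return 10.0 * np.log10(power[band].sum() / P_REF ** 2)


def singers_formant_ratio_pct(pressure, fs):
    """Singer's formant level (2-4 kHz) as a percentage of the overall level, both in dB."""
    loud_db = band_level_db(pressure, fs)
    sf_db = band_level_db(pressure, fs, 2000.0, 4000.0)
    return 100.0 * sf_db / loud_db


# Hypothetical 1-s test tone: 1 Pa at 500 Hz plus 0.1 Pa at 3 kHz
fs = 16000
t = np.arange(fs) / fs
tone = 1.0 * np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
print(round(singers_formant_ratio_pct(tone, fs), 1))
```

Because both terms of the ratio are levels in dB, a mask that attenuates only the 2–4 kHz band lowers the numerator while leaving the denominator almost unchanged, so the percentage drops.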
Recordings of the speaking voice, from soft speaking to shouting, were taken to assess the fundamental frequency (F0) and SPL at increasing loudness levels. During these recordings, all subjects counted in German (from 21 upward) in the following order: soft speaking, conversational loudness, classroom loudness, and shouting. For the automated calculation of the DSI, MPT and jitter-% at comfortable pitch and loudness were measured using DiVAS. For MPT, subjects were asked to take a deep breath and sustain the vowel /a:/ as long as possible. For jitter-%, a 3-s /a:/ was recorded, and the 2-s midportion of the recording was marked for automated analysis by DiVAS.
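The midportion selection and the jitter-% calculation can be sketched as follows. DiVAS's exact jitter algorithm is not described in the text; the sketch assumes the common local-jitter definition (mean absolute difference between consecutive periods divided by the mean period, as in Praat's "Jitter (local)") and assumes the glottal periods have already been extracted from the 2-s midportion. Period extraction itself is not shown, and the function names are hypothetical.

```python
def middle_window(n_samples, fs, seconds=2.0):
    """Start/stop sample indices of the central `seconds` of a recording."""
    win = int(round(seconds * fs))
    if win > n_samples:
        raise ValueError("recording is shorter than the requested window")
    start = (n_samples - win) // 2
    return start, start + win


def jitter_local_pct(periods):
    """Local jitter in %: mean |T_i - T_(i+1)| divided by the mean period T."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# A 3-s /a:/ at 44.1 kHz: analyse samples 22050..110250 (0.5 s to 2.5 s)
print(middle_window(3 * 44100, 44100))
# Hypothetical periods (s) alternating between 10.0 and 10.1 ms
print(round(jitter_local_pct([0.0100, 0.0101, 0.0100, 0.0101]), 3))  # 0.995
```

Discarding the first and last 0.5 s removes voice onset and offset, where period-to-period variation is naturally larger and would inflate jitter.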
Because the frequency scale (in Hz) of the VRP is logarithmic in character, a semitone (st) scale was preferred, with A1 = 55 Hz = st 1 (lower range limit of the VRP) and f3 = 1,319 Hz = st 57 (upper range limit of the VRP). The F0 of soft speaking at the lowest SPL and of the shouting voice at the highest SPL, as well as the frequency of soft singing at the lowest SPL and of loud singing at the highest SPL, were allocated to the appropriate st-level, and the voice ranges of the speaking and singing voices were calculated accordingly as st-ranges.
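Allocating a frequency to an st-level amounts to a base-2 logarithm relative to A1 = 55 Hz. The sketch below assumes that frequencies are rounded to the nearest semitone and that an st-range is the difference between the highest and lowest st-level; neither detail is specified in the text, and the function names are hypothetical.

```python
import math

A1_HZ = 55.0  # A1 = 55 Hz is st 1 on the VRP scale


def hz_to_st(freq_hz):
    """Semitone level on a scale where A1 = 55 Hz is st 1 (rounded to the nearest semitone)."""
    return round(12.0 * math.log2(freq_hz / A1_HZ)) + 1


def st_range(f_low_hz, f_high_hz):
    """Voice range in semitones between two frequencies (assumed: difference of st-levels)."""
    return hz_to_st(f_high_hz) - hz_to_st(f_low_hz)


print(hz_to_st(220.0))          # 25: two octaves (24 st) above A1
print(st_range(110.0, 440.0))   # 24: two octaves
```

On this scale every octave adds 12 st regardless of register, which is why a semitone range is more comparable between male and female voices than a range in Hz.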
To quantify the size and extent of the singing voice range, one of the co-authors (ML) developed an algorithm that uses pixel data as an equivalent of the vocal extent measure (eVEM, in px) by Caffier et al. The data of each singing VRP were exported from DiVAS, and the SPL (dB) levels were plotted against the corresponding frequencies on the st scale as vertices of the area representing the voice range on a defined image of 1,275 × 800 pixels. The number of pixels in this area was counted and used as the eVEM (in pixels) to quantify each subject's singing voice range without and with FFP2/3 mask.
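The text does not give the axis limits of the 1,275 × 800-pixel image. The sketch below assumes that the x-axis spans st 1 to st 57 and, hypothetically, that the y-axis spans 30 to 130 dB. It maps each (st, dB) vertex of the VRP outline to pixel coordinates and counts the pixels whose centres fall inside the polygon using an even-odd scanline fill. It illustrates the idea of a pixel-based eVEM and is not the co-author's algorithm.

```python
import math

WIDTH, HEIGHT = 1275, 800      # image size given in the text
ST_MIN, ST_MAX = 1, 57         # x-axis: semitone scale of the VRP
DB_MIN, DB_MAX = 30.0, 130.0   # y-axis: SPL limits (hypothetical)


def to_pixel(st, db):
    """Map (semitone, dB) to continuous image coordinates (y grows downward)."""
    x = (st - ST_MIN) / (ST_MAX - ST_MIN) * WIDTH
    y = (DB_MAX - db) / (DB_MAX - DB_MIN) * HEIGHT
    return x, y


def evem_pixels(vertices):
    """Count pixels whose centres lie inside the VRP polygon (even-odd scanline fill).

    vertices: (semitone, dB) points ordered around the outline of the VRP.
    """
    pts = [to_pixel(st, db) for st, db in vertices]
    count = 0
    for row in range(HEIGHT):
        yc = row + 0.5  # centre of this pixel row
        xs = []
        for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
            if (y0 <= yc < y1) or (y1 <= yc < y0):  # edge crosses the row centre
                xs.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        xs.sort()
        for left, right in zip(xs[0::2], xs[1::2]):
            # columns c whose centre c + 0.5 lies in [left, right)
            first = max(0, math.ceil(left - 0.5))
            last = min(WIDTH - 1, math.ceil(right - 0.5) - 1)
            if last >= first:
                count += last - first + 1
    return count


# Hypothetical outline: soft-singing contour from low to high pitch,
# then loud-singing contour from high back to low pitch
outline = [(10, 55), (20, 50), (30, 52), (38, 60),
           (38, 95), (30, 105), (20, 100), (10, 85)]
print(evem_pixels(outline))
```

The pixel count closely approximates the polygon's area in pixel units (the shoelace formula would give it analytically); counting pixels simply mirrors the image-based description in the text.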
In addition, the sustained vowels /a:/, /i:/, and /u:/ were recorded with DiVAS without and with mask for 3–4 s at comfortable pitch and loudness. The recordings were then exported, and the 2-s midportion of each vowel recording was analyzed for formants F1 and F2 on a Windows 10 computer using the freeware program Praat (version 6.1.53) (https://www.fon.hum.uva.nl/praat/download_mac.html). The F1 and F2 values of all subjects were then entered into an Excel spreadsheet for further evaluation.
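Praat's formant tracker uses Burg's LPC method. The sketch below swaps in the simpler autocorrelation method (Levinson-Durbin recursion) to show the underlying idea: fit an all-pole model to the pre-emphasized, windowed midportion and read formant frequencies off the angles of the model's complex poles, discarding poles with broad bandwidths. The order rule of thumb, the 0.97 pre-emphasis coefficient, and the 90-Hz and 400-Hz thresholds are assumptions, the function names are hypothetical, and results will not match Praat's exactly.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., a_order]."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def estimate_formants(x, fs, order=None, max_bw=400.0, min_freq=90.0):
    """Rough formant estimates (Hz) from LPC pole angles, lowest first."""
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])  # pre-emphasis
    x = x * np.hamming(len(x))
    if order is None:
        order = 2 + int(fs / 1000)              # common rule of thumb
    roots = np.roots(lpc(x, order))
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(roots))   # pole radius -> bandwidth in Hz
    keep = (freqs > min_freq) & (bws < max_bw)
    return sorted(freqs[keep])
```

F1 and F2 would then be the two lowest values returned for the vowel's midportion. For a 44.1-kHz recording, one would first downsample, as Praat does when it resamples to twice the maximum formant frequency (e.g., 11 kHz for a 5,500-Hz ceiling); a high-order LPC on full-band audio tends to produce spurious poles.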
To assess the LTAS, the standard text "The North Wind and the Sun" in German was recorded without and with mask at comfortable pitch and loudness for all subjects. The recordings were then exported and trimmed in Praat: the unvoiced segments between sentences were removed, and the result was saved as a new wave file. The LTAS of the trimmed text recordings was analyzed with a bandwidth of 400 Hz. In Praat, the following steps were performed: under "Analyse spectrum," the menu option "To LTAS" was selected, and the bandwidth was set to 400 Hz. After this setting was applied, the selected voice spectrum was converted to an LTAS object, which was then queried for the mean LTAS value (in dB) within the frequency ranges 0–1,000 Hz and 1,000–6,000 Hz, as described by Sundberg and Nordenberg. The LTAS values below and above 1,000 Hz were entered into an Excel spreadsheet, and the α-ratio, defined as the ratio between the sound energies (expressed in dB) above and below 1,000 Hz, α = I_HF/I_LF, was then calculated [24, 25].
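A rough numpy approximation of these two band means is sketched below; it is not Praat's Ltas algorithm, which first groups the spectrum into 400-Hz bands. It assumes that the input is the trimmed, voiced-only text recording as a calibrated pressure signal in pascals, expresses the power spectral density in dB re (20 µPa)²/Hz, and averages power rather than dB values within each band. All function names are hypothetical. The α-ratio follows the text's definition I_HF/I_LF; because a ratio of energies expressed in dB equals a difference of levels, the function also returns I_HF − I_LF.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in Pa


def ltas_band_mean_db(x, fs, f_lo, f_hi):
    """Mean power spectral density (dB re P_REF**2 per Hz) of x over [f_lo, f_hi) Hz.

    Power (not dB values) is averaged over the FFT bins in the band.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (n * fs)
    psd[1:(n + 1) // 2] *= 2.0  # one-sided: fold in negative frequencies
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return 10.0 * np.log10(psd[band].mean() / P_REF ** 2)


def alpha_ratio(x, fs, split_hz=1000.0, top_hz=6000.0):
    """Return (I_LF, I_HF, I_HF / I_LF, I_HF - I_LF) for the band edges used in the text."""
    i_lf = ltas_band_mean_db(x, fs, 0.0, split_hz)
    i_hf = ltas_band_mean_db(x, fs, split_hz, top_hz)
    return i_lf, i_hf, i_hf / i_lf, i_hf - i_lf
```

With mask, an attenuated high band lowers I_HF while I_LF stays roughly constant, so both the quotient and the level difference change; which of the two is reported determines the sign and size of the "α-ratio" values that can be compared across studies.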
In summary, the following parameters from the VRP and voice recordings were analyzed: range of the singing voice (st-range), eVEM (pixels), SPL of soft speaking and shouting (dB), st-level above A1 = 55 Hz of soft speaking and shouting, MPT (s), jitter (%), DSI, vowel formants F1 and F2 (Hz) for /a:/, /i:/, and /u:/, singer's formant, LTAS, and α-ratio. Because speaking and singing voice frequencies differ between males and females, the parameters were also analyzed separately by gender.
For the statistical comparison of small dependent samples (2 dependent groups), the paired t test was applied with a significance level of 5% (p < 0.05). All statistical analyses were performed on a Windows 10 computer using IBM SPSS Statistics version 26 (64-bit).
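SPSS computes the paired t test internally. To show what the test does, the sketch below computes t from the within-subject differences (with mask minus without mask) and obtains the two-sided p value by integrating Student's t density numerically with Simpson's rule, which keeps the example free of dependencies; in practice a library routine such as `scipy.stats.ttest_rel` would be used. The function names and example values are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """Two-sided p value: 1 - 2 * integral from 0 to |t| of the t density (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def paired_t_test(without, with_mask):
    """Paired (dependent-samples) t test on per-subject differences; returns (t, df, p)."""
    if len(without) != len(with_mask) or len(without) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [w - wo for wo, w in zip(without, with_mask)]
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)


# Hypothetical shouting SPL values (dB) for five subjects
without = [95.1, 98.4, 101.2, 93.7, 99.0]
with_mask = [94.0, 97.9, 99.8, 93.1, 98.2]
t, df, p = paired_t_test(without, with_mask)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```

Pairing removes between-subject variation (e.g., a naturally loud versus soft voice), which is what gives the test its power with only 20 subjects.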
Results
Descriptive Data of Patients
Of the 20 subjects, 6 were male and 14 female, with an average age of 36 ± 16 years (mean ± standard deviation [SD]) and an age range of 19 to 75 years. The 10 subjects from the department staff were judged to have euphonic voices. The 10 patients with functional voice disorders had either no (n = 4) or mild (n = 6) hoarseness at the time of examination. In total, 14 subjects were rated as euphonic and 6 as mildly hoarse.
Results of Singing Voice Analysis
The results of the singing voice analysis, i.e., the average singing voice range and eVEM without and with FFP2/3 mask, are shown in Table 1. No significant differences were found between the conditions without and with an FFP2/3 mask, either for all subjects or for each gender separately (p > 0.05).
Results of Speaking Voice Analysis
The results of the speaking voice analysis without and with FFP2/3 mask are listed in Table 1. For soft speaking, the mean SPL showed no significant differences without and with FFP2/3 mask, either for all subjects or for each gender (p > 0.05).
For the shouting voice, the mean SPL of all subjects was significantly lower with FFP2/3 mask (p = 0.005). When analyzed separately by gender, only the female subjects showed a significantly lower value with FFP2/3 mask (p = 0.002). From a clinical point of view, however, this difference is of no importance, as both values are higher than 90 dB, indicating a normal voice condition/constitution.
The frequencies of soft speaking and shouting and the speaking voice range showed no significant differences without and with FFP2/3 mask, either for all subjects or by gender (p > 0.05). MPT, jitter, and DSI of the 20 subjects were also measured (Table 1). Their mean values showed no significant differences without and with FFP2/3 mask, either for all subjects or by gender (p > 0.05).
Results of Singer’s Formant and Vowel Formants
The singer's formant was evaluated as part of the VRP measurement. As shown in Table 1, the average ratio between singer's formant and loud singing of all subjects differed significantly without and with FFP2/3 mask, with a lower ratio with mask (p = 0.001). By gender, the difference was significant only in female subjects, again with a lower ratio with FFP2/3 mask (p = 0.004).
The formants F1 and F2 of the vowels /a:/, /i:/, and /u:/ without and with FFP2/3 mask are shown in Table 1. The mean F1 frequency showed no significant differences without and with FFP2/3 mask, either for all subjects or for each gender (p > 0.05). Likewise, the mean F2 frequency of /a:/ and /u:/ showed no significant differences (p > 0.05). A significant difference was found only for F2 of /i:/, which was lower with FFP2/3 mask in all subjects (p = 0.005). By gender, this difference was significant only in female subjects (p = 0.027), again with a lower value with mask. The F1 and F2 data for /a:/, /i:/, and /u:/ without and with FFP2/3 mask are illustrated in Figure 3.
Results of LTAS of Text Reading
The LTAS results for the read text "The North Wind and the Sun" are listed in Table 1. In the frequency range 0–1,000 Hz, the mean LTAS value of all subjects was significantly higher with FFP2/3 mask (p = 0.046). When analyzed separately by gender, LTAS values were higher with FFP2/3 mask in both male and female subjects; the difference in male subjects was on the verge of significance (p = 0.056).
In the frequency range 1,000–6,000 Hz, the mean LTAS value decreased significantly with FFP2/3 mask, both for all subjects and by gender (p < 0.001). The mean α-ratio of all subjects was significantly higher with FFP2/3 mask (p = 0.003). By gender, only the female subjects had a significantly higher α-ratio with FFP2/3 mask (p = 0.019).
Discussion
In anticipation of a possible next wave of the COVID-19 pandemic, this study investigated whether VRP measures and acoustic voice measurements, the main methods of routine voice diagnostics, differ when performed without and with an FFP2/3 mask. Under the circumstances of the COVID-19 pandemic, with pandemic-related legal infection-control directives such as social distancing and mandatory PPE, wearing face masks has become common worldwide and omnipresent in hospitals. PPE can limit virus transmission between healthcare professionals and patients in healthcare settings and is recommended as a key control measure for infection prevention and limiting viral spread.
However, does covering the nose and mouth with an FFP2/3 mask affect routine voice diagnostics? In this study, the voice ranges of the singing and speaking voice measured by VRP did not differ significantly without and with FFP2/3 mask. Likewise, no significant differences were observed in SPL and F0 of soft speaking. Neither the DSI nor its components differed significantly between the two conditions. The only exception was the SPL of the shouting voice, which was significantly lower with mask in female subjects and, as a result, also in all subjects combined. This finding is difficult to explain. One possibility is that, because F0 is about one octave higher in women than in men and the harmonics are therefore spaced more widely, FFP2/3 masks affect the female voice more than the male voice. Nevertheless, SPL values of the shouting voice both without and with FFP2/3 mask were higher than 90 dB, indicating a normal vocal constitution/condition.
The F1 and F2 of the sustained vowels /a:/, /i:/, and /u:/ did not differ significantly without and with mask either, with the exception of F2 of /i:/ in female subjects (Table 1). Because F2 is highest in /i:/ and is higher in female than in male subjects, this formant was significantly affected by wearing an FFP2/3 mask, which is consistent with the attenuating effect of masks on higher spectral energy. The eVEM did not differ significantly without and with mask. Significant differences were found only in the ratio between the singer's formant, a prominent spectral envelope peak near 3 kHz in the sung voice, and the SPL of loud singing. The significantly lower ratio with mask was mainly due to the attenuating effect of FFP2/3 masks on voice signals above 1 kHz, since masks have limited influence on voice signals below 1 kHz. The attenuating effect of masks in the high-frequency (HF) range became apparent when the data were differentiated by gender: the significantly lower ratio was seen only in female subjects, not in male subjects.
Analysis of the speaking voice in the read text "The North Wind and the Sun" showed a significant attenuating effect of FFP2/3 masks, primarily in the HF range. The LTAS is affected by vocal loudness, and, owing to the attenuating effect of the mask in the HF range, the intensity (I) in the HF range was significantly lower with FFP2/3 mask than without (Table 1), whereas the I in the lower frequency range was on the verge of significance. When the LTAS was assessed separately by gender, no significant differences were found.
With mask, the significantly lower I of the LTAS in the HF range and the slightly higher I in the low-frequency (LF) range resulted in a higher α-ratio than without mask. The differences in α-ratio were significant only in female subjects. Thus, the LTAS and α-ratio results agree with those of the ratio of singer's formant to SPL of loud singing and reconfirm the attenuating effect of FFP2/3 masks in the HF range. This damping effect may explain why speech produced through FFP2/3 masks is perceived as less intelligible.
As shown in this study, wearing an FFP2/3 mask did not affect the outcome of routine voice diagnostics, and reliable VRP and sustained-vowel data can still be obtained while wearing an FFP2/3 mask. Masks did not affect voice constitution (SPL with mask >90 dB) or the frequency range in voice diagnostics. However, because of the attenuating effect of FFP2/3 masks in the HF range, voice diagnostics of vowels with mask should focus on /a:/ and /u:/, and the singer's formant should not be taken into consideration when assessing the singing voice with mask. In summary, since FFP2/3 masks had hardly any impact on routine voice diagnostic results, patients can be advised to wear them during measurements, in accordance with public infection-control directives, to minimize the spread of COVID-19.
Conclusion
In this study, wearing FFP2/3 face masks did not affect the outcome or interpretation of routine VRP measurements. The impact of FFP2/3 masks on spectral analysis of sustained vowels, text, and singer's formant needs to be considered in clinical voice diagnostics. To support public health and infection control during the COVID-19 pandemic, patients should be instructed to wear FFP2/3 masks during examination.
Statement of Ethics
This study protocol was reviewed and approved by the Ethics Committee of the Medical University of Vienna, approval number 1536/2021. All procedures performed in this study complied with the guidelines for human studies, and the research was conducted ethically in accordance with the World Medical Association Declaration of Helsinki. Informed consent was obtained from all participants.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
No funding was required for this study.
Author Contributions
Guan-Yuh Ho: data analysis, literature research, and preparation of the manuscript. Ines Kristina Kansy and Katharina Anna Klavacs: recruitment of patients, voice recordings, and VRP measurements. Matthias Leonhard: statistical analysis, eVEM, and manuscript proofreading. Berit Schneider-Stickler: literature research, study design, preparation of the manuscript, and study supervision.
Data Availability Statement
All data generated or analyzed during this study are included in this article. Further inquiries can be directed to the corresponding author.