Abstract
Introduction: The Toolkit to Examine Lifelike Language (TELL) is a web-based application providing speech biomarkers of neurodegeneration. After deployment of TELL v.1.0 in over 20 sites, we now introduce TELL v.2.0. Methods: First, we describe the app’s usability features, including functions for collecting and processing data onsite, offline, and via videoconference. Second, we summarize its clinical survey, tapping on relevant habits (e.g., smoking, sleep) alongside linguistic predictors of performance (language history, use, proficiency, and difficulties). Third, we detail TELL’s speech-based assessments, each combining strategic tasks and features capturing diagnostically relevant domains (motor function, semantic memory, episodic memory, and emotional processing). Fourth, we specify the app’s new data analysis, visualization, and download options. Finally, we list core challenges and opportunities for development. Results: Overall, TELL v.2.0 offers scalable, objective, and multidimensional insights for the field. Conclusion: Through its technical and scientific breakthroughs, this tool can enhance disease detection, phenotyping, and monitoring.
Plain Language Summary
Neurodegenerative disorders (NDs), such as Alzheimer’s and Parkinson’s disease, are a leading cause of disability, caregiver stress, and financial strain worldwide. The number of cases, now estimated at 60 million, will triple by 2050. Early detection is crucial to improve treatments, management, and financial planning. Unfortunately, standard diagnostic methods are costly, stressful, and often hard to access due to scheduling delays and availability issues. A promising alternative consists in digital speech analysis. This affordable, noninvasive approach can identify NDs based on individuals’ voice recordings and their transcriptions. In 2023, we launched the Toolkit to Examine Lifelike Language (TELL), an online app providing robust speech biomarkers for clinical and research purposes. This paper introduces TELL v.2.0, a novel version with improved data collection, encryption, processing, storing, download, and visualization features. First, we explain the app’s basic operations and its possibilities for online and offline data collection. Second, we describe its language survey, which covers questions about demographics as well as language history, usage, competence, and difficulties. Third, we describe TELL’s speech tests, which assess key clinical features. Fourth, we outline the app’s functions for analyzing, visualizing, and downloading data. We finish by discussing the main challenges and future opportunities for TELL and the speech biomarker field. With this effort, we hope to boost the use of digital speech markers in medical and research fields.
Introduction
Neurodegenerative disorders (NDs) are major causes of disability [1‒3], caregiver burden [4], and financial challenges [5, 6] worldwide. The number of cases, estimated at 60 million, will triple by 2050 [7, 8]. Early detection is vital in this context, as it can increase planning time for neuroprotective changes [9‒11], optimize pathology-targeted therapies [12], and reduce costs by fostering routine over emergency care [13]. Worryingly, however, standard diagnostic tools (clinical tests, brain scans, biofluid markers) are expensive, stressful, subject to scheduling delays, and unavailable in many cities [1, 14‒16]. A pressing quest has thus begun for affordable, patient-friendly, immediate, scalable markers that facilitate ND detection and monitoring [8].
Such markers can be captured via digital speech analysis, which derives disease-sensitive features from acoustic and linguistic patterns in voice recordings and their transcriptions, respectively [17‒19]. This procedure is cost-effective (reducing the need for expert staff), noninvasive (avoiding discomfort and risks), time-efficient (offering on-the-fly results), and remotely applicable (overcoming geographical barriers). Thus, digital speech biomarkers represent a powerful framework for clinical ND assessments [17, 18, 20].
This potential is being harnessed by speech biomarker apps. In 2023, building on years of research [21‒32], we released the first version of the Toolkit to Examine Lifelike Language (TELL), a multilingual web-based speech app for academic and clinical purposes [33]. Since then, our tool has been deployed in over 20 sites across 14 countries, generating valuable data and findings as well as new needs and opportunities for innovation. As a result, we have developed TELL v.2.0, a state-of-the-art platform to collect, encrypt, process, store, download, and visualize speech and language data. Here, we introduce this updated version, describing its core features for clinical and research use. We further discuss the main challenges ahead and conclude by calling for more translational initiatives in the digital biomarker space.
Using TELL v.2.0
TELL v.2.0 runs on any device, with Chrome and Safari browsers being recommended for laptops and iOS devices, respectively. Back-end and front-end details are offered in online supplementary material 1 (for all online suppl. material, see https://doi.org/10.1159/000541581). The main differences between TELL v.2.0 and its predecessor are listed in Table 1.
Table 1. Comparing TELL v.1.0 and TELL v.2.0

| Aspect | TELL v.1.0 | TELL v.2.0 |
|---|---|---|
| User roles | Examiner | Superuser, examiner, tester |
| Acquisition modes | In-person online | In-person online, remote online, in-person offline, external data |
| Compliance | HIPAA-compliant | HIPAA- and GDPR-compliant |
| Languages | English, Spanish, French, Portuguese | English, Spanish, French, Portuguese, German, Italian, Quechua, Kiswahili, Tagalog |
| Language profile survey | Health-related habits and language history, use, proficiency, and difficulties | Same as v.1.0, with added chronotype questions |
| Preprocessing steps | Common channel normalization, loudness normalization, denoising, voice activity detection | Same as v.1.0, with added diarization module |
| Metrics | Eight, applied generally across tasks | Over 130, with critical ones organized into four domains (motor speech, semantic memory, episodic memory, emotional processing) |
| Interpretability | Similar visualizations for all metrics, including comparisons between participant and benchmark data | Specific visualizations depending on the metric, including comparisons between participant and benchmark data and machine learning classifiers |
Account Request and Access
Accounts can be requested via a contact form (https://tellapp.org/contact/). The app is accessed through a button on the homepage of TELL’s website (https://tellapp.org/), which prompts users to enter their usernames and passwords.
User Roles
TELL v.2.0 accommodates three user roles. The superuser role grants access to all information collected by examiners from a given working group, while blocking data collection functions. The examiner role has access to data collection functions and to previous data collected by the same examiner. The tester role has no access to previous data and can only collect data from previously registered participants.
App Access and Participant Registration
Once logged in, examiners can register participants by entering an identification code alongside optional demographic and clinical data. Relevant files can be uploaded if needed. Each participant is visible and searchable in a registry menu showing their code and two icons for accessing a language profile survey and a speech recording protocol.
Data Acquisition
Acquisition Modes
TELL v.2.0 features four data acquisition modes (Fig. 1). The in-person online mode is used for face-to-face testing with internet access. The remote online mode connects TELL to Zoom to acquire and process audio from the videocall (Zoom’s default screenshare function allows examinees to view TELL’s stimuli when needed). The in-person offline mode enables data collection in settings without connectivity (online suppl. material 2). Finally, the external data mode allows users to upload audios recorded elsewhere (individually or in bulk) and feed them to TELL’s standard processing pipelines.
TELL v.2.0’s data acquisition modes. a The in-person online mode requires internet access for face-to-face testing and real-time results. b The remote online mode also requires internet access for TELL to connect to Zoom and capture speech data from the participant’s computer. c The in-person offline mode is used in settings lacking internet access, so that data can then be processed when the device regains connectivity. d The external data mode allows users to upload audios recorded elsewhere and leverage TELL’s preprocessing and analytical capabilities.
In the first three data acquisition modes, upon accessing the speech recording protocol, users must review published data collection guidelines [34], choose the sound source, and grant permission for TELL to access it. Compatible input sources include the device’s built-in microphone, an external microphone, or Zoom’s audio input. Recordings are saved as lossless .WebM files (sampling rate: 44.1 kHz; bit-depth: 24 bits), temporarily moved to virtual memory, then stored in the back-end as highly durable objects in encrypted S3 buckets, and finally deleted from the device. S3 buckets are periodically backed up to ensure data recoverability.
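For orientation, the snippet below sketches how a finished recording could be archived as an encrypted S3 object and then removed from the local device, in line with the storage flow described above. It uses boto3; the bucket name, key scheme, and KMS-based encryption are illustrative assumptions rather than TELL’s actual back-end code.

```python
# Minimal sketch (not TELL's back-end): store a finished recording as an encrypted
# S3 object and remove the local copy. Bucket name, key scheme, and KMS usage are
# illustrative assumptions.
import os
import boto3

def archive_recording(local_path: str, participant_id: str, task_name: str) -> str:
    s3 = boto3.client("s3")
    key = f"recordings/{participant_id}/{task_name}.webm"   # hypothetical key scheme
    with open(local_path, "rb") as audio:
        s3.put_object(
            Bucket="tell-recordings",          # hypothetical bucket name
            Key=key,
            Body=audio,
            ServerSideEncryption="aws:kms",    # server-side encryption at rest
            ContentType="audio/webm",
        )
    os.remove(local_path)                      # mirror the deletion from the device
    return key
```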
Language Profile Survey
TELL v.2.0’s two-part survey begins with sociodemographic and health-related questions, including items on speech modulators (e.g., smoking, sleeping habits). The second part is an extended adaptation of the Bilingual Language Profile questionnaire [35, 36], tapping on language history, use, proficiency, and difficulties. Responses can be downloaded as .csv files. The survey is available in English, Spanish, Portuguese, and French, and it can be completed in roughly 5 min.
Speech-Based Assessments
TELL v.2.0 offers four main assessment modules (tapping on motor speech, semantic memory, episodic memory, and emotional processing), alongside additional tasks and metrics (Fig. 2). To different degrees, these domains are vulnerable to Alzheimer’s disease (AD), Parkinson’s disease (PD), and frontotemporal dementia (FTD) variants, among other NDs [21‒32].1 Tasks consist of a title, instructions, a stimulus (when required), and a “record” button. The latter is used to start a recording, finish it at any time, and launch the following task. Tasks are available in English, Spanish, Portuguese, French, and Kiswahili – more languages can be added on demand. Users can choose to administer one or multiple modules. Note that most of the findings below stem from data collected and/or analyzed through TELL [28, 31, 32, 41].
TELL v.2.0’s main assessment modules. Motor speech is analyzed through speech timing, articulation, and voice quality features derived from syllable repetition and paragraph reading. Semantic memory is examined via speech timing and word property features derived from semantic and phonemic fluency tasks. Episodic memory is assessed through speech timing alongside first-person, third-person, noun, and verb ratios from routine descriptions. Emotional processing is characterized by measuring affective pitch, emotional valence, emotional weight during pleasant memory description, and emotional video narration. Additional tasks and metrics are offered for specific research purposes.
For recording, we recommend using laptops, iPads, or large-screen iPhones. The device should be connected to an external microphone. Users are encouraged to use soundproof booths for ideal acoustic insulation [42]. Yet, since these are rarely available in clinical institutions, TELL offers on-screen guidelines to optimize and harmonize acoustic conditions and data quality across acquisition settings. These include sitting in a quiet room, muting phones, removing noise sources, and speaking close to the microphone at a constant distance without touching it [33, 34]. Moreover, users are advised to avoid rooms that are large and empty (as these may cause unwanted reverberation) or that contain electrical medical devices (as they may interfere with the audio signal) [42]. A solid wheel-less chair is recommended to prevent movement-related noise [42, 43]. Also, our protocols instruct examiners not to speak after delivering instructions and to use gestures to communicate with participants once a task has commenced. Importantly, TELL further favors data consistency across participants and settings thanks to its five-step automated preprocessing pipeline (online suppl. material 3).
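As a rough illustration of steps like those in this pipeline (channel collapsing, loudness normalization, denoising, and voice activity detection), the sketch below chains common open-source tools (librosa, pyloudnorm, noisereduce). The library choices and parameter values are assumptions for illustration, not TELL’s implementation.

```python
# Rough sketch of preprocessing steps akin to TELL's pipeline; libraries and
# parameters are illustrative assumptions, not the app's actual implementation.
import librosa
import noisereduce as nr
import numpy as np
import pyloudnorm as pyln

def preprocess(path: str, target_lufs: float = -23.0):
    y, sr = librosa.load(path, sr=None, mono=True)        # collapse to a single channel
    meter = pyln.Meter(sr)
    y = pyln.normalize.loudness(y, meter.integrated_loudness(y), target_lufs)
    y = nr.reduce_noise(y=y, sr=sr)                        # spectral-gating denoising
    intervals = librosa.effects.split(y, top_db=30)        # simple energy-based VAD
    voiced = np.concatenate([y[s:e] for s, e in intervals]) if len(intervals) else y
    return voiced, sr
```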
Motor Speech Assessment. Motor speech encompasses neuromuscular mechanisms involved in planning, coordinating, and executing speech production movements [44]. Such processes engage specific neural regions (e.g., motor cortices, basal ganglia) and peripheral organs (e.g., vocal folds, tongue) [45]. TELL v.2.0 taps on this domain via (a) a syllable repetition task (repeating /pataka/ until breath runs out), which illuminates phonatory control under negligible linguistic demands; and (b) a paragraph reading task, which affords insights on naturalistic speech production while keeping linguistic (e.g., syntactic) demands constant across participants [33]. Three feature sets are derived therefrom.
First, speech timing features index articulator efficiency and speech rhythm. These are obtained through a PRAAT script [46] that calculates an audio’s number of syllables (peaks in intensity before and after intensity dips), number of pauses (silent intervals), syllable duration (mean speech time over number of syllables), pause duration (mean duration of pauses), and articulation rate (number of syllables over speech time minus pauses). Such features are sensitive to diverse NDs [23, 26] – for instance, they can identify non-fluent variant primary progressive aphasia with an AUC of 0.95 while predicting patients’ brain atrophy patterns and autopsy-confirmed pathology [26]. Second, articulation features (Fig. 3) measure energy content in the transitions between voiced and unvoiced segments, and vice versa, capturing abnormal articulatory movement [47]. Segments are deemed voiced or unvoiced if they present or lack F0 values, respectively. The energy surrounding these transitions is characterized by Bark band energies, gammatone cepstral coefficients, and Mel-frequency cepstral coefficients. Indeed, these features contribute to identifying PD with 84% accuracy and detect specific disease phenotypes with similar accuracy levels (80.2% in the case of PD without mild cognitive impairment) [23]. Third, voice quality features index vocal tract control efficiency [48]. These encompass harmonic-to-noise ratio (the relationship between the signal’s periodic and nonperiodic components) alongside jitter and shimmer (the temporal and amplitude variation between successive F0 periods, respectively). These features are affected, for example, in AD [49].
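To make the voice quality features concrete, the following sketch computes harmonic-to-noise ratio, jitter, and shimmer with the praat-parselmouth library; the thresholds follow common Praat defaults and are assumed here, not taken from TELL’s configuration.

```python
# Hedged sketch of voice quality features (HNR, jitter, shimmer) of the kind TELL
# derives, computed via praat-parselmouth; thresholds follow common Praat defaults.
import parselmouth
from parselmouth.praat import call

def voice_quality(path: str, f0_min: float = 75, f0_max: float = 500) -> dict:
    snd = parselmouth.Sound(path)
    harmonicity = call(snd, "To Harmonicity (cc)", 0.01, f0_min, 0.1, 1.0)
    point_process = call(snd, "To PointProcess (periodic, cc)", f0_min, f0_max)
    return {
        "hnr_db": call(harmonicity, "Get mean", 0, 0),
        "jitter_local": call(point_process, "Get jitter (local)",
                             0, 0, 0.0001, 0.02, 1.3),
        "shimmer_local": call([snd, point_process], "Get shimmer (local)",
                              0, 0, 0.0001, 0.02, 1.3, 1.6),
    }
```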
Illustrative results based on motor speech features. a Articulation skills are established by measuring energy content in transitions between voiced (onset) and unvoiced (offset) segments. b These features have proven sensitive to PD, robustly discriminating between patients and healthy individuals based on brief audio samples (AUC = 0.94). Reproduced with permission from García et al. [23]. AUC, area under the ROC curve; HC, healthy controls; PD, Parkinson’s disease.
Semantic Memory Assessment. Semantic memory is the neurocognitive system underlying conceptual and lexical knowledge, supporting our capacity to name, associate, and understand entities and events [50, 51]. It engages a distributed neural network, with key hubs in anterior temporal and other temporal and parietal regions [52, 53]. TELL v.2.0 taps on this domain by analyzing responses to semantic and phonemic fluency tasks, in which participants have 1 min to produce as many words as possible that denote animals or begin with a given letter, respectively [31]. Specifically, we capture six word properties and five speech timing features indexing semantic memory navigation.
To compute semantic granularity [28], Python’s NLTK library is used to access WordNet, a hierarchical graph of nodes leading from the top node “entity” to progressively more specific concepts (e.g., “animal,” “dog,” “bulldog”). Each word’s granularity is defined as the distance between its node and “entity,” yielding n distance bins (e.g., bin-3 words are closer to “entity” than bin-10 words, the former indicating less precise concepts) [28]. Also, semantic variability [28] is analyzed with a FastText model pretrained on language-specific corpora. Each text is mapped as a series of vectors, keeping the words’ sequence and omitting repetitions. Distances between adjacent vectors are stored as a time series. Semantic variability is computed as the variance of the text’s joint time series (the higher it is, the less consistent the switches between conceptual categories). We further consider each word’s frequency (logarithmic frequency per million), phonological neighborhood (number of phonologically similar words), length (number of phonemes), and imageability (rated on a 7-point scale). Moreover, we extract the speech timing features described in section 2.4.3.1. These metrics are robust markers of AD and PD, with cognitive decline being signaled by a preference for highly accessible items – e.g., frequent, unspecific, highly imageable words (Fig. 4) – and greater retrieval effort – indicated, for example, by longer pauses [31, 32, 54]. TELL’s word property features can robustly identify AD (AUC = 0.89), with word frequency predicting cognitive symptom severity (R = 0.47, p < 0.001) and syndrome-differential patterns of frontotemporal brain volume and fMRI connectivity along the default mode and the salience network [31]. Likewise, we have shown that both PD and behavioral variant FTD patients produce less semantically variable words (p < 0.05), a pattern that correlates with disinhibition symptoms (R = −0.30, p = 0.04) and salience network hypoconnectivity (p = 0.001) [32].
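The sketch below illustrates, under stated assumptions, how two of these word properties could be computed: granularity as a word’s depth below WordNet’s “entity” node (via NLTK) and semantic variability as the variance of distances between adjacent FastText vectors. The model file name and the distance measure are illustrative choices, not TELL’s exact settings.

```python
# Illustrative sketch (assumptions, not TELL's code) of two word-property metrics:
# WordNet-based granularity and FastText-based semantic variability.
import fasttext
import numpy as np
from nltk.corpus import wordnet as wn
from scipy.spatial.distance import cosine

ft = fasttext.load_model("cc.en.300.bin")   # assumed pretrained English model file

def granularity(word: str):
    """Depth of the word's first noun synset below WordNet's root ('entity')."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    return synsets[0].min_depth() if synsets else None

def semantic_variability(words: list) -> float:
    """Variance of cosine distances between adjacent word vectors (repetitions omitted)."""
    vectors = [ft.get_word_vector(w) for w in dict.fromkeys(words)]
    distances = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return float(np.var(distances))
```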
Illustrative results based on semantic memory features. a Word property features, analyzed by logistic regressions, robustly discriminated Alzheimer’s disease (AD) patients from healthy individuals (AUC = 0.89), while proving uninformative to detect behavioral variant FTD (AUC = 0.62). In AD patients, word frequency in the semantic fluency task predicted cortical thickness of the left superior temporal pole as well as the right supramarginal and middle cingulate gyri (b), and connectivity along the default-mode and salience networks (c). Authorized reproduction under the terms of the Creative Commons Attribution License, from Ferrante et al. [31]. AD, Alzheimer’s disease; bvFTD, behavioral variant frontotemporal dementia.
Episodic Memory Assessment. Episodic memory involves retrieving daily personal experiences, a function mainly subserved by middle temporal and hippocampal brain regions [55]. TELL v.2.0 delves into this domain through a routine description task, requiring participants to narrate a typical day in their lives. In addition to speech timing features (section 2.4.3.1), we derive two sets of metrics based on part-of-speech tagging.
First, through FreeLing’s morphological tagger [56], we compute the proportion of first- and third-person references by establishing the grammatical person of person-marking words (verbs, pronouns) and calculating the ratio of first- and third-person morphemes over the sum of such words. This builds on the observation that patients with behavioral variant FTD, for instance, often exhibit reduced self-awareness [57, 58], favoring exocentric (third-person) perspectives on the events they partake in [59]. Second, we establish the proportion of nouns and verbs (relative to all content words), given that these may be differentially impacted by posterior and frontal brain damage, respectively [60]. Indeed, specific difficulties with noun processing have been reported in AD [61‒63] and semantic dementia [64], whereas verb (and, particularly, action verb) processing is typically affected in PD [65]. More particularly, TELL v.2.0’s metrics have revealed a selective reduction of noun ratio in AD and a selective hyper-reliance on third-person references in behavioral variant FTD – both patterns being uncorrelated with cognitive dysfunction [41] (Fig. 5).
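As an illustration of these ratios, the sketch below uses spaCy as a stand-in for FreeLing’s morphological tagger; the model name and tag mapping are assumptions rather than TELL’s actual pipeline.

```python
# Simplified sketch of the part-of-speech ratios described above, with spaCy standing
# in for FreeLing; model name and tag mapping are illustrative assumptions.
import spacy

nlp = spacy.load("es_core_news_sm")  # assumed Spanish model; any pipeline with morphology works

def pos_ratios(text: str) -> dict:
    doc = nlp(text)
    person_marked = [t for t in doc
                     if t.pos_ in ("VERB", "AUX", "PRON") and t.morph.get("Person")]
    content = [t for t in doc if t.pos_ in ("NOUN", "VERB", "ADJ", "ADV")]

    def share(tokens, keep):
        return sum(keep(t) for t in tokens) / len(tokens) if tokens else 0.0

    return {
        "first_person_ratio": share(person_marked, lambda t: "1" in t.morph.get("Person")),
        "third_person_ratio": share(person_marked, lambda t: "3" in t.morph.get("Person")),
        "noun_ratio": share(content, lambda t: t.pos_ == "NOUN"),
        "verb_ratio": share(content, lambda t: t.pos_ == "VERB"),
    }
```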
Illustrative results based on episodic memory features. a First- and third-person references are significantly abnormal in bvFTD (but not in AD) patients, pointing to a differential loss of self-awareness during personal events. b Noun ratio is selectively lower in AD (but not bvFTD) patients relative to healthy individuals, revealing difficulties with accessing words that denote entities involved in daily scenarios. Authorized reproduction under the terms of the Creative Commons Attribution License, from Lopes da Cunha et al. [41]. AD, Alzheimer’s disease; bvFTD, behavioral variant frontotemporal dementia.
Emotional Processing Assessment. Emotional processing entails the capacity to perceive, understand, and react to affectively significant cues, such as others’ intentions and actions [66]. Relevant processes are substantially compromised in certain NDs, such as behavioral variant FTD [67‒69], and less markedly in others, such as AD [70, 71]. TELL v.2.0 assesses this domain focusing on pleasant memory descriptions and video narration (requiring verbal recount of an emotion-laden video). Relevant acoustic and linguistic metrics are derived therefrom.
First, to capture affective pitch patterns (Fig. 6), we focus on the F0 contour (slope of the line interpolating F0 values for a speech segment) and the energy contour (slope of the curve for voiced sounds in key frequency bands) [23]. These features, along with their derivatives, are highly correlated with emotions [72, 73]. For example, high pitch is typical at speech onset for disgust and in utterance-ending segments for fear and joy [74]. Also, sadness involves a high proportion of high accents, while joy, disgust, and fear are typified by low-to-high accent patterns [74]. Second, to measure linguistic emotionality, we use PySentimiento [75], a multilingual Python toolkit based on RoBERTa [76] and trained for different emotional assessment tasks [75]. We quantify each transcription’s overall emotional valence based on the proportion of emotional positivity, negativity, and neutrality [75]. Also, we estimate the weight of six basic emotions (joy, sadness, anger, fear, disgust, surprise) relative to their aggregate emotionality [77]. Disruptions of specific emotions are well-established markers of diverse NDs [78‒81].
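For concreteness, the sketch below derives valence proportions and basic-emotion weights from a transcription with PySentimiento’s sentiment and emotion analyzers; the label names reflect the library’s Spanish models, and the weighting scheme is an assumption based on the description above.

```python
# Hedged sketch of the linguistic emotionality measures: valence proportions and
# relative weights of basic emotions via PySentimiento. Label names may differ by language.
from pysentimiento import create_analyzer

sentiment = create_analyzer(task="sentiment", lang="es")
emotion = create_analyzer(task="emotion", lang="es")

def emotionality(transcript: str) -> dict:
    valence = sentiment.predict(transcript).probas     # e.g., {"POS": ..., "NEU": ..., "NEG": ...}
    emotions = emotion.predict(transcript).probas
    basic = {k: v for k, v in emotions.items() if k != "others"}
    total = sum(basic.values()) or 1.0
    weights = {k: v / total for k, v in basic.items()}  # weight relative to aggregate emotionality
    return {"valence": valence, "emotion_weights": weights}
```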
Illustrative results based on emotional processing features. Measures of F0, an index of affective pitch, reveal abnormal range in bvFTD patients relative to healthy controls. Reproduced with permission from Nevler et al. Automatic measurement of prosody in behavioral variant FTD. Neurology. 2017;89(7): 650–56.
Additional Tasks and Metrics. TELL v.2.0 also inherits useful tasks from its predecessor, such as picture description and sustained vowel production [33]. It further offers nearly 100 other metrics for research purposes, including counts and ratios for all word classes as well as additional sentiment analysis and pitch-related features, with distributional statistics (mean, standard deviation, median, minimum, maximum, skewness, kurtosis) reported when applicable.
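As a small worked example, such distributional statistics can be obtained from any per-recording series of values (e.g., frame-wise pitch estimates) along the lines below; the function and output keys are illustrative.

```python
# Small sketch of the distributional statistics reported for applicable metrics.
import numpy as np
from scipy.stats import kurtosis, skew

def distribution_stats(values) -> dict:
    x = np.asarray(values, dtype=float)
    return {
        "mean": float(np.mean(x)), "std": float(np.std(x)), "median": float(np.median(x)),
        "min": float(np.min(x)), "max": float(np.max(x)),
        "skewness": float(skew(x)), "kurtosis": float(kurtosis(x)),
    }
```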
Data Analysis, Visualization, and Download
Thanks to ongoing updates, the above acoustic and linguistic features can be (i) compared with benchmark data, (ii) subjected to classification analyses, and (iii) downloaded for offline processing. First, benchmark plots contextualize a participant’s values relative to healthy and patient samples (Fig. 7). Previous data from such groups are presented in intuitive graphs showing minimal, maximal, and mean values. The participant’s value is plotted over both distributions, offering phenotypic information that can inform clinical decision-making.
TELL v.2.0’s visualization interface. a Speech timing data (e.g., speech rate) are represented as speedometers. b Voice quality data (e.g., harmonic-to-noise ratio), and other metrics capturing within-subject variance, are plotted as horizontal bars. c Word property data (e.g., granularity) are shown in vertical bars. d Affective data (e.g., emotional weight) are depicted as stacked bar charts. All plots show normative value ranges for healthy individuals and specific patient groups, with the examinee’s value juxtaposed to both ranges for intuitive reference from clinicians or researchers.
Second, pretrained classifiers establish whether a participant’s performance on an assessment is more or less patient-like. For example, the motor speech assessment is connected to a classifier trained with all relevant features from healthy controls and different patient groups (e.g., PD patients). After applying validated normalization [82] and imputation [31] steps, the participant’s speech timing, articulation, and voice quality features are fed to the classifier. This, in turn, yields the probability that such data correspond to speech patterns observed in our normative PD sample, with outcomes binned into five discrete categories: highly patient-like (80–100%), moderately patient-like (60–79%), indeterminate (40–59%), moderately healthy-like (20–39%), and highly healthy-like (0–19%). These scores can further support clinical characterization, phenotyping, and monitoring.
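The sketch below conveys the logic of this step under explicit assumptions: a generic scikit-learn pipeline (imputation, scaling, logistic regression) stands in for TELL’s validated normalization, imputation, and classification procedures, and the resulting probability is binned into the five categories listed above.

```python
# Conceptual sketch of turning a participant's features into a patient-likeness
# category; the imputer/scaler/model choices are assumptions, not TELL's procedures.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

def fit_classifier(X_train, y_train):
    # X_train: features from controls and patients; y_train: 1 = patient, 0 = control
    return make_pipeline(KNNImputer(), RobustScaler(),
                         LogisticRegression(max_iter=1000)).fit(X_train, y_train)

def patient_likeness(model, features):
    prob = float(model.predict_proba(np.atleast_2d(features))[0, 1]) * 100
    bins = [(80, "highly patient-like"), (60, "moderately patient-like"),
            (40, "indeterminate"), (20, "moderately healthy-like"),
            (0, "highly healthy-like")]
    label = next(name for cutoff, name in bins if prob >= cutoff)
    return prob, label
```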
Third, features can be downloaded for single participants or participant groups at a particular time point or across sessions. Upon removal of identifiable information, data are downloaded as .csv files, with each feature representing a column – including the features from all main assessment modules and from TELL’s additional tasks and metrics. Survey results (and audio files, for users with the necessary role) are downloaded in the same way. Files are anonymized and named following a standard format to simplify organization and retrieval. Overall, this function facilitates offline data analysis and sharing for research-oriented users.
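As a toy illustration of this export format, the snippet below writes one row per participant and one column per feature, replaces identifiers with hashed codes, and applies an assumed standard file-naming pattern; these details are hypothetical rather than drawn from TELL’s implementation.

```python
# Toy sketch of the download format: one row per participant, one column per feature,
# anonymized IDs, and an assumed standard file name.
import hashlib
import pandas as pd

def export_features(records: list, group: str, session: str) -> str:
    df = pd.DataFrame(records)                              # each feature becomes a column
    df["participant"] = df["participant"].map(
        lambda pid: hashlib.sha256(pid.encode()).hexdigest()[:12])  # anonymize IDs
    filename = f"{group}_{session}_features.csv"            # assumed naming pattern
    df.to_csv(filename, index=False)
    return filename
```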
Challenges and Future Perspectives
Several challenges and opportunities lie ahead. The speech biomarker arena, at large, requires more validation of its metrics to distill the truly robust from the occasionally useful. Also, awareness-raising activities must be pursued so that clinicians, patients, and caregivers become familiar with this approach’s potential. This should be accompanied by systematic institutional efforts to encourage widespread adoption of relevant tools in testing settings. As regards TELL v.2.0, alternatives should be pursued to connect its back-end with large language models, interface with relevant information sources, and produce personalized participant reports. Furthermore, a key goal for our app is to become the go-to system for original, hypothesis-driven, disorder-specific markers (as opposed to stock metrics devised for other means), so as to better cater for clinicians’ daily needs. A relevant milestone is to develop a packaged desktop version that complements our web-based services with a downloadable app, increasing versatility. In addition, as recently advocated [18], data should be systematically gathered from underrepresented language groups. In this sense, TELL’s current protocols for the Kenyan, Peruvian, and Filipino communities (in Kiswahili, Quechua, and Tagalog, respectively) open unprecedented possibilities to favor equity in the field. Finally, speech markers should be more systematically captured across the lifespan. Most of TELL’s current protocols target individuals between 40 and 80 years old, but their tasks and metrics are informative across multiple age groups (provided that participants can speak and/or read). Indeed, building on recent speech marker research [83], TELL is now being used with teenagers and children, through protocols that involve tailored instructions and stimuli. Further efforts should be made in this direction. Relatedly, too, longitudinal studies are required to validate the consistency of TELL’s metrics across time points.
Conclusions
TELL v.2.0 is a cutting-edge speech biomarker tool to optimize ND detection and monitoring. This new incarnation shows the value of science- and feedback-driven updates to meet both clinical and research needs. Further harnessing of smart technologies, both within and beyond the speech biomarker space, is fundamental to provide equitable and scalable solutions for the growing challenges of NDs.
Statement of Ethics
An ethics statement was not required for this study type since no human or animal subjects or materials were used.
Conflict of Interest Statement
Adolfo M. García, Joaquín Ponferrada, Alejandro Sosa Welford, Cecilia Calcaterra, Mariano Javier Cerrutti, Fernando Johann, and Eugenia Hesse have received financial support from TELL SA. Raúl Echegoyen is a consultant for TELL SA. Franco Ferrante, Gonzalo Pérez, Nicolás Pelella, Matías Caccia, Laouen Belloli, Catalina González, and Facundo Carrillo declare that they have no financial interest.
Funding Sources
Adolfo M. García is partially supported with funding from the National Institute on Aging of the National Institutes of Health (R01AG075775, R01AG083799, 2P01AG019724-21A1); ANID (FONDECYT Regular 1210176, 1210195); ANII (EI-X-2023-1-176993); Agencia Nacional de Promoción Científica y Tecnológica (01-PICTE-2022-05-00103); and Programa Interdisciplinario de Investigación Experimental en Comunicación y Cognición (PIIECC), Facultad de Humanidades, USACH. The contents of this publication are solely the responsibility of the authors and do not represent the official views of these institutions. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Contributions
Adolfo M. García: conception, organization, figure design, and writing of the first draft; Franco Ferrante, Gonzalo Pérez, Fernando Johann, and Joaquín Ponferrada: writing of the first draft, review and critique; Alejandro Sosa Welford, Matías Caccia, Mariano Cerrutti, and Nicolás Pelella: data analysis, writing of the first draft, and review and critique; Laouen Belloli, Catalina González, Raúl Echegoyen, Cecilia Calcaterra, Eugenia Hesse and Facundo Carrillo: review and critique.
Footnotes
Beyond our focus on Alzheimer’s, Parkinson’s, and frontotemporal lobar degeneration syndromes, TELL’s tasks and metrics have proven useful to identify and monitor other disorders, including mild cognitive impairment [37, 38], Huntington’s disease [39], and psychiatric conditions [40].
Data Availability Statement
The data that support the findings of this study are not publicly available due to their containing information that could compromise the privacy of research participants but are available from the corresponding author (A.M.G.) upon reasonable request.