Current measures of neurodegenerative diseases are highly subjective and based on episodic visits. Consequently, drug development decisions rely on sparse, subjective data, which has led to large-scale phase 3 trials of drugs that are likely not effective. Such failures are costly, deter future investment, and hinder the development of treatments. Given the lack of reliable physiological biomarkers, digital biomarkers may help to address current shortcomings. Objective, high-frequency data can guide critical decision-making in therapeutic development and allow for a more efficient evaluation of therapies for increasingly common disorders.
Barriers to Drug Development for Neurodegenerative Conditions
Failure is the norm in drug development, especially for neurodegenerative disorders. In comparison to other therapeutic areas, drug development for neurological disorders generally takes longer, costs more, and is less likely to succeed (Table 1). As success is less common, the unmet medical need is greater. Neurodegenerative disorders, which are already among the leading causes of disability in industrialized countries (Fig. 1), will become increasingly prevalent, as the proportion of the world’s population over 65 will likely double from 8.5% to nearly 17%, or 1.6 billion people, by 2050.
Failure, when it occurs, should occur early and efficiently, limiting unnecessary downstream costs. However, some of the largest clinical trials ever conducted for neurodegenerative disorders have failed in phase 3 despite encouraging phase 2 clinical trial results (Table 2). In some cases, the trials not only failed to confirm earlier signals but were also stopped early due to futility, defined as limited likelihood of eventual success [4-7]. These failures late in development carry enormous human and economic costs and can deter future investments. Venture capital funding for neurology-focused companies has decreased by 40% since 2004. In addition, from 2004–2008 to 2009–2013, funding for novel neurology research and development dropped 56%. While the limited ability of experimental therapeutics to affect underlying biological processes is a common cause of failure, current outcome measures – artificial, imperfect, and biased snapshots of clinical status – are another.
Unlike most other organ systems, direct measures of brain function and pathology are limited. Biomarkers from blood or cerebrospinal fluid are still largely in their infancy and have failed to contribute to new drug registrations. Imaging, especially MRI, has helped revolutionize therapeutic development for multiple sclerosis but has had more modest utility in other neurodegenerative conditions. Similarly, functional imaging modalities, including PET and SPECT, have facilitated the diagnosis of Parkinson disease but have limited value for measuring disease progression or response to therapy. In the absence of well-established and validated biomarkers of diagnosis or disease progression, the primary means for assessing the efficacy of novel therapies for neurodegenerative disorders is largely a combination of clinician-administered rating scales and patient-reported outcomes such as the Alzheimer’s Disease Assessment Scale-Cognitive Subscale and the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale. However, these scales have substantial limitations, which can be mitigated in part by digital biomarkers.
Opportunity for Digital Biomarkers
Current clinical outcome measures for neurodegenerative disorders are limited because they are (1) rater dependent, (2) highly variable, and (3) episodic. By their nature, rating scales are dependent on the raters, who have variable expertise and experience. Although many studies have established a certain level of inter-rater and intra-rater reliability, results tend to arbitrarily establish “moderate to high” reliability and lack agreement between studies [13, 14]. Studies often recommend using more specialized raters, such as physicians rather than nurse practitioners, and having the same rater follow participants longitudinally, since intra-rater reliability is better than inter-rater reliability. However, adequately trained raters are few, expensive to utilize, and frequently located only in specialized centers, which creates another roadblock to efficient therapeutic development. Furthermore, raters are subject to bias – conscious or unconscious tendencies to distort the truth. While blinding of interventions can minimize potential bias, clinicians remain subject to expectations about the effectiveness of drugs. For example, in a recent clinical trial of pridopidine in Huntington disease, conducted after 2 previous trials had suggested an improvement in the total motor score in the active arm [15, 16], the total motor score improved substantially in both the placebo and active arms, in a disorder where scores should worsen.
Variability further limits current measures. By their nature, assessments are subjective or, when quasi-objective (e.g., counting finger taps in Parkinson disease), require human transformation of a quantitative signal to an ordinal ranking. The actual quantity and quality of taps are transformed into a categorical scale, and precise measurements of the number, speed, or rhythmicity of the taps are lost. Intra-rater variability and, in multi-center trials, inter-rater variability further reduce precision. This variability reduces confidence in the replicability of findings, especially in early-stage clinical trials that may be underpowered for efficacy measures. An illustration of this variability can be drawn from recent studies in Parkinson disease. In two phase 2 Parkinson disease trials, the 12-month change in the total Unified Parkinson’s Disease Rating Scale was 6 points in the placebo arm in one study and 8 points in the second (33% higher) [7, 18, 19]. This difference occurred despite the use of the same sites, the same investigators, the same patient population, and very similar protocols.
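The loss of information described above, when a tap count is collapsed into an ordinal category, can be illustrated with a minimal sketch. The cut points below are hypothetical and chosen only for illustration; they are not the scoring criteria of any published rating scale, and the function name `ordinal_score` is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical cut points (taps per fixed interval). These are NOT the
# criteria of any actual rating scale; they only illustrate binning.
CUTPOINTS = [10, 20, 30, 40]


def ordinal_score(taps: int) -> int:
    """Map a raw tap count to a 0-4 category (4 = most impaired)."""
    # bisect_right counts how many cut points the tap count has reached.
    return 4 - bisect_right(CUTPOINTS, taps)


baseline_taps = 31
followup_taps = 39  # roughly a 26% increase in the raw signal

# Both counts fall between the same two cut points, so the ordinal
# category is unchanged even though the raw count changed.
print(ordinal_score(baseline_taps), ordinal_score(followup_taps))
```

Under these assumed cut points, a sizeable change in the raw count leaves the category unchanged, which is the kind of precision a continuous sensor-based measure would retain.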
To offset the variability in these scales, more observations are required. However, current scales are administered infrequently (e.g., monthly, quarterly), and most trials rely on changes in these measures from the beginning to the end of the trial. The episodic assessments, all conducted in artificial environments at arbitrary times, limit the power of studies and may not even be an accurate representation of an individual’s true state. Moreover, episodic assessments cannot reliably evaluate fluctuating events (e.g., the variable response to medication), capture rare incidents (e.g., falls), or assess behaviors that take place over long periods outside the examination room (e.g., everyday life activities).
Digital biomarkers – the use of a biosensor to collect objective data on a biological (e.g., blood glucose, serum sodium), anatomical (e.g., mole size), or physiological (e.g., heart rate, blood pressure) parameter followed by the use of algorithms to transform these data into interpretable outcome measures – can help address many of the shortcomings in current measures. These new measures, which are collected with portable (e.g., smartphones), wearable, and implantable devices, are by their nature largely independent of raters. They are, therefore, not prone to rater bias. An early digital biomarker for neurodegenerative conditions is the “Q motor,” an in-clinic quantitative motor assessment of finger tapping, hand movements, and other tasks, which has been used in clinical trials in Parkinson and Huntington disease. In contrast to the total motor score, an investigator-rated assessment of motor signs in Huntington disease, the Q motor does not exhibit placebo effects.
Because digital biomarkers can include assessments in real-world environments, the variability in the assessments gleaned from digital biomarkers is likely to be greater than that of rating scales conducted in controlled clinical environments. For example, in a pilot study of wearable accelerometers in Huntington disease, the variability in various measures of gait was greater when assessed at home than when assessed in clinic. However, the increased variability was more than offset by the increased frequency of assessments. Compared to the 20 gait assessments done in clinic, more than 14,000 gait assessments were captured over 1 week outside of the clinic. The resulting immense increase in statistical power enabled the identification of multiple significant differences in gait between those with and without Huntington disease that were not detected based on the in-clinic assessments. The goal of digital biomarkers is to maximize the ecological validity and temporal and spatial resolution of capturing motor and nonmotor phenomena that are expected to change over time. As such, wearable technology may provide a more realistic portrayal of behaviors of interest in clinical and research settings.
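The trade-off between greater variability and more frequent assessment can be sketched with the standard error of a mean, which falls with the square root of the number of observations. The assessment counts below are those reported in the text; the standard deviations are hypothetical, and the calculation assumes independent observations, which repeated measurements from the same individuals do not strictly satisfy, so the gain shown is an upper bound.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean of n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


# Assessment counts reported for the pilot study in the text.
N_CLINIC = 20
N_HOME = 14_000

# Hypothetical standard deviations in arbitrary units (not study data):
# assume at-home measurements are twice as variable as in-clinic ones.
SD_CLINIC = 1.0
SD_HOME = 2.0

se_clinic = standard_error(SD_CLINIC, N_CLINIC)
se_home = standard_error(SD_HOME, N_HOME)

# Ratio of home to clinic SD at which the two standard errors are equal.
break_even_sd_ratio = math.sqrt(N_HOME / N_CLINIC)

print(f"SE in clinic: {se_clinic:.4f}")
print(f"SE at home:   {se_home:.4f}")
print(f"Break-even SD ratio: {break_even_sd_ratio:.1f}")
```

Under these assumptions, at-home measurements would need to be more than about 26 times as variable as in-clinic measurements before the mean estimated from them became less precise.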
Rigorous, independent evaluations of digital biomarkers are few. As of November 2015, at least 73 different devices (22 wearable, 38 nonwearable, and 13 hybrid) had been developed to assess Parkinson disease alone. However, of the 38 nonwearable devices, only the Nintendo Wii Balance Board and GAITRite® gait analysis system were successfully used by groups other than their own developers [28, 29]. Besides lack of validation, additional reasons that may be contributing to the limited use of these technologies include skepticism by the medical community, inconsistent adherence by patients, discrepancy between research utility and clinical value, and lack of compatibility between systems, which hinders data integration and analytics [21, 30]. However, early studies suggest that a smartphone application in Parkinson disease can differentiate those with the disease from those without, be used to predict disease severity as assessed by traditional measures, and detect pharmacological effects of approved therapies. In addition, where evidence of value has been generated [32-36], the US Food and Drug Administration has cleared some digital biomarkers for use [37, 38].
Experience using digital biomarkers as outcome measures in clinical trials is limited but growing fast. A trial of a nitrate in congestive heart failure recently published in The New England Journal of Medicine used an accelerometer as the primary outcome measure. Its use was supported by traditional measures, such as a timed walking test and a quality of life measure, as secondary outcome measures. For neurodegenerative disorders, Roche recently incorporated a Parkinson disease smartphone application that measures gait, balance, finger tapping, and voice as an exploratory measure in a phase 1 clinical trial. Such experimentation in clinical trials is needed to establish feasibility, collect data to demonstrate the validity of such tools alongside traditional measures, foster regulatory acceptance, and eventually provide data to determine the clinical meaningfulness of data captured from this new class of biomarkers.
In the short term, the initial application of these tools may be to support internal decision-making by pharmaceutical and medical device firms. Firms need objective, ideally sensitive, and frequent measures to support dose selection and to make go/no-go decisions on therapies while they are early in development. Traditional clinician-rated outcome measures have often failed to provide reliable guidance.
The need for that guidance is increasing: over the past 20 years, the number of deaths attributed to Parkinson disease globally has more than doubled, and the number due to Alzheimer disease and other dementias has more than tripled. Given that many of these neurodegenerative disorders, such as Parkinson disease, have external manifestations including movement disorders, they lend themselves well to assessment by digital biomarkers. These biomarkers can also provide insights into how these diseases affect other domains of health, from sleep to socialization [42, 43], that are poorly, or not, assessed with traditional scales. The failures of the past and the rising global burden of neurodegenerative disorders call for new tools for therapeutic development. Digital biomarkers are such a tool and can accelerate the evaluation of much needed therapies for the field.
The authors have no ethical conflicts to disclose.
Conflict of Interest Statement
Dr. Dorsey is a consultant to MC10, a wearable sensor company. Dr. Papapetropoulos is a full-time employee of TEVA Pharmaceuticals and serves as co-chair of the MDS Technology Task Force. Dr. Kieburtz has received research support from the National Institutes of Health (NINDS), Michael J. Fox Foundation, and TEVA and has acted as a consultant for the National Institutes of Health (NINDS), Acorda, Astellas Pharma, AstraZeneca, BioMarin Pharmaceutical, Biotie, Britannia, CHDI, Clearpoint Strategy Group, Clintrex, Corium International, Cynapsus, Forward Pharma, Genzyme, INC Research, Intec, Lundbeck, Medivation, Melior Discovery, Neuroderm, Neurmedix, Orion Pharma, Otsuka, Pfizer, Pharma2B, Prana Biotechnology, Prothena/Neotope/Elan Pharmaceutical, Raptor Pharmaceuticals, Remedy Pharmaceuticals, Roche/Genentech, Sage Bionetworks, Sanofi, Serina, Sunovion, Synagile, Titan, Upsher-Smith, US WorldMeds, Vaccinex, Vertex Pharmaceuticals, and Weston Brain Institute.
Research for this paper was supported in part by NINDS (P20NS092529). Opinions and viewpoints expressed are those of the authors.