Abstract
Background: Falls are the most common cause of injury and hospitalization and one of the principal causes of death and disability in older adults worldwide. This study aimed to determine if a method based on body-worn sensor data can prospectively predict falls in community-dwelling older adults, and to compare its falls prediction performance to two standard methods on the same data set. Methods: Data were acquired using body-worn sensors, mounted on the left and right shanks, from 226 community-dwelling older adults (mean age 71.5 ± 6.7 years, 164 female) to quantify gait and lower limb movement while performing the ‘Timed Up and Go’ (TUG) test in a geriatric research clinic. Participants were contacted by telephone 2 years following their initial assessment to determine if they had fallen. These outcome data were used to create statistical models to predict falls. Results: Results obtained through cross-validation yielded a mean classification accuracy of 79.69% (mean 95% CI: 77.09–82.34) in prospectively identifying participants that fell during the follow-up period. Results were significantly (p < 0.0001) more accurate than those obtained for falls risk estimation using two standard measures of falls risk (manually timed TUG and the Berg balance score, which yielded mean classification accuracies of 59.43% (95% CI: 58.07–60.84) and 64.30% (95% CI: 62.56–66.09), respectively). Conclusion: Results suggest that the quantification of movement during the TUG test using body-worn sensors could lead to a robust method for assessing future falls risk.
Introduction
Falls in the elderly are a major problem, with an estimated 30% of elderly adults over 65 years of age falling each year. In the community, the proportion of people who sustain at least one fall over a 1-year period varies from 28–35% in the >65-year age group to 32–42% in the ≥75-year age group, with 15% of older people falling at least twice each year [1]. Incidence rates in hospitals are higher, and in long-term care settings approximately 30–50% of people fall each year, with 40% falling recurrently [2]. The direct and indirect societal costs of falls are enormous. Among older people in the USA alone, the cost of falls has been estimated to be in the region of USD 20 billion per year [3]. The combination of high frequency and high susceptibility to injury in older people makes falls a major geriatric issue in their own right. Multifactorial intervention has been shown to be effective in reducing the incidence of falls in community-dwelling older adults [4] although, despite detailed targeted multifactorial interventions, the best reported reduction in the incidence of falls is in the region of 30% [5,6,7]. Accurate identification of those participants at high risk of falls would facilitate appropriate and timely intervention, and could lead to improved quality of care and reduced associated hospital costs due to reduced admissions and reduced severity of falls.
Falls risk is generally assessed in a clinical setting by physiotherapists, geriatricians, clinical nurse specialists or occupational therapists. A variety of validated clinical recommendations exist for assessing falls risk [8,9,10]. However, these can be subjective, variable in administration and may require specialist expertise. As a result, quantitative methods for estimating falls risk have been investigated [11,12,13,14]. An objective method for assessing falls risk, suitable both for use by non-experts and for deployment in a community care setting, may find clinical application for screening and targeting of individuals at high risk.
Falls in the elderly have been associated with gait impairment [15], deteriorating postural stability, muscular strength and vestibular function. These deteriorations can manifest as problems in walking and turning [16]. The ‘Timed Up and Go’ (TUG) test is a standard mobility assessment used to screen for balance problems in older people [17,18,19]. The TUG test consists of the participant getting up from a chair, walking 3 m, turning at a designated spot, returning to the seat and sitting down. The time taken to perform the test is recorded using a stopwatch. Current clinical practice suggests that elders with longer TUG times are more likely to fall than those with shorter times. The performances of elders prone to falling can be dramatically different from those who do not fall; consequently, the TUG test is one of the most widely used tools for identifying elders at risk of falls [19] and has been recommended by the American Geriatrics Society/British Geriatrics Society guidelines as a screening tool for identifying older people at increased risk of falls [20].
A gyroscope is a device for measuring rotation or orientation. A previous cross-sectional study by the authors reported that parameters derived from body-worn gyroscopes (which measured rotation in three dimensions and were placed on community-dwelling older adults while performing the TUG test) could distinguish between elders with and without a history of falls [11]. The present study reports on the utility of this method in prospectively assessing risk of falls within 2 years of assessment. It uses statistical models, validated using unseen data (a supervised pattern recognition approach), to recognize or ‘classify’ fall-specific patterns and determine the optimal parameters to classify falls risk. This study aimed to show that this novel method is effective at predicting falls, and to compare it, on the same data set, with two standard methods for assessing falls risk. To the best of the authors’ knowledge, the present method is the first to prospectively estimate future falls risk in community-dwelling older adults based on parameters derived from body-worn sensors.
Participants and Methods
The gait and balance of 349 (103 males, 246 females) community-dwelling older adults were evaluated in the Technology Research for Independent Living (TRIL) Clinic, St. James’s Hospital, Dublin, Ireland. This study was conducted as part of a larger study on aging (www.trilcentre.org), a portion of which aims to develop technologies to enhance the clinical assessment of falls risk. The inclusion criteria were persons aged 60 and over, who were able to walk independently with or without walking aid, cognitively intact and able to provide informed consent. Ethical approval was received from the St. James’s Hospital/Adelaide and Meath Hospital, incorporating the National Children’s Hospital Research Ethics Committee (approval reference number 2007/06/13). Forty-seven participants (13.47%) were referred to the TRIL Clinic from the St. James’s Hospital Emergency Department, 36 (10.32%) from the St. James’s Hospital Falls and Blackout Unit, 19 (5.44%) were referred by their family practitioner and 13 (3.72%) by a specialist outpatient clinic. The remainder of the participants (234, 67.05%) were self-referred.
Participants were contacted by telephone approximately 2 years following their initial (baseline) assessment and asked to complete a survey on their falls history subsequent to their initial assessment. Falling was defined as a sudden, unintentional change in position causing an individual to land at a lower level, on an object, the floor, the ground or other surface [21]. Falls outcome data were verified using collateral history from relatives as well as comparison with hospital records. Participants with two or more falls in the follow-up period were deemed recurrent fallers.
This study reports regularized discriminant classifier models [22,23], developed using sensor data collected during the baseline evaluations and trained using the falls outcome data.
Clinical Assessment
All participants had a detailed clinical assessment [24] which included assessments of falls risk factors. Details of the clinical assessment are tabulated in table 1 in order to provide a description of the study cohort. Muscle strength was assessed using maximum grip strength (pounds), taken as the maximum of the left and right hand grip strength values, measured using a handheld dynamometer (Baseline® Hydraulic Hand Dynamometers, NexGen Ergonomics Inc., Que., Canada). Two commonly used measures of visual acuity and contrast sensitivity were also included in the analysis for each participant: Binocular logMAR and the Pelli-Robson Contrast Sensitivity Scale. Participants were also evaluated using the age-adjusted Charlson comorbidity index [25] and the Mini-Mental State Examination [26]. The blood pressure of each participant was obtained using a Finometer (Finapres Medical Systems, Amsterdam, The Netherlands) to check for orthostatic hypotension (defined as orthostatic SBP drop >20 mm Hg). Each participant was evaluated using the manual TUG and also the Berg Balance Scale (BBS) [8] in order to provide two standard measures of falls risk for each participant for comparison with the quantitative method described in this study.
Sensor Data Acquisition
The TUG test was performed as follows: the participant was asked to get up from a standard chair (46 cm high seat, 65 cm arm-rests), walk 3 m, turn at a designated spot, return to the seat and sit down (the experimental set-up and sensor coordinate axes are illustrated in figure 1, the designated turning point is marked with a ‘×’). The time taken to complete the task was recorded by the clinician using a stopwatch. The time was measured from the moment the clinician said ‘go’ to the moment the participant sat back on the chair (this time is referred to hereafter as the manual TUG time). The task was demonstrated to each participant and participants were given time to familiarize themselves with the test. Participants completed the TUG once but were allowed to repeat the test if they did not complete the first one correctly. Participants were not allowed to use a walking aid during the test. In order to quantify movement, kinematic data for each participant were acquired using two body-worn inertial sensors (Shimmer, Dublin, Ireland, http://shimmer-research.com), which were attached to the mid-point of the anterior shank (shin) [11]. Each sensor contained a tri-axial accelerometer and a tri-axial gyroscope, and sampled at 102.4 Hz. Sensors were calibrated using a standard method [27]. The raw gyroscope signal was low-pass filtered with a zero-phase 2nd-order Butterworth filter with a 20-Hz corner frequency. Sensor data (streamed via Bluetooth) and video data were synchronously acquired using a custom BioMobius application (http://www.biomobius.org). The video data for each TUG test were visually inspected to ensure only data from valid TUG tests were included in the analysis. Video (used only for data validation purposes) and sensor data were also edited synchronously within the acquisition software to ensure only data relevant to the TUG were included in each file. 
The sensor data for each test were then exported to text format for subsequent offline analysis in Matlab version 7.11 (Natick, Mass., USA, http://www.mathworks.com/).
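The filtering step described above (a zero-phase 2nd-order Butterworth low-pass filter with a 20-Hz corner frequency applied to data sampled at 102.4 Hz) can be illustrated with a minimal sketch. This is not the authors' Matlab code: the function names are hypothetical, the filter is written out by hand using the standard bilinear-transform design, and no edge padding is applied, so the first and last few samples are affected by start-up transients. In practice a library routine such as SciPy's `filtfilt` would normally be used. Note that forward-backward filtering applies the magnitude response twice, so the effective attenuation at the corner frequency is 6 dB rather than 3 dB.

```python
import math

def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a 2nd-order Butterworth low-pass biquad,
    designed with the bilinear transform and frequency pre-warping."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a

def lfilter(b, a, x):
    """Causal (single-pass) filtering with zero initial conditions."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def zero_phase_lowpass(x, cutoff_hz=20.0, fs_hz=102.4):
    """Filter forwards, then backwards, so that the phase shifts cancel."""
    b, a = butter2_lowpass(cutoff_hz, fs_hz)
    forward = lfilter(b, a, x)
    backward = lfilter(b, a, forward[::-1])
    return backward[::-1]

# Synthetic angular velocity trace (illustrative only): a slow 1-Hz
# movement component plus 45-Hz noise, sampled at 102.4 Hz.
fs = 102.4
raw = [math.sin(2 * math.pi * 1.0 * i / fs)
       + 0.3 * math.sin(2 * math.pi * 45.0 * i / fs) for i in range(1024)]
smoothed = zero_phase_lowpass(raw)
```

Away from the edges, the smoothed trace retains the 1-Hz component essentially unchanged in amplitude and timing, while the 45-Hz component is strongly attenuated.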
Experimental set-up for capture of inertial sensor data during the TUG test. Sensor used for data acquisition (physical dimensions: 5.4 × 1.9 × 3.2 cm), and sensor coordinate axes are indicated (inset). Acc = Classification accuracy.
Sensor Data Analysis
The movement of each participant performing the TUG test was evaluated using quantitative movement parameters (features) derived from the angular velocity signals obtained from the tri-axial gyroscope sensors mounted on each shank. The 44 sensor-derived features can be grouped into four categories: temporal gait parameters, spatial gait parameters, tri-axial angular velocity parameters and turn parameters.
The data were stratified by gender and age. As discussed elsewhere [11], the number of males with no history of falling in the baseline data set was deemed insufficient to generate robust classifier models in two age categories per gender. As a result, the data were considered in three separate groups: males, females <75 years of age, females ≥75 years of age. Due to hardware errors during sensor data acquisition, the sensor data for some participants were incomplete. Participants with incomplete sensor data were removed from the analysis.
Sequential forward feature selection [28] combined with regularized discriminant classifier models [23] were used to generate each of the three predictive classifier models (males, females <75 and females ≥75) for estimating the risk of future falls in community-dwelling older adults. A grid search was used to determine the optimum feature set and classifier model parameters for each of the three classifier models. Features (inertial sensor-derived and demographic parameters) included in each model along with associated regularization parameters are detailed in table 2. Technical details of this method have been reported elsewhere as part of a cross-sectional study on retrospectively distinguishing community-dwelling older adults with and without a history of falls [11]. The present study aimed to determine the utility of that method in prospectively evaluating falls risk, wherein each classifier model was trained to recognize sensor/physical data patterns associated with future falls. Classifier models were then tested using data from ‘unseen’ participants using cross-validation.
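Sequential forward feature selection, as used above, can be sketched in general terms as follows. This is an illustrative outline rather than the implementation used in the study: the scoring function here is a toy stand-in (in the study, a cross-validated regularized discriminant classifier would play this role), and the feature names and gain values are hypothetical.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedy forward selection.

    features: iterable of candidate feature names.
    score:    callable that takes a list of feature names and returns a
              number to maximise (e.g. a cross-validated accuracy).
    Selection stops when no remaining feature improves the score.
    """
    remaining = list(features)
    selected = []
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Try adding each remaining feature to the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Toy scoring function (hypothetical): each feature contributes a fixed
# gain, and every added feature incurs a small complexity penalty.
TOY_GAIN = {"walk_time": 0.10, "turn_time": 0.06, "cadence": 0.01, "height": 0.0}

def toy_score(subset):
    return 0.5 + sum(TOY_GAIN[f] for f in subset) - 0.02 * len(subset)

selected, best = sequential_forward_selection(TOY_GAIN, toy_score)
```

With this toy score the procedure keeps the two features whose gain exceeds the penalty and then stops. A grid search over classifier parameters would wrap this procedure, repeating the selection for each candidate parameter setting.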
Evaluating Algorithm Performance
The performance of the algorithm in predicting falls was estimated using an approach called ‘10-fold cross-validation’. This involves randomly splitting the data into 10 equal ‘folds’: 9 of these folds are then used to train the classifier (training set) and the remaining fold is then used to test the performance of the classifier (test set). This is done for each possible combination of training and test set. This procedure was repeated 10 times (shuffles). The classifier performance measures were then taken as the mean of each measure across all folds and shuffles, providing an unbiased, low-variance estimate of the classifier’s performance. The 95% confidence interval (95% CI) was also calculated for each classifier performance metric. Fitting the algorithm in this way reduces the chances of overfitting and ensures robust estimates of the statistical summaries [22].
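The repeated 10-fold procedure described above can be sketched as follows. Several elements are assumptions made for illustration only: the stand-in classifier simply thresholds a single feature at the midpoint of the two class means (it is not the regularized discriminant model of the study), the data are synthetic, and the 95% CI uses a normal approximation across the 100 fold-level accuracies, since the exact CI calculation is not specified in the text.

```python
import random
import statistics

def repeated_kfold_indices(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for every fold of every shuffle."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # sizes differ by at most 1
        for i in range(k):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, folds[i]

def cross_validate(x, y, fit, predict, k=10, repeats=10, seed=0):
    """Return the test accuracy (%) of each fold across all shuffles."""
    accuracies = []
    for train, test in repeated_kfold_indices(len(y), k, repeats, seed):
        model = fit([x[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, x[i]) == y[i] for i in test)
        accuracies.append(100.0 * hits / len(test))
    return accuracies

# Stand-in classifier (assumption): threshold one feature at the midpoint
# of the class means learned from the training folds only.
def fit_midpoint(xs, ys):
    mean0 = statistics.mean(v for v, c in zip(xs, ys) if c == 0)
    mean1 = statistics.mean(v for v, c in zip(xs, ys) if c == 1)
    return (mean0 + mean1) / 2.0, mean1 >= mean0

def predict_midpoint(model, v):
    threshold, higher_is_faller = model
    return int((v > threshold) == higher_is_faller)

# Synthetic example: 60 non-fallers (0) and 40 fallers (1).
rng = random.Random(1)
labels = [0] * 60 + [1] * 40
values = [rng.gauss(10.0 + 3.0 * c, 2.0) for c in labels]

acc = cross_validate(values, labels, fit_midpoint, predict_midpoint)
mean_acc = statistics.mean(acc)
half_width = 1.96 * statistics.stdev(acc) / len(acc) ** 0.5
print(f"accuracy {mean_acc:.2f}% "
      f"(95% CI: {mean_acc - half_width:.2f} to {mean_acc + half_width:.2f})")
```

The key point is that the classifier is refitted on the training folds in every iteration, so each test fold is genuinely ‘unseen’ by the model that is evaluated on it.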
The classification accuracy is defined as the percentage of participants correctly identified by the system as being a faller or a non-faller. The sensitivity is defined as the percentage of fallers correctly identified by the system. The specificity is defined as the percentage of non-fallers correctly identified as such by the algorithm. The area under the receiver operator characteristic (ROC) curve is used as an additional metric of algorithm performance as it has been shown to provide a reliable overall index of diagnostic performance [29]. Positive and negative predictive values were also calculated to provide a measure of the predictive power of positive and negative (faller and non-faller) classifications. The positive predictive value is defined as the proportion of participants, classified as fallers by the algorithm, who are correctly classified. Similarly, the negative predictive value is the proportion of participants, classified as non-fallers by the algorithm, who are correctly classified.
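The performance measures defined above follow directly from the counts of correct and incorrect classifications. A minimal sketch, assuming fallers are coded as 1 and non-fallers as 0 (the function name and example labels are illustrative only, not study data):

```python
def classification_metrics(y_true, y_pred):
    """Performance measures (%) with faller = 1 and non-faller = 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # fallers identified
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # non-fallers identified
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # non-fallers flagged as fallers
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # fallers missed

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "accuracy": pct(tp + tn, len(pairs)),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
    }

# Illustrative labels: 4 fallers and 6 non-fallers.
truth     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(truth, predicted)
```

Sensitivity and specificity are conditioned on the true outcome, whereas the positive and negative predictive values are conditioned on the classifier output and therefore depend on the prevalence of fallers in the cohort.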
Additional Statistical Analysis
In addition to the multivariate classification analysis, an exploratory analysis was also carried out to test for statistical differences in each sensor-derived feature as well as demographic and clinical data between fallers and non-fallers in each of the three groups. Due to the non-parametric nature of much of the data, the Mann-Whitney version of the Wilcoxon rank sum test was used [11]. This test was also used to examine differences between fallers and non-fallers in each of the clinical and demographic parameters (online supplementary table, see www.karger.com?doi=10.1159/000337259). A χ2 test of proportions was used to examine if there were differences in proportions between fallers and non-fallers. A two-sided t test was used to determine if the classification results obtained were significantly better than those obtained using only the manual TUG or BBS. In order to allow comparison of these data to previous studies and determine the optimum cut-off points for the manual TUG and BBS, a logistic regression model was fitted to the manual TUG and BBS data using falls outcome as the dependent variable in order to determine the 50% probability cut-off point for this cohort. This yields the ‘best estimate’ threshold for falls prediction for this group for each of these measures.
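The 50% probability cut-off point follows directly from the fitted logistic model. With a single predictor x, the model gives p(x) = 1/(1 + exp(−(b0 + b1·x))), and p(x) = 0.5 exactly when b0 + b1·x = 0, i.e. at x = −b0/b1. The sketch below illustrates this; the coefficient values are hypothetical and are not those fitted in this study.

```python
import math

def fall_probability(x, intercept, slope):
    """Probability of a fall predicted by a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def fifty_percent_cutoff(intercept, slope):
    """Value of the predictor at which the predicted probability is 0.5."""
    if slope == 0:
        raise ValueError("slope is zero: the model has no cut-off point")
    return -intercept / slope

# Hypothetical coefficients for a timed measure (seconds); illustration only.
b0, b1 = -3.0, 0.25
cutoff = fifty_percent_cutoff(b0, b1)
```

For a measure on which higher values indicate greater risk (such as a timed test) the slope is positive and values above the cut-off are classified as fallers; for a measure on which higher scores indicate better balance the slope is negative and values below the cut-off are classified as fallers. The same formula applies in both cases.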
Results
Two years after baseline assessment, telephone contact was made with 299 of 349 (response rate: 85.67%) participants. Of the remainder, 18 participants died during the follow-up period (survival rate: 94.75%), 6 participants could not be contacted and 26 participants refused consent to the telephone interview. Three participants were physically unable to complete the survey although they did provide information on their falls history since the initial evaluation.
Sensor data for a number of participants needed to be excluded due to problems with data acquisition during the initial assessment, including sensor/software failure and human error. Sensor data for 226 of 299 (75.59%) participants were then included in the final analysis (62 men, 164 women; table 2). These 226 participants experienced a total of 144 falls since the initial assessment. Eighty-three participants (35.84%, 63 female), referred to hereafter as ‘fallers’, experienced a fall in the intervening period; 143 participants (64.16%, 101 female) did not, and are referred to as ‘non-fallers’. Thirty-one participants (13.72%) had 2 or more falls in the follow-up period and were deemed ‘recurrent fallers’. Sixty-four (28.32%) participants reported being hurt or injured by a fall during the follow-up period. One hundred and twenty-seven participants (56.19%) had a history of falls prior to the initial assessment, 99 (43.81%) did not; the performance of the method in retrospectively assessing falls risk is reported elsewhere [11]. The number of falls reported to have occurred during the follow-up period is detailed graphically in figure 2. The prevalence of fallers in each of the three groupings was between 32 and 40%; the number of fallers in each group is highlighted in table 2. The mean age of the cohort at the time of the initial evaluation was 71.5 ± 6.7 years, while the mean height and weight were 165.4 ± 9.4 cm and 73.6 ± 14.3 kg, respectively. Clinical information for the cohort from the initial baseline assessment is detailed in table 1. No statistical differences between fallers and non-fallers were observed in any of the clinical parameters measured at baseline with the exception of the manual TUG and the BBS. However, a χ2 test of proportions revealed that a significantly larger (p < 0.05) number of fallers had Parkinson’s disease and used walking aids when compared to non-fallers. No other differences in proportions were observed.
Using logistic regression, the BBS cut-off point for identifying fallers was 45, while the manual TUG cut-off point for identifying fallers was 15.25 s.
Number of falls during the 2-year follow-up period subsequent to baseline evaluation. A total of 144 falls were recorded during the follow-up period.
The features included in each classifier model are detailed in table 2. The performance of the algorithm in prospectively predicting falls in community-dwelling older adults is detailed in table 3. The 10-fold cross-validation yielded a mean classification accuracy of 79.69% and a mean area under the ROC curve of 0.78 indicating good discrimination between the faller and non-faller classes. Figure 3 shows the ROC curves for all classifier models.
Results obtained from regularized discriminant classifier models for prospective evaluation of falls risk using shank-mounted inertial sensors

ROC curves obtained from cross-validated falls risk estimates for each classifier model (a) and each reference method, i.e. BBS (b) and manual TUG (c). Areas under the ROC curves for males, females <75 and females ≥75 are 0.74, 0.76 and 0.85, respectively. BBS and manual TUG had areas under the ROC curve for each group of 0.65, 0.57, 0.72 and 0.71, 0.50, 0.64, respectively.
Mean classification accuracy for the male model was 83.06% (95% CI: 80.45–85.41) while the female <75 and female ≥75 models had classification accuracies of 72.97% (95% CI: 70.34–75.69) and 83.02% (95% CI: 80.49–85.91), respectively. The results for classifier models for the manual TUG and BBS yielded a mean classification accuracy of 55.61 and 63.42%, respectively. Detailed results for each of the three groups and reference methods are provided in table 3. In order to examine if the accuracy of the multivariate classifier was significantly larger than each of the manual measures, a two-sided t test was used. It was found that the classification accuracy (as obtained from 10 folds and 10 shuffles using cross-validation) was significantly higher (p < 0.0001) than the classification accuracies for both the manual TUG and BBS in each of the three groups. Table 2 provides three case studies to illustrate the operation of each of the three classifier models. The features included in each classifier model for a given participant are shown and compared to the mean values for each of the participant groupings. The probability of falls can be obtained from the fitted classifier model and this is provided for each case for illustrative purposes in table 2. The effect on the falls risk estimate of changing certain key features is then shown. This process can be readily repeated for other individuals who carry out the test. The calculation can be easily encoded in software and provides feedback directly afterward to the clinician carrying out the test.
Discussion
The present method is the first to offer a predictive estimate of falls risk in community-dwelling older adults based on parameters derived from body-worn gyroscopes. Results were significantly more accurate than those obtained for falls risk estimation using two standard measures of falls risk.
A number of studies have used a prospective study design to quantitatively examine falls risk in older adults [30,31]. Stalenhoef et al. [32] reported a prospective risk model for recurrent falls based on 311 participants that employed four risk factors: a history of two or more falls in the previous year, mobility impairment, reduced grip strength and the presence of a depressive state. With a cut-off point of 0.3, the method offers a sensitivity of 59% and specificity of 87% for predicting falls.
Many previous studies have focused on how parameters derived from body-worn inertial sensors correlate with clinically validated falls risk assessments, based on retrospective falls history [12,14,33]. To our knowledge, only one previous study sought to prospectively predict falls using body-worn sensors: Marschollek et al. [13] followed up 50 geriatric inpatients for 1 year after assessment using a waist-mounted accelerometer. In that study, a best classification accuracy of 80% (sensitivity: 58%, specificity: 96%) was reported based on accelerometer-derived parameters classified using a classification tree. The results were not compared against reference methods for falls risk estimation on the same data set. The present method is the first to offer a predictive estimate of falls in community-dwelling older adults using body-worn sensors.
In previous research on the same sensor data set [11], we found a number of parameters that were strongly associated with falls history yet did not have a strong correlation with either of the clinical methods for assessing falls risk included in the study (BBS and manual TUG). Correlation analysis on the present data set found that manual TUG time was negatively correlated with the BBS score (ρ = –0.76, p < 0.001). A detailed discussion of how each of the novel features used in this study correlates with BBS and manual TUG is provided elsewhere [11]; a number of parameters used in the present study were found to have a strong association with falls risk yet did not show a strong correlation with the BBS score and manual TUG. This may suggest that some movement patterns associated with falls risk are not captured by conventional clinical mobility and balance scales [34]. Gates et al. [35] provide a systematic review on the accuracy of screening instruments for predicting falls risk in community-dwelling elders. The results for mean area under the ROC curve in the present study (0.77) are superior to the values reported in any of the studies examined by Gates et al. [35] (0.51–0.67). A variety of studies have been reported that examined the utility of the manual TUG test in predicting falls. Many of those studies used very specific participant populations (such as participants with vestibular dysfunction) and/or relatively small sample populations [36,37], leading to differences in reported mean and cut-off values. It is noteworthy that our results for the relation of the TUG time to falls largely agreed with those reported by Thrane et al. [38] in a large-scale retrospective study on the relation of the manual TUG time to falls history. That study examined 974 older adults (396 with a history of falls and 578 without) and appears to be the largest of its kind reported to date.
Similarly, differences in sample size and population characteristics may account for differences between our study and those examining the BBS and its association with falls risk [39,40].
The use of gyroscopes in the present study may have advantages over accelerometers for use by non-experts in a community setting, as gyroscopes are less sensitive to the influence of gravity and therefore the signal is less dependent on exact sensor positioning [41].
The most significant limitation of the present study is the self-reported nature of the follow-up data obtained via the telephone survey. Participants were asked to recall if they had fallen since their initial evaluation, which could have been inaccurate, particularly in participants suffering from cognitive decline. Additionally, the considerable length of time between assessment and follow-up increases the likelihood of unrelated physical decline occurring in that interval. However, we feel this limitation does not invalidate our findings, given that the main aim of the present study is to compare two standard methods for estimating falls risk (manual TUG and BBS) against our novel method using the same measure of outcome and given the same significant limitation. The cohort used was a sample of convenience rather than a representative sample, and so may contain sample bias. While every effort was made to ensure the statistical models used in the present study were generalized across the study population, differences may exist when compared to the general population. For example, a large proportion of the cohort were self-referrals (234/349), which could indicate differences when compared to other studies employing cohorts made up of hospital inpatients or nursing home residents.
Objective assessment of falls risk using a standardized protocol as reported here has potential to improve the quality of care offered to community-dwelling older adults at risk of falling and allow more timely intervention to prevent falls. Furthermore, the reported method has potential for use in a supervised monitoring protocol where an increase in falls risk, manifesting as deterioration in a subject’s gait and balance, would be noted as a change in the results of periodically administered assessments. This could form part of a falls risk screening tool in a supervised environment such as a primary care facility.
Acknowledgments
We acknowledge the help and support of the staff of the TRIL Clinic, St. James’s Hospital and the participants involved in this study. The authors would also like to acknowledge Mr. Tim Foran for help with data acquisition, and Dr. Aine Ní Mhaolain for comments on the manuscript.
Funding
The TRIL Clinic is funded by Intel Corporation, the Industrial Development Agency Ireland and GE Healthcare, with operational and laboratory support from St. James’s Hospital, Dublin.
Intel’s role in the research (e.g. formulation of research question(s), choice of study design, data collection, data analysis and decision to publish) has been in study design, data collection, data analysis and manuscript preparation. The Shimmer board design is owned by the Intel Corporation.