Abstract
Introduction: The digitization of hospital systems, including integrated electronic medical records, has provided opportunities to improve the prediction performance of inpatient fall risk models and their application to computerized clinical decision support systems. This review describes the data sources and scope of methods reported in studies that developed inpatient fall prediction models, including machine learning and more traditional approaches to inpatient fall risk prediction.
Methods: This scoping review used methods recommended by the Arksey and O’Malley framework and its recent advances. PubMed, CINAHL, IEEE Xplore, and EMBASE databases were systematically searched. Studies reporting the development of inpatient fall risk prediction approaches were included. There was no restriction on language or recency. Reference lists and manual searches were also completed. Reporting quality was assessed using adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement, where appropriate.
Results: Database searches identified 1,396 studies; 63 were included for scoping assessment and 45 for reporting quality assessment. There was considerable overlap in data sources and methods used for model development. Fall prediction models typically relied on features from patient assessments, including indicators of physical function or impairment, or cognitive function or impairment. All but two studies used patient information at or soon after admission and predicted fall risk over the entire admission, without consideration of post-admission interventions, acuity changes or length of stay. Overall, reporting quality was poor but has improved over the past decade.
Conclusion: There was substantial homogeneity in data sources and prediction model development methods. Use of artificial intelligence, including machine learning with high-dimensional data, remains underexplored in the context of hospital falls. Future research should consider approaches with the potential to utilize high-dimensional data from digital hospital systems, which may contribute to greater performance and clinical usefulness.
Introduction
Inpatient falls are a significant cause of morbidity and mortality in hospitals internationally [1] and are associated with increased length of stay and hospitalization costs [2, 3]. There is evidence to suggest some portion of inpatient falls are preventable, and there has been significant investment in the development and evaluation of intervention strategies to prevent hospital falls in recent decades [1, 4]. There has also been concurrent development and evaluation of approaches to predict fall risk among hospital patients, with the intention of directing these interventions to those patients at greatest risk. However, a recent systematic review of approaches to hospital fall risk prediction reported that performance has been lower than is clinically useful [5].
Despite wide adoption of fall risk screening and assessment tools in hospital settings, randomized trials that have evaluated fall risk prediction in acute hospital settings as part of multifactorial strategies intended to prevent falls have not reported favourable findings [6]. For example, when a fall risk assessment tool was combined with 6 fall risk mitigation strategies in a multisite randomized controlled trial (24 hospital wards, n = 46,245 admissions), no beneficial effect on falls or fall outcomes was found, despite an observed increase in use of the risk assessments and associated interventions. In another trial, cessation of the existing fall risk prediction approaches was reported to be non-inferior to their continued use and potentially inferior to clinician judgement without any formal risk assessment [7]. The lack of clinical trial evidence for the use of fall risk assessments in the acute hospital setting may be due to their poor classification performance of inpatient fall risk and inability to direct preventative interventions, or the ineffectiveness of interventions intended to reduce falls, or both.
The increasing digitization of hospital systems may promote advancement in models that predict hospital falls. Digital hospital systems, including comprehensive integrated electronic medical records, now contain a wide array of routinely recorded information that may have value in the context of fall risk prediction. These include longitudinal healthcare records that are updated throughout a patient’s admission. This digital information creates an opportunity to apply a wider range of methods [8] to predict future patient events, such as inpatient falls. However, there has not yet been any systematic collation and reporting of the available models to predict inpatient falls, including those using more advanced analytic approaches and more comprehensive data sources. It is not yet clear whether the recent advancements in integrated electronic medical record systems have been successfully utilized for inpatient fall prediction model development as they have been in other areas, including sepsis [9] and hospital readmission [10] prediction.
This scoping review aimed to systematically summarize the existing approaches used to predict inpatient fall risk. There have previously been reviews describing risk factors for hospital falls [11] and inpatient fall risk prediction models [5, 12, 13]. However, unlike prior reviews, this study sought to examine the scope of inpatient fall prediction model development approaches present in the literature, including identifying underexplored sources of data and prediction model development methods with a view to identifying opportunities to advance the state-of-the-art in the context of increasingly digitized hospital systems.
Methods
The protocol for this scoping review has previously been reported [14]. In summary, this scoping review was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [15] and guided by the Arksey and O’Malley framework [16] as well as recent advances [17].
This review focused on answering the following research questions:
1. What clinical prediction models are available for hospital falls?
2. What methods were used to create these models?
3. What predictor variables are used in these models?
4. How well have existing models been reported?
Identifying Relevant Studies
Four electronic databases were searched for relevant studies: CINAHL via EBSCOhost, PubMed, IEEE Xplore, and Embase on the 12th of November 2020. No limitations were placed on document type, date, or language. The search terms used for each database are available in online supplementary Table 1 (for all online suppl. material, see www.karger.com/doi/10.1159/000525727).
Eligibility Criteria
We aimed to include only studies that developed an inpatient fall prediction tool or model. When only a validation study of an approach to hospital fall prediction was obtained within the search results, we searched the reference list and performed a manual search of the aforementioned databases to attempt to identify the original development study. If that study was not published or otherwise unattainable, the identified validation study was included in the review for data extraction in lieu of the development study.
We did not consider studies relating to approaches that specifically predict falls that occurred within a clinically in-actionable period in the hospital setting (for example, sensors identifying a patient in the act of falling or only seconds before a fall) or after discharge from the hospital; these were outside of the scope of this study and they were excluded. Studies that relied on input from equipment not typically available in hospital settings, including specialized accelerometers or motion-capture cameras, were excluded.
To enable the exploration of processes and data used to develop and evaluate approaches for inpatient fall risk prediction, this review included studies that reported a measure of predictive model performance. Studies that did not report a measure of predictive performance were excluded.
Study Selection
Search results from all databases were combined and imported into EndNote X9. Duplicates identified within EndNote were removed and the remaining studies were imported into Rayyan literature review management software [18]. Additional duplicate studies identified by Rayyan were also removed. Articles and abstracts were assessed against the eligibility criteria by two reviewers. Conflicts between the two reviewers were resolved by group discussion, including a third reviewer. The same two reviewers screened the full texts of the remaining studies against the eligibility criteria. To assess non-English language studies for eligibility (and to chart the data for those that met these criteria), Google Translate was used, followed if necessary by translation to English by colleagues fluent in that language.
Charting the Data
Two reviewers extracted data for each of the first 12 included full texts and discussed any discrepancies to ensure consistency in how data were extracted by each reviewer. Following this pilot data extraction, and given the consistency that emerged between the reviewers, one of these two reviewers performed data extraction for all the remaining studies. For studies in a non-English language, data were extracted from the translated text; the extracted items were then verified by the colleagues who were fluent in that language.
The fields that were extracted included an assessment of adherence [19] to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis statement (TRIPOD) [20], details relating to sample sizes, proportion of fallers, setting and source of data, and data preprocessing and modelling methods. Conference abstracts and validation only studies were not subject to the TRIPOD adherence assessment portion of the data extraction as this assessment was directed only at development studies.
Collating, Summarizing, and Reporting the Results
Scoring for TRIPOD adherence, aggregation of all other fields, and generation of tables and figures were performed using R software [21].
Patient and Public Involvement
No patients or members of the public were involved in this study.
Results
Search Results
The database searches yielded 1,396 studies. After de-duplication and title and abstract screening, 109 full texts were reviewed, and 63 were included for data extraction. During data extraction, one validation only study was replaced with an alternative validation study of the same tool, as it was more descriptive of the original model. Forty-five of the included 63 studies were full-text articles describing the development of a fall risk prediction model and were assessed for adherence to the TRIPOD statement (Fig. 1). References for all 63 included studies are provided in online supplementary Table 2.
Characteristics of Included Studies
Of the included 63 studies, 11% (7/63) were conference abstracts; one of these performed validation only [22] and the remaining performed model development [23‒28]. Eleven of the 56 full-text studies were validation only studies; the remaining 45 full texts described model development. Included studies used data (or experts) from 18 different countries, with 24% (15/63) from the USA, 21% (13/63) from Japan, and 13% (8/63) from Australia [29‒36]. Of the development studies (n = 51), 63% used retrospective data (32/51), a third used prospective data (17/51) and the remaining two [33, 37] used no data for model development. Most (43/51) development studies used data from a single hospital; 3 studies [34, 37, 38] used data from 2 hospitals, 2 studies [30, 39] used data from 3 hospitals, 1 study [32] used data from 11 hospitals, and 1 study [40] used data from 17 hospitals. One study [26] did not report the number of hospitals from which they sourced data for model development.
Approximately half (24/51) of the included development studies (n = 51) used holdout data for internal validation, with most of the remaining studies (20/51) evaluating the model on the data used to develop it; five of the development studies [30, 34, 41‒43] used cross-validation and two performed external validation [44, 45]. Of all the included studies (n = 63), approximately a third (23/63) did not report using any selection criteria on patient age, and another third only included adults (24/63). Fourteen studies included only geriatric patients, and the remaining two studies [46, 47] used data only from paediatric patients.
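To make the distinction between these validation designs concrete, the following minimal Python sketch splits record indices into k folds so that every record is used for evaluation exactly once, in contrast to a single holdout split or evaluation on the development data. The function name `k_fold_indices` and the sample size are illustrative assumptions and are not taken from any included study; in practice, records would usually be shuffled (and often stratified by outcome) before splitting.

```python
def k_fold_indices(n_records, k):
    """Yield (train, test) index lists so each record is tested exactly once."""
    # Assign records to folds in round-robin order (no shuffling in this sketch).
    folds = [list(range(start, n_records, k)) for start in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_records=10, k=5))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and evaluated on test_idx here.
    assert not set(train_idx) & set(test_idx)
```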
Sources of Data
In most studies (41/63), fallers were identified by some form of adverse event or incident reporting system. The remainder were identified through undisclosed methods (n = 16), notification by clinical staff to study investigators (n = 7) [30, 32, 34, 40, 48, 49], chart review (n = 6) [32‒34, 36, 50, 51], or patient questionnaire (n = 1) [52], and six of these studies used combinations of these methods [30, 32‒34, 36, 49]. Only two studies reported a fixed-length prediction horizon, whereas the rest predicted any fall occurring during the entire admission after the prediction was made. Of those two studies, one predicted same-day falls [53], and the other predicted next-day falls [54]. All studies predicted inpatient falls of any kind, rather than only injurious falls or any potential category of falls, such as environmental or physiological.
Features were usually obtained directly from patient assessments including questionnaire responses (27/63) or the electronic medical records (24/63); 9 studies [24, 26, 28, 30, 31, 36, 55‒57] used manual chart review and 2 interviewed staff [26, 34]. Seven studies used multiple methods to obtain patient details used for risk prediction [26, 34, 36, 45, 56‒58]. The remaining nine studies either did not use any of these methods or state how their data were sourced [22, 23, 27, 33, 37, 59‒62].
Features used for each model are shown in Table 1. The most frequently used features were assessments of physical function or impairment (94%), cognitive function or impairment (68%), and history of falls (52%). Laboratory results and vital signs, such as blood pressure or heart rate, were rarely included (14%). Only one study used time-series data to generate predictions [65]. This study was also the only one to use free entry text as a data source. A description of items used as part of physical and cognitive function/impairment features is provided in online supplementary Table 3.
Balance, Augmentation and Imputation of Data, and Feature Engineering
The datasets reported varied in their level of balance between positive (fallers) and negative (non-fallers) events: a quarter (14/51) of development studies were of case-control design and had approximately equal sample sizes of fallers and non-fallers (Fig. 2). Validation studies included a smaller rate of positive events in their data, better reflecting the proportions of falls occurring in hospital populations.
Three development studies used methods for data augmentation. Two of these were the only two studies that used shortened prediction horizons rather than the entire inpatient stay [53, 54]. They also fitted their models using re-sampled data from the same patients on different days. The third study [65] to use data augmentation used synthetic minority oversampling. This approach generated synthetic samples with similar characteristics to observed samples for the minority class, in this case, fallers, to balance the dataset before modelling [85].
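As an illustration of the idea behind synthetic minority oversampling, the Python sketch below creates new minority-class rows by interpolating between an observed faller and one of its nearest faller neighbours. It is a simplified stand-in and not the implementation used in [65]; the function name `smote_like`, the choice of `k`, and the toy feature vectors are all assumptions made for the example.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors for fallers: [age in years, mobility score]
fallers = [[78.0, 2.0], [82.0, 1.0], [75.0, 3.0], [90.0, 1.0]]
new_rows = smote_like(fallers, n_new=6)
```

Because every synthetic row lies on a line segment between two observed fallers, the generated values stay within the range of the observed minority class.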
No feature engineering steps, for example, dimensionality reduction or aggregation of repeated measurements, were described in any of the included studies. Data were rarely imputed, with only four studies reporting data imputation methods: one study [43] allowed a decision tree to assign missingness and used mean-imputation for their logistic regression model, one study [56] described imputation by replacement with “normal values” without a detailed description of methods used, one study [42] imputed with the median of that feature, and one study [34] reported using a missing value analysis in SPSS without a detailed description.
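The simple imputation strategies mentioned above can be sketched as follows. This minimal Python example fills missing values with the mean or median of the observed values and also returns a missingness indicator, which is one way a model can be given access to the fact that a value was absent. The function name `impute` and the heart rate values are hypothetical and do not come from any included study.

```python
from statistics import mean, median


def impute(values, strategy="median"):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    filled = [fill if v is None else v for v in values]
    # Indicator flags let a model use missingness itself as a feature.
    was_missing = [v is None for v in values]
    return filled, was_missing


heart_rate = [72, None, 88, 110, None, 64]  # hypothetical admission heart rates
median_filled, flags = impute(heart_rate, strategy="median")
mean_filled, _ = impute(heart_rate, strategy="mean")
```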
Feature Selection and Choice of Algorithm
Most studies used statistical models to develop risk prediction tools (38/57). Ten used expert choice alone for development. Nine used machine-learning models, four of which [41, 43, 60, 65] also applied statistical models. Of those that used statistical models, one used a Cox regression model [27] and the rest used logistic regression. Of those that used machine-learning models, 4 used a decision tree [42, 43, 60, 76], 2 used support vector classifiers [41, 54], 2 used a tree ensemble (including random forest, XGBoost, bagging, or boosting) [42, 64], 2 used an artificial neural network [41, 58], one used a proprietary machine-learning model for natural language processing [86], one used linear discriminant analysis [41], and one used a naïve Bayes classifier [41]. Two of the studies that applied machine-learning methods compared more than one group of machine-learning models [41, 42].
Of the studies that described their feature selection process (n = 51), 32 used a single method, 18 used 2 methods, and 1 study [49] used 3 methods to select features. Feature selection methods reported by these studies included univariable analyses (n = 26), expert choice (n = 18), stepwise selection (n = 17), approaches built into the algorithm being used (penalized regression (n = 1) [30]; machine-learning methods, n = 5 [42, 54, 64, 65, 76]), Akaike Information Criterion (n = 3) [29, 38, 66], and a cross-validation performance metric (n = 1) [41].
Data Used for Validation, the Approach Used to Define Risk Cut-Points, and Model Calibration
Of the 51 development studies, data sources for model evaluation included the use of a holdout set from the same setting (n = 24), use of the same data as for model development (n = 20), cross-validation (n = 5) [30, 34, 41‒43], or external validation (n = 2) [44, 45]. The remaining 12 studies were validation studies and, therefore, used external validation. Seven of the 45 full-text development studies reported any assessment of model calibration [34, 36, 39, 63, 66, 72, 73].
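For readers less familiar with calibration assessment, the following Python sketch shows one basic form it can take: patients are grouped by predicted risk, and the mean predicted risk in each group is compared with the observed proportion of fallers. This is an illustrative sketch only; the function name `calibration_table`, the number of groups, and the toy predictions are assumptions, and the included studies may have used other methods.

```python
def calibration_table(predicted, observed, n_groups=2):
    """Return (mean predicted risk, observed event rate) for each risk group."""
    pairs = sorted(zip(predicted, observed))  # order patients by predicted risk
    table = []
    for g in range(n_groups):
        start = g * len(pairs) // n_groups
        end = (g + 1) * len(pairs) // n_groups
        group = pairs[start:end]
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_rate = sum(y for _, y in group) / len(group)
        table.append((round(mean_pred, 3), round(obs_rate, 3)))
    return table


# Hypothetical predicted fall risks and observed outcomes (1 = fell)
predicted = [0.02, 0.04, 0.06, 0.08, 0.20, 0.30, 0.40, 0.50]
observed = [0, 0, 0, 0, 0, 1, 0, 1]
table = calibration_table(predicted, observed)
```

In a well-calibrated model, the two values in each row would be close to each other; large gaps indicate over- or underestimation of risk in that group.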
Thirty-seven studies used the model to categorize patients into defined risk groups. To define the cut-point between risk groups, one study [43] used a 2 × 2 cost-matrix in which the cost of a false negative was defined as 20-fold larger than that of a false positive. Of the rest, 31 used either the Youden index or an author-selected value for either sensitivity or specificity and then maximized the other, and 5 did not state their method [32, 61, 76, 78, 82].
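The two families of cut-point selection described above can lead to different thresholds for the same model. The Python sketch below contrasts the Youden index (sensitivity + specificity − 1) with a cost-based rule that weights a false negative 20 times more heavily than a false positive, mirroring the ratio reported in [43]. The function names, the toy risk scores, and the treatment of correct classifications as cost-free are assumptions for illustration, not details taken from the included studies.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are classed as high risk."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return tp, fp, fn, tn


def youden_cutpoint(scores, labels):
    """Threshold that maximizes sensitivity + specificity - 1."""
    def youden_j(threshold):
        tp, fp, fn, tn = confusion(scores, labels, threshold)
        return tp / (tp + fn) + tn / (tn + fp) - 1
    return max(sorted(set(scores)), key=youden_j)


def cost_cutpoint(scores, labels, cost_fn=20, cost_fp=1):
    """Threshold that minimizes total misclassification cost."""
    def total_cost(threshold):
        tp, fp, fn, tn = confusion(scores, labels, threshold)
        return cost_fn * fn + cost_fp * fp
    return min(sorted(set(scores)), key=total_cost)


# Hypothetical predicted risks and outcomes (1 = fell during admission)
scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
youden_threshold = youden_cutpoint(scores, labels)
cost_threshold = cost_cutpoint(scores, labels)
```

With these toy data, the Youden index selects a threshold of 0.7, whereas the cost-based rule selects 0.3: the heavy penalty on missed fallers pushes the cut-point lower, at the expense of more false positives.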
TRIPOD Adherence
Overall, the full-text development studies adhered to between 29% and 71% of the items within the TRIPOD statement, with an average score of 47% (Table 2). The item with the worst reporting performance was 6A (definition and assessment of the outcome), which no study satisfied (Fig. 3a). This item assesses whether the study reports a definition of the outcome, “including how and when assessed”; studies typically failed it because they did not describe the time at which falls were assessed. From the locally weighted smoothing plot, scores appear to be increasing, particularly in the last decade and since the publication of the TRIPOD statement in 2015 (Fig. 3b).
Discussion
In this scoping review, we identified 63 unique prediction models for inpatient falls; this included 45 full texts reporting the development of those models. Overall, there were substantial limitations in their development and reporting. There was a significant overlap in both the features and modelling approaches used to develop inpatient fall prediction models, with underexplored approaches including machine-learning methods, text data, time-series inputs, and incorporation of vital signs, procedures, and laboratory results, which are expected to be increasingly available for near-real-time analytics and decision support within contemporary digital hospital systems. The underexplored areas described here highlight opportunities for the field of inpatient fall prediction models to be advanced with the potential for better performance and implementation within intelligent computerized clinical decision support systems. To the best of our knowledge, this is the first study in the context of inpatient fall risk prediction models to review the features and methods used for model development, and describe the often poor but improving reporting quality of these studies.
Reporting Quality
Similar to community-dwelling fall prediction models that have previously been reviewed [87], those for inpatients have generally been poorly reported. However, the TRIPOD statement [20] was published in 2015, and reporting quality does seem to have improved since then, with the best reported (71%) study [39] being one that was published in 2020. Based on this observed trajectory, it is likely that the TRIPOD statement will contribute to improved reporting quality in future studies in this field.
Methodological Gaps
Perhaps the most pertinent methodological gap in the context of hospital fall prediction was the scarcity of prediction models that take into account temporality and the potentially dynamic nature of fall risk. This is of particular relevance in hospital environments where large fluctuations in patients’ physical and cognitive function may occur due to both acute health state changes and treatments (for example, medications or surgical procedures) received. A non-temporally aware model can incorporate multivariable time-series inputs if the data modeller summarizes the changes of these predictors within the patient’s history over time before fitting the model.
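The summarization step described above can be sketched as follows. This minimal Python example collapses one patient's repeated measurements into a fixed set of features (last value, mean, range, trend, and number of observations) that a non-temporally aware model such as logistic regression could accept. The function name `summarise_series`, the chosen summary statistics, and the blood pressure values are illustrative assumptions and are not drawn from any included study.

```python
from statistics import mean


def summarise_series(times_h, values):
    """Collapse one patient's repeated measurements into fixed-length features."""
    t_bar, v_bar = mean(times_h), mean(values)
    # Ordinary least-squares slope: change in the measurement per hour
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(times_h, values)) / sum(
        (t - t_bar) ** 2 for t in times_h
    )
    return {
        "last": values[-1],
        "mean": v_bar,
        "min": min(values),
        "max": max(values),
        "slope_per_h": slope,
        "n_obs": len(values),
    }


# Hypothetical systolic blood pressure readings over the first 12 hours
hours = [0, 4, 8, 12]
systolic = [130, 122, 118, 110]
features = summarise_series(hours, systolic)
```

The trade-off is that any dynamics not captured by the chosen summaries, such as irregular timing or the pattern of missing measurements, are lost before the model is fitted.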
Models that are temporally aware can more effectively exploit the dynamics and missingness of predictors over time as the interpretation of changes in the predictors over time is performed by the model and does not rely solely on the steps taken during data preparation. Only one of the reviewed studies developed a temporally aware model to generate predictions [65]. Using time-series data offers the potential for algorithms to model the outcome in relation to the changes in health state over time, and this may improve prediction performance for falls in the hospital setting. These methods have been applied to prediction models of sepsis within inpatients and algorithms suited to multiple variable time-series inputs are currently the best performing [8, 9]. The absence of studies applying these approaches in the context of inpatient falls relative to other fields could be, at least in part, due to a lack of wide availability of suitable open-source data. For example, the Medical Information Mart for Intensive Care (MIMIC-III) [88] dataset has patient outcomes that include sepsis and mortality but not falls.
As all but two of the reviewed studies assessed the prediction of a given patient having one or more falls during the entire stay, predictions were usually made using data available at or close to admission. Despite over a third (24/63) of studies using data available on the EMR system of the hospital, only two studies [53, 54] developed and validated models using frequently updated data from the EMR system for each patient throughout the hospitalization. Overall, there was a significant overlap of features used within the included studies. The most common features were those related to physical and cognitive function or impairment, which was not unexpected [11]. However, risk factors including comorbidities, specific medications, and environmental or operational factors, such as clinical unit type and staff to patient ratios, were not frequently used within the clinical prediction models identified in this scoping review and remain relatively underexplored.
It is interesting to note that many of the included risk factors may be associated both with other risk factors and with the falls themselves, but mechanisms of action and causality are not always easy to discern or remediate. For example, prolonged bed rest is associated with falls [89], as well as cognitive decline [90] and muscle weakness [91]. Yet, the direction of causality or mechanism of effect may not necessarily be clear or even unidirectional. Independently, cognition may be associated with falls due to increased risk-taking and impulsivity [92], and similarly, gait, balance, and strength abnormalities may independently contribute to falls [93]. However, each of the above traits may also be caused by an underlying systemic health condition that precipitated the need for hospitalization independent of any interactions between the aforementioned factors predictive of inpatient falls. While a detailed exposition of potential causal and non-causal interactions between predictors of hospital falls is beyond the scope of the present study, it remains an important consideration for future research in the field with relevance to the development and refinement of hospital fall prevention interventions including integrated fall prevention computerized clinical decision support systems.
It is likely that the performance of models developed for predicting falls is dependent on the methods used, including sample size, the algorithm(s) used, or those that balance, augment, or impute data. Unfortunately, the substantial homogeneity in model development processes among the included studies and the generally poor quality of model validation used to estimate performance meant that we were not able to infer generalizable methodological lessons from quantitative comparisons of model performance across the included studies. However, a recent simulation study suggests that methods to reduce class imbalance, including random under- and oversampling, may hinder model calibration and give no benefit to model discrimination [94]. Performing these comparisons using real-world data remains a worthwhile future research opportunity that will have implications for the field of hospital fall prevention as well as potential relevance to predictive model development in other fields of patient safety and hospital-acquired complications.
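One way to see why resampling can disturb calibration is that it changes the event rate the model is fitted to. The Python sketch below uses a standard relationship for random undersampling of the majority class: if only a fraction `beta` of non-fallers is retained, a true risk `p` appears as `p / (p + beta * (1 - p))` in the resampled data. This is an idealized illustration that assumes non-fallers are dropped completely at random and that the model estimates risk perfectly on the resampled data; it is not the analysis performed in [94], and the function names and values are hypothetical.

```python
def risk_after_undersampling(p, beta):
    """Risk implied in data where only a fraction beta of non-fallers is kept."""
    return p / (p + beta * (1 - p))


def corrected_risk(p_sampled, beta):
    """Map a risk estimated on undersampled data back to the original event rate."""
    return beta * p_sampled / (beta * p_sampled - p_sampled + 1)


true_risk = 0.02  # hypothetical 2% risk of falling during admission
beta = 0.1        # keep 10% of non-fallers
inflated = risk_after_undersampling(true_risk, beta)
recovered = corrected_risk(inflated, beta)
```

In this example a 2% risk is presented to the model as roughly 17%, so predicted probabilities would overestimate risk unless they are corrected afterwards, while the rank ordering of patients (and hence discrimination) is unchanged by the transformation.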
Limitations of This Review
There are several noteworthy limitations associated with the generalizability of findings arising from this review. First, the scope of included studies was limited to the development of approaches to predict fall risk in the hospital setting and conclusions from this review should not be generalized beyond this context. Second, this review did not focus on pragmatic considerations related to the post-development operationalization of fall prediction models for use in hospital settings, including digital (or analogue) infrastructure, governance, clinical workflow, or other human factor considerations. Third, while this review has successfully explored the a-priori research questions, it was not within the scope of this review to examine the comparative efficacy of fall prediction models in the context of their application for triggering fall prevention interventions initiated in response to identified fall risk.
Model Development Processes Should Be Optimized for Clinical Use
Perhaps the most pertinent implication for future research arising from this review is the substantial opportunity that remains to improve the extent to which intended end-use is considered in hospital fall risk model development processes. For example, a prediction made soon after admission for whether a patient will fall within their hospitalization may not reflect patient fall risk on any given day or particular clinical shift for a hospital staff member interpreting fall risk alerts, but rather, fall risk over the patient’s entire length of stay. It is likely that fall risk may change over patients’ hospitalization and inpatient recovery period. Fall prevention efforts may be best conceptualized as part of high-quality comprehensive patient care that is responsive to dynamic emergent clinical information, including dynamic treatment of fall risk within an admission. Considering this intended end-use during approaches for developing hospital fall risk prediction may help inform not only isolated fall risk prediction, but its potential for integration into computerized clinical decision support systems that may facilitate responsive risk-mitigating actions from treating clinical teams to prevent harm from falls.
Another important implication for future research is the potential impact of correct and incorrect fall risk classification. Only one study in this review [43] considered the performance of their model in terms of the relative costs of misclassification in the context of inpatient falls specifically. Although the values chosen for their 2 × 2 cost-matrix were not selected directly from estimates of applied intervention costs and effectiveness on improving health outcomes, this approach is likely to be informative for future model development as a means of optimizing health outcomes associated with hospital fall prevention resource allocation. Similarly, fall risk prediction models solely relying on data routinely recorded within digital hospital systems, particularly electronic medical records, may yield improvements in workflow efficiency by reducing the burden on clinical staff. This review identified that 24 of the 63 included studies used data, at least in part, from electronic medical records to predict fall risk. This may be particularly beneficial if digital automation of fall risk prediction enabled nurses within clinical teams to divest their time away from manually completing data acquisition for traditional fall risk assessment tools to high-priority patient care activities.
Statement of Ethics
An ethics statement is not applicable because this study is based exclusively on published literature.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This work was supported by funding from the Digital Health Cooperative Research Centre (DHCRC-0058). Steven M. McPhail is supported by a National Health and Medical Research Council (NHMRC) administered fellowship (#1181138). Funders had no role in the conduct of the study, preparation of the manuscript, or decisions related to publication.
Author Contributions
Rex Parsons, Steven M. McPhail, and Susanna M. Cramb conceived the review. Rex Parsons searched for articles and Rex Parsons and Robin D. Blythe screened them. Susanna M. Cramb resolved conflicts for article screening. Rex Parsons and Robin D. Blythe piloted data extraction. Rex Parsons extracted and summarized the data and wrote the first draft of the manuscript. Steven M. McPhail and Susanna M. Cramb supervised the review process and all the authors prepared the final draft for submission. All the authors contributed to the interpretation of results, manuscript preparation, and revisions. All the authors read and approved the final manuscript.
Data Availability Statement
The authors declare that the data supporting the findings of this review are available within the paper.