Abstract
The field of public health genomics has matured in the past two decades and is beginning to deliver genomic-based interventions for health and health care. In the past few years, the terms precision medicine and precision public health have been used to include information from multiple fields measuring biomarkers as well as environmental and other variables to provide tailored interventions. In the context of public health, “precision” implies delivering the right intervention to the right population at the right time, with the goal of improving health for all. In addition to genomics, precision public health can be driven by “big data” as identified by volume, variety, and variability in biomedical, sociodemographic, environmental, geographic, and other information. Most current big data applications in health are in elucidating pathobiology and tailored drug discovery. We explore how big data and predictive analytics can contribute to precision public health by improving public health surveillance and assessment, and efforts to promote uptake of evidence-based interventions, by including more extensive information related to place, person, and time. We use selected examples drawn from child health, cardiovascular disease, and cancer to illustrate the promises of precision public health, as well as the current methodologic and analytic challenges that big data must overcome to fulfill these promises.
Introduction
The term “precision” is increasingly used in medicine [1] and public health [2]. Precision medicine is often used synonymously with genomic medicine, and precision public health has been equated with applications of precision medicine in populations [3]. Applications of precision medicine (e.g., cancer genomics) are unlikely to lead to improved population health, as targeted interventions benefit only a small subset of the population. Nevertheless, as we and others have discussed, there is a bigger role for precision in public health beyond genomics [4]. Precision public health can be viewed as the delivery of the right intervention to the right population at the right time, and includes consideration of social and environmental determinants of health [4]. As recently stated by Richard Horton, “precision public health offers a compelling opportunity to reinvigorate a discipline that has never been more important for advancing the health of our most vulnerable and excluded communities. Precision public health is about using the best available data to target more effectively and efficiently interventions of all kinds to those most in need” [5].
Increasingly, a large volume of health- and non-health-related data from multiple sources is becoming available that has the potential to drive precision implementation. The term “big data” is often used as a buzzword to refer to large data sets that require new approaches to manipulation, analysis, interpretation, and integration [6]. Such data include genomic and other biomarkers, as well as sociodemographic, environmental, geographic, and other information. Our ability to improve population health depends to a large extent on collecting and analyzing the best available population-level data on burden and causes of disease distribution, as well as on the level of uptake of evidence-based interventions that can improve health for all [6].
In this commentary, we ask how the emerging abundance of data and the associated predictive analytics can contribute to precision public health by including more extensive information in the public health assessment of disease burden, as well as facilitators and barriers to evidence-based intervention implementation and outcome measures, as related to person, place, and time (see Table 1 for definitions and examples).
Can Big Data Help Better Characterize Population Health Outcomes and Implementation Needs and Disparities in Health and Health Care?
Public health and implementation scientists explore strategies for improving uptake of evidence-based health interventions that target multiple levels (from patient/person level to provider, system, community, and policy interventions) [7]. Implementation gaps and health outcomes are usually measured with limited data by place, person, and time.
More Precision Assessment by Place
The use of big data sources could allow a more in-depth analysis of disease burden and implementation gaps and disparities in health care systems and population subgroups. For example, using small-area analysis, we might be able to uncover pockets of disparities in the implementation of health interventions that are often masked in analyses performed on areas such as counties or states. A case in point is the recent local-burden-of-disease analysis of child mortality under the age of 5 years across 46 African countries [8]. When mortality was analyzed at a high spatial resolution, new maps showed major disparities in child mortality even though progress in the implementation of evidence-based interventions was reported at the country level. Similarly, neighborhood deprivation metrics can assess disparities and implementation gaps within regions (e.g., wealthy towns may have micropockets of deprivation [9]). More “precision” in geographic, community, and health system analysis can pinpoint how best to target interventions to reduce morbidity in difficult-to-reach subpopulations and help reduce disparities.
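The masking effect of coarse geographic units can be illustrated with a minimal sketch. All area names, birth counts, and death counts below are hypothetical and are not taken from the cited analysis; the 1.5-fold threshold used to flag an area is likewise an arbitrary assumption.

```python
# Hypothetical small-area data: (live births, under-5 deaths) per area.
# These numbers are invented for illustration only.
small_areas = {
    "area_A": (10_000, 200),
    "area_B": (8_000, 160),
    "area_C": (2_000, 240),  # a small pocket with much higher mortality
}


def rate_per_1000(births, deaths):
    """Deaths per 1,000 live births."""
    return 1000 * deaths / births


# Rate for the whole region (what a coarse, region-level analysis reports).
total_births = sum(births for births, _ in small_areas.values())
total_deaths = sum(deaths for _, deaths in small_areas.values())
region_rate = rate_per_1000(total_births, total_deaths)

# Rate for each small area (what a high-resolution analysis reports).
area_rates = {
    name: rate_per_1000(births, deaths)
    for name, (births, deaths) in small_areas.items()
}

# Flag areas well above the regional figure (the 1.5x threshold is an assumption).
hotspots = [name for name, rate in area_rates.items() if rate > 1.5 * region_rate]

print(f"Region-level rate: {region_rate:.1f} per 1,000")
for name, rate in area_rates.items():
    print(f"  {name}: {rate:.1f} per 1,000")
print("Areas flagged:", hotspots)
```

In this made-up example the regional rate looks moderate, while one small area carries a rate several times higher; that is the kind of pocket a county- or state-level summary would hide.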
More Precision Assessment by Person
Similarly, in characterizing gaps and disparities in implementation and outcomes, personal characteristics of patients, providers, and policy-makers can be further refined beyond the use of traditional indicators such as age, gender, and race/ethnicity. Genomic and other biomarkers can stratify disease outcomes and susceptibility into subgroups that reflect the underlying disease heterogeneity and potential response to different types of interventions [1].
For example, when cholesterol education campaigns are implemented at the population level, individuals with familial hypercholesterolemia (FH), a genetic disorder affecting 1 in 250 people, remain largely undiagnosed, even though they require more intensive identification, high-intensity LDL-lowering drugs, and cascade screening in families. Failure to ascertain this high-risk population subgroup of a million or more individuals in the USA will lead to undertreatment and worse health outcomes [10], because the opportunity to implement evidence-based care for this subgroup is missed.
Another example is colorectal cancer screening. While evidence-based guidelines recommend colorectal cancer screening after the age of 50 years in the “average” population, such guidelines will miss the 1 in 280 individuals with Lynch syndrome, who have an increased risk of colorectal cancer and will require screening much earlier. A more targeted approach is required to find these genetically high-risk individuals in the population and to implement effective screening, follow-up, and referral to care [11].
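The size of these high-risk subgroups follows from simple arithmetic on the prevalence figures quoted above. The sketch below assumes a US population of roughly 330 million, a round figure that is not stated in the text; the function name is illustrative.

```python
# Assumption: approximate US population; this round number is not from the text.
US_POPULATION = 330_000_000


def expected_affected(population, one_in_n):
    """Expected number of affected individuals for a prevalence of 1 in `one_in_n`."""
    return population / one_in_n


# Prevalence figures as quoted in the text: FH 1 in 250, Lynch syndrome 1 in 280.
fh_estimate = expected_affected(US_POPULATION, 250)
lynch_estimate = expected_affected(US_POPULATION, 280)

print(f"Familial hypercholesterolemia: about {fh_estimate:,.0f} people")
print(f"Lynch syndrome: about {lynch_estimate:,.0f} people")
```

Under this assumption, each condition would affect on the order of a million people, which is consistent with the "million or more" figure given for FH.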
More Precision Assessment by Time
Big data may also improve precision through analysis of repeated measurements of the same variables over time. The use of personal devices such as sensors, smartphones, and other digital devices [12] can provide measurement of variability over time for various health indicators such as nutrition, physical activity, and blood pressure. Most surveys rely on infrequent or cross-sectional measurements of these and other health indicators. Smartphones are used increasingly to deliver evidence-based interventions (e.g., diet/nutrition programs, psychotherapy, and exercise). The data collected through digital devices give a picture of how the interventions have been implemented and the outcomes generated with much greater precision.
Can Big Data Inform Next-Generation Implementation Studies?
Big-data-driven public health assessment studies provide directions about how to enhance implementation in subpopulations and can drive implementation studies that tailor interventions by place, person, and time.
More Precision Implementation by Place
Implementation studies evaluate delivery of interventions in real-world contexts of health care delivery systems and communities, with the goal of delivering interventions optimally across populations. The use of machine learning and decision support tools [13] adapted to specific health care delivery systems could enhance the implementation of evidence-based guidelines.
As discussed by Engelgau et al. [14], tools of predictive analytics and big data can help identify major challenges for implementation, including the identification of key barriers and facilitators within the socioecological context, various health and community policies, delivery strategies within health systems (e.g., physical infrastructure and availability of interventions), and community contexts (e.g., community resources, social deprivation, and economic issues). Predictive analytics based on big data offer new approaches to pinpoint key barriers and facilitators across the community context and lend insights into promising implementation strategies [14].
More Precision Implementation by Person
To reach subpopulations with unique health conditions, targeted intervention strategies will be needed. For example, the FH Foundation (https://thefhfoundation.org/) has developed a decision support tool using a machine learning algorithm based on structured and unstructured data to help identify individuals with probable FH within electronic health records, large-scale laboratories, and claims databases [15]. The idea behind this tool, and other similar ones, is to help clinicians find more FH patients in selected populations, most of whom are currently undiagnosed and undertreated. Finding FH patients will also allow better implementation of recommended cascade screening in families.
More Precision Implementation by Time
Smartphone apps can use big data to allow real-world collection and analysis over time for many evidence-based interventions (e.g., testing of adherence to medication use and longer-term measuring of outcomes over time). Apps could serve as a microcosm of a learning system that collects data on person, place, and time and could use the patterns detected to adjust an intervention based on its overall pattern of use and effectiveness.
For example, a recent randomized clinical trial of 411 adults with poorly controlled hypertension [16] found, using repeated measures, that patients who received a smartphone app showed a small improvement in self-reported adherence to medication use. This pilot study points to the need to evaluate more broadly the effects of mobile health interventions on implementation processes and clinical outcomes in order to understand the context of why some interventions are successfully implemented while others are not. Big data drawn from social media platforms are also increasingly used in the context of public health emergencies to enhance public health surveillance, detection of disease clusters, and facilitation of communication and behavior change to accelerate disease prevention and control. Dunn et al. [16] recently reviewed the progress and promise of social media interventions that could enhance precision public health. They also cautioned about how little we know about the health impact of such interventions and the risks of unintended consequences.
There are immediate opportunities to explore how big data can influence the next generation of implementation science. For example, the National Cancer Institute recently put out a funding announcement to create a set of Implementation Science Centers for Cancer Control [17]. This program will support the rapid development and testing of innovative approaches to implement evidence-based cancer control interventions; establish implementation laboratories in clinical and community sites; advance methods in studying implementation; and develop reliable implementation measures [18]. Implementation science centers will be an ideal forum for using big data to accelerate implementation science in cancer control. These centers are expected to use multiple sources of data and develop implementation and outcome measures using a multilevel approach, including individual, family, school, workplace, social network, and community, as well as natural, built, economic, policy, institutional, and health care environments. Data will be integrated and captured across observational, self-report, objective, and/or real-time data to identify synergistic, complementary, and interacting implementation research strategies.
Similarly, the National Heart, Lung, and Blood Institute is supporting implementation science to help turn discoveries into improved population health. This will require the collaboration of diverse stakeholders to leverage “expertise from biomedical, behavioral, and social science research, as well as experts from engineering, bio-informatics, behavior economics, and the emerging field of big data science” [19].
The Crucial Role and Current Challenges of Machine Learning and Predictive Analytics
To maximize the benefits of big data in precision public health, robust analytic methods are needed for individual studies and to synthesize information across studies [20]. Machine learning and predictive analytic tools are increasingly used in health care and population health settings to make sense of the large amount of data, both for assessment and implementation purposes [21]. In principle, predictive analytics can provide novel approaches to analyze disease prediction and forecasting models and to pinpoint key barriers and facilitators to the delivery of interventions that were proven to be effective. The field of oncology provides a salient application of big data analytics, i.e., how to combine data from multiple sources (DNA germ line and tumor sequencing, gene expression, epigenetics, proteomics, etc.) along with individual and population-level variables in order to arrive at optimal and individualized intervention strategies for both treatment and prevention. Similar complex prediction models have been developed for heart disease prediction. For the most part, however, big data predictive analytics have not provided better quantitative risk prediction models than classic statistical methods such as logistic regression analysis [22].
There are numerous gaps and methodologic limitations that need to be overcome before big data can fulfill the promise of precision public health [21]. Issues involving data inaccuracy, missing data, and selective measurement are substantial concerns that can potentially affect predictive modeling results and decision-making. In addition, deficiencies in model calibration can interfere with inferences. For example, patients may see multiple health care providers who use different health records in different health care delivery systems. Often, data are not shared across platforms, or are incomplete. Coding for health care billing also can vary from one system to another, and health records’ completeness can vary. This can potentially create biases in effect estimation and prediction modeling. Prediction models derived from one population may not be generalizable to other settings. Furthermore, methodologic deficiencies such as systematic bias in prediction models and nonrepresentative studies, along with limited or differential access in subpopulations, can contribute to widening of health disparities, especially for racial and ethnic minority populations. For example, recent studies have consistently shown that the accuracy of genetic risk prediction models based on genome-wide association studies is reduced among non-European populations compared to European populations [23]. This is due to the fact that most genome-wide association studies have been conducted in populations of European descent. The ultimate goal of big data analytics is to improve decision-making both in the clinical and in the population setting. Providing outcome probabilities may or may not change physician, patient, or health system behavior. We need to evaluate the balance between benefits and harms of specific implementation interventions (clinical utility).
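One simple way to see how a prediction model can fail to transfer between populations is to compare observed with expected event counts in each population, a check sometimes called calibration-in-the-large. The sketch below is illustrative only: the records, group labels, and predicted risks are invented, and real evaluations would use far larger samples and additional measures.

```python
# Each record: (population label, predicted risk, observed outcome 0/1).
# All values are invented for illustration only.
records = [
    ("development", 0.2, 0), ("development", 0.3, 0), ("development", 0.4, 1),
    ("development", 0.5, 0), ("development", 0.6, 1),
    ("external", 0.2, 0), ("external", 0.3, 1), ("external", 0.4, 1),
    ("external", 0.5, 1), ("external", 0.6, 1),
]


def observed_to_expected(rows):
    """Ratio of observed events to the sum of predicted risks.

    A ratio near 1 suggests the model is calibrated on average;
    a ratio above 1 means the model underpredicts risk in that group.
    """
    observed = sum(outcome for _, _, outcome in rows)
    expected = sum(risk for _, risk, _ in rows)
    return observed / expected


# Group the records by population label.
by_group = {}
for row in records:
    by_group.setdefault(row[0], []).append(row)

oe_ratio = {group: observed_to_expected(rows) for group, rows in by_group.items()}

for group, ratio in oe_ratio.items():
    print(f"{group}: observed/expected = {ratio:.2f}")
```

In this toy data the model is well calibrated on average in the population it was developed in but underpredicts risk by about half in the second population, which is the kind of discrepancy that external validation is meant to expose.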
As discussed in a recent special issue [24] on machine learning, predictive analytics need to have a clear purpose. Ideally, the development and validation of prediction models should strive for external validation. The current literature contains studies on machine learning approaches that have undergone retrospective testing but not prospective evaluation. Descriptive studies using existing data can identify barriers and facilitators to implementation. Interventions to address barriers are then integrated into prospective studies. As a result, the current applications of machine learning to health care systems remain severely limited [25]. These limits also apply to public health activities that are concerned with implementation challenges and health outcomes in whole populations and subgroups that are outside the health care delivery system. This issue is especially relevant in dealing with global health data, and a large effort goes into knitting together disparate and noisy information to describe health outcomes and implementation challenges globally [25]. Severe constraints on resources dictate the need to develop and evaluate alternative solutions. Issues of fairness, accountability, transparency, and privacy-preserving methods, as well as the anticipation of deleterious effects, are essential considerations for ensuring that the promise of big data to improve health and reduce health disparities does not lead to unintended and opposite effects [25].
Conclusions
In the age of genomics and big data, more extensive information by place, person, and time is becoming available to measure the public health impact and implementation needs. Using a few examples, we have shown how such data may provide more information derived from public health assessment studies and next-generation implementation studies. In principle, big data could point to implementation gaps and disparities and accelerate the evaluation of implementation strategies to reach those population groups most in need of interventions.
However, major challenges need to be overcome. For precision public health to succeed, further advances in predictive analytics, and practical tools for data integration and visualization, need to be made. As most public health and implementation scientists are not well versed in big data science, it will be crucial that robust training and career development be offered at the intersection of big data and public health. This research/training agenda will help turn the promise of big data into effective precision implementation strategies to maximize the benefits of evidence-based interventions to improve population health.
Acknowledgements
This paper was supported by the authors’ salaries from the Centers for Disease Control and Prevention and the National Institutes of Health.
Statement of Ethics
The authors have no ethical conflicts to disclose.
Disclosure Statement
The opinions expressed in this paper are those of the authors and do not reflect the official positions of the Centers for Disease Control and Prevention or the National Institutes of Health. The authors have no conflicts of interest to disclose.