Over the last several years, there has been rapid growth of digital technologies attempting to transform healthcare. Unique features of digital medicine technology lead to both challenges and opportunities for testing and validation. Yet little guidance exists to help a health system decide whether to undertake a pilot test of a new technology, move right to full-scale adoption, or start somewhere in between. To navigate this complexity, this paper proposes an algorithm to help choose the best path toward validation and adoption. Special attention is paid to whether the needs of patients with limited digital skills, equipment (e.g., smartphones), and connectivity (e.g., data plans) have been considered in technology development and deployment. The algorithm reflects the collective experience of 20+ health systems and academic institutions that have established the Network of Digital Evidence for Health, NODE.Health, plus insights from existing clinical research taxonomies, syntheses, and frameworks for assessing technology or for reporting clinical trials.
Over the last several years, there has been rapid growth of digital technologies aimed at transforming healthcare [1, 2]. Such innovations include patient-focused technologies (e.g., wearable activity tracking devices), other connected devices (glucometers, blood pressure cuffs, etc.), as well as the broader category of analytic applications used by health systems to monitor disease status, predict risk, make treatment decisions, or monitor administrative performance. The emergence of nontraditional players in healthcare (e.g., Amazon and Google) [3, 4] is creating tremendous pressure on health systems to innovate. Yet little guidance exists to help a health system decide whether to undertake a pilot test of a new technology, move right to full-scale adoption, or start somewhere in between.
Unique features of digital medicine technology lead to both challenges and opportunities for testing and validation. To navigate this complexity, this paper proposes an algorithm (Fig. 1) to help health systems determine the appropriate pathway to digital medicine validation or adoption. The algorithm was developed based on the collective experiences of 20+ health systems and academic institutions that have established the Network of Digital Evidence for Health, NODE.Health. In addition, evidence-based clinical research taxonomies, syntheses, or frameworks for assessing technology or for improving the reporting of clinical trials, identified through a literature review, framed our work. Informed by Realist Synthesis methods [6, 7], the Immersion/Crystallization method was used to understand how a product works by looking at the relationships among the product, the context, the user population, and outcomes. This approach is especially relevant to digital technology, where complex interventions are continuously updated to reflect the user and context. Concepts emerging from our literature review of contextual issues include classification of users [10‒12], use of technology for collecting outcomes, and factors associated with scale-up [14, 15]. Concepts related to understanding outcomes were found in the literature on methods of usability testing [16, 17], research design [18‒22], product evaluation or efficacy testing [23‒26], and reporting of research [24, 27‒30]. To better understand the mechanism of action, we considered classification of technology [31‒35], patient engagement or technology adoption [36‒42], and reasons for nonuse of technology.
To make effective use of the algorithm, some key features of digital medicine products and their development ecosystem are discussed below. The user-task-context (eUTC) framework provides a lens for considering choices or plans for technology testing and adoption. First, the institution should define its use case(s) for the product: know the target population (users), the required functions (tasks), and the requirements for successful adoption within a specific setting (context). In assessing the fit of a technology under consideration, one should consider whether the product has been shown to work in the target user population. User dimensions especially important to health technology adoption include age, language, education, technology skills, and access to the necessary underlying equipment and connectivity (e.g., smartphones or computers and broadband or mobile data). Similarly, one must ensure that a technology successfully used in one setting was used for the same tasks as are needed in the target situation. For example, reports may show that a program performed well with scheduling in a comparable setting but provide no evidence about its use for referrals, the task required by the target setting. Finally, numerous dimensions of the setting could affect the likelihood of success; consider, for example, the physical location, the urgency of the situation, and the social context (e.g., whether researchers or clinicians are involved, and their motivations for doing so).
The potential for misalignment between the goals and incentives of institutional customers and those of technology developers and vendors is a dimension of the social context that is unique to digital health products. For example, a developer may encourage pilot studies as a prelude to making a lucrative sale but ultimately be unable to execute a deal because the pilot testers lack the authority or oversight to commit to an enterprise-wide purchase. Or, health systems might want to test modifications of existing solutions to increase impact, but the needed updates may not fit with the entrepreneur’s business plan. Finally, there must be adequate alignment between the institution and the product developer in terms of project goals, timeline, budget, capital requirements, and technical support.
Regulatory Perspectives Associated with Digital Medicine Testing
The traditional clinical testing paradigm does not neatly apply to digital medicine because many digital solutions do not require regulatory oversight. Mobile applications that pose a risk to patient safety, make medical claims, or seek medical-grade certification invoke Food and Drug Administration (FDA) oversight. For others, the agency reserves “enforcement discretion,” meaning it retains the right to regulate low-risk applications but allows such products to be sold without explicit approval. Senior US FDA officials acknowledge that the current regulatory framework “is not well suited for software-based technologies, including mobile apps….” The agency is gradually issuing guidance to address mobile health applications that fall in the gray area. For example, the FDA recently stated that clinician-facing clinical decision support will not require regulation, but that applications with patient-facing decision support tools may be regulated. In light of the shifting landscape and gaps in regulatory oversight of digital health applications, we review some unique features of digital health technologies and then offer an adaptation of the traditional clinical trial phases to provide a common language for discussing the testing of digital medicine products.
Digital Medicine Product Characteristics
Digital health tools have some inherent advantages over traditional pharmaceutical and device solutions: they may be commercially available at low cost, undergo rapid iteration cycles, be rapidly disseminated through cell phones or the internet, collect data on an ongoing rather than episodic basis, and make data available in real time. Digital tools can be highly adaptive for different populations, settings, and disease conditions. However, these very features create challenges for testing because digital tools can be used differently by different people, especially in the absence of training and standardization.
Digital Medicine Development Ecosystem
Much digital medicine technology is developed in the private sector, where secrecy and speed trump the traditional norms of transparency and scientific rigor, especially in the absence of regulatory oversight. The ability to rapidly modify digital medicine software is essential for early-stage technology, but algorithm updates made without evaluators’ knowledge can upend a study. On the other hand, in light of the average 7-year process to develop, test, and publish results from a new technology, failure to update software could lead to obtaining results on an obsolete product [21, 53]. Digital products based on commercial off-the-shelf software are especially likely to undergo software changes compared with digital solutions built exclusively for a designated setting. Thus, it is important to understand where a technology is in its life cycle, and to know whether the testers have a relationship with the developers whereby such background software changes would be disclosed or scheduled around the needs of the trial.
The type of company developing the digital technology can also affect testing dynamics. Start-up companies may have little research experience and limited capital to support operations at scale. Giants in the technology, consumer, transportation or communication sectors that are developing digital health tools may have the ability to conduct research at a massive scale [54, 55] (e.g., A/B testing by a social media company) but without the transparency typical of the established scientific method. Both types of companies may lack experience working with the health sector and thus be unfamiliar with bifurcated clinical and administrative leadership.
In light of the many unique dimensions of digital health technology, Table 1 presents questions to consider when initiating or progressing through a phased trial process. These questions have been informed by the experience of NODE.Health in validating digital medicine technologies [47, 56‒59], by industry-sponsored surveys of digital health founders, health systems leaders, and technology developers [60, 61], and by other industry commentary [46, 62].
Testing Drugs and Devices versus Digital Solutions
Table 2 proposes a pathway for testing digital health products that addresses their unique characteristics and dynamics. The testing pathway used for regulated products is shown on the left with parallel digital testing phases shown on the right half of Table 2.
Preclinical studies, often performed in animals, are meant to explore the safety of a drug or device prior to administration in humans. The digital medicine analog is either absent or perhaps limited to determining whether or not a product uses regulated technology, such as cell phones that emit radiation.
Traditional and digital phase I studies may be the first studies of a drug or a technology in humans, are generally nonrandomized, and typically include a small number of healthy volunteers or individuals with a defined condition. Whereas the purpose of traditional phase I studies is to test the agent’s safety and find the highest dose that can be tolerated by humans, the digital equivalent usually begins with a prototype product that may not be fully functioning. These early digital phase I studies increasingly use “design thinking” [16, 64] to identify relevant use cases and rapidly prototype solutions. Through focus group discussions, interviews, and surveys, developers find out what features people with the target condition want and conduct usability testing to see how initial designs work.
Traditional phase II studies look at how a drug works in the body of individuals with the disease or condition of interest. By using historical controls or randomization, investigators can get a preliminary indication of a drug’s effectiveness so that endpoints and the sample size needed for an efficacy trial can be identified. These techniques may require that study volunteers be homogeneous, limiting understanding of product use among the full range of target product users. The digital analogue could be a feasibility study in which an application is tested with individuals or in settings reflecting the targeted end users. Because nonregulated digital products are by definition at low or no risk of harm, phase II studies of digital products may involve larger numbers of subjects than traditional trials.
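As a concrete illustration of how phase II signals feed efficacy trial design, the sketch below estimates the sample size per arm needed to detect a difference between two event rates, using the standard normal-approximation formula for comparing two proportions. The 30% vs. 20% readmission rates are purely hypothetical assumptions, not results from any study cited here.

```python
import math
from statistics import NormalDist

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per arm to compare two proportions
    (two-sided test, normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = z(power)            # ~0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_control - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical phase II signal: 30% readmission with usual care vs. 20% with the app
print(n_per_arm(0.30, 0.20))  # roughly 290 participants per arm
```

Raising the desired power or shrinking the assumed effect quickly inflates the required sample, which is one reason the low risk of nonregulated digital products makes larger phase II cohorts attractive.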
Phase III trials of traditional and digital products are typically randomized studies to determine the efficacy of the drug or digital intervention. Ideally, volunteers reflect the full range of individuals with the condition but, at least in therapeutic and device trials, the study population is likely to be homogeneous with respect to disease severity, comorbidities, and age, so as to control main sources of bias and maximize the ability to observe a significant difference between the treatment and control arms. Digital studies may have fewer eligibility criteria, yielding results that are more broadly generalizable.
Because traditional efficacy trials are not likely large enough to detect rare adverse events, the FDA may require postmarketing (phase IV) studies to detect such events once a drug or device is in widespread use. For low-risk applications, postmarketing surveillance studies may be unnecessary. Instead, we propose a class of studies that apply implementation science methods such as rapid cycle testing to generate new knowledge about technology dissemination beyond the efficacy testing environment and to identify unintended consequences in specific real-world settings. Such a trial may involve assessing the requirements for integrating the novel product with existing technology and determining staffing, training, and workflow changes. In this category, we also include strategies that allow for testing multiple questions in a short period of time, such as “n of one” and factorial studies. Again, the Realist Synthesis perspective points to using these studies to best understand how and why a product does or does not work in specific settings.
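To make the factorial idea concrete, here is a minimal sketch (the two factor names are hypothetical) of how a 2×2 factorial study lets a single deployment answer two questions at once by randomizing each participant to a combination of features rather than to a single treatment arm:

```python
import itertools
import random

# Hypothetical 2x2 factorial design: two app features tested simultaneously
FACTORS = {
    "reminders": ["off", "on"],      # push-notification reminders
    "coaching": ["none", "weekly"],  # human coaching messages
}

# Every combination of factor levels defines one study arm (4 arms here)
ARMS = list(itertools.product(*FACTORS.values()))

def assign(participant_ids, seed=0):
    """Randomize participants across the factorial arms (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    return {pid: rng.choice(ARMS) for pid in participant_ids}

allocation = assign(range(8))
for pid, arm in allocation.items():
    print(pid, dict(zip(FACTORS, arm)))
```

Because each factor's effect is estimated across all participants, not just one arm, the design answers both questions with roughly the sample size a single-question trial would need.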
Pragmatic trials are a next step common to both traditional and digital interventions; they seek to understand effectiveness in a real-world setting and with diverse populations [22, 66]. Pragmatic studies of digital technology could be undertaken completely outside the context of healthcare settings, such as studies that recruit participants and deliver interventions through social media, that deliver interventions and collect data from biometric sensors and smartphone apps, or that market health-related services such as genetic testing directly to consumers. The FDA’s move toward the use of real-world evidence and real-world data suggests that digital health tools used by consumers may assume greater importance for testing drugs and devices as well as nonregulated products.
Focus on External Validity
Many digital solutions can be widely adopted because they lack regulatory oversight and are commercially available to consumers. However, the specter of widespread utilization may magnify the shortcomings of testing that was not conducted in adequately diverse settings or with the intended groups of end users. The RE-AIM framework has been used since the late 1990s to evaluate the extent to which an intervention reaches, and is representative of, the affected population. More recently, the framework has been used to plan and implement research studies and to assess adoption of community-based healthcare initiatives. Application of the RE-AIM framework to digital medicine testing can therefore be especially valuable for those seeking to understand the likelihood that a specific digital health solution will work in a specific setting. Table 3 summarizes the original RE-AIM domains and then applies them to illuminate reliability and validity issues that could arise in testing digital medicine technologies.
The RE-AIM Dimensions and Key Questions
“Reach” concerns the extent to which populations involved in the testing reflect populations intended to use the product. “Effectiveness” concerns whether the product was effective overall or in subgroups typically defined by health status or age. Based on standards that have been articulated to improve the reporting of web-based and mobile health interventions and the reporting of health equity, we extend the traditional Reach and Effectiveness dimensions to highlight specific populations in whom digital medicine technologies might exhibit heterogeneity of effect due to “social disadvantage.” Traditional social determinants of health (education, race, income) plus age are highly associated with access to and use of the internet and smartphones. Digital skills are, in turn, highly associated with having and using computers and smartphones, as such skills develop from use, and usage grows with access to equipment and connectivity. Digital skills, a computer or smartphone, and mobile data or fixed broadband are essential for almost all consumer-facing health technology, such as remote monitors and activity tracking devices. Given the covariance of digital skills and access with age, education, and income [75‒80], and the likelihood that those with low digital skills may resist adoption of a digital technology or may have challenges using it [56, 81‒84], it is especially important to determine whether such individuals were included in prior testing of a product and whether prior studies examined heterogeneity in response across such categories.
For products with proven effectiveness, “Adoption” concerns the incorporation of the intervention into a given context after pilot testing. Stellefson’s application of the RE-AIM model to assessing the use of Web 2.0 interventions for chronic disease self-management in older adults is especially instructive for assessing how a healthcare institution moves to Adoption after a trial. To what extent does the current setting replicate the context of previous trials? If a product was tested in an inpatient setting but is being considered for use in an ambulatory setting, for example, those contextual differences must be understood. It is also vital to know who would need to be involved in a decision to move to adoption after efficacy testing. Do current staff have the skills and authority needed to integrate new technology? Where will funds for the technology adoption project come from?
“Implementation” refers to the types of support needed for the intervention to be administered and how well it was accepted by end users. “Maintenance” addresses the sustainability of the intervention in a setting and the duration of effect in the individual user.
We next apply the RE-AIM framework to help health systems assess evidence gaps and identify the most appropriate testing to use in a specific situation. For a product at an early stage of development, as shown in Table 4, questions of “Reach” are most relevant. Since early testing focuses on how well a device or application functions, there may be limited information available about how well the device functions among individuals or settings similar to those under consideration. For a technology that has completed an efficacy trial, it is important to understand the extent to which the test results would apply in different populations and settings. It is also crucial to understand features of the study such as whether careful instruction and close monitoring were necessary for the technology to function optimally. For a technology with ample evidence of efficacy in relevant populations and settings, those considering moving to widespread adoption must understand institutional factors associated with successful deployment such as engagement of marketing and IT departments to integrate data systems.
The Digital Health Testing Algorithm
The digital health testing algorithm (Fig. 1) presents a series of questions to help a health system determine what sort of testing, if any, may be needed when considering adoption of a new technology.
Example 1: Care Coordination, Application, Validation
A healthcare system was approached by a start-up company that had designed a new care coordination application that uses artificial intelligence to predict readmission risk for a particular condition. The tool simplifies the clinical workflow by pulling multiple data sources together into a single dashboard. For this first attempt at use in a clinical setting, an ambulatory care practice was asked to check the electronic medical record to validate that the product correctly aligned data from multiple systems. With that validation step complete and successful, the start-up company would like to embark on a small prospective phase II study to validate the accuracy of the prediction model. Is that the right next step?
The initial study was a phase I study that showed that the data mapping process worked correctly using data pertinent to an ambulatory care setting. Further phase I testing should be done to ensure that the data mapping process works more broadly, such as with in-patient data. Then, to validate the algorithm, the company could apply it to historic data and see how well it predicts known outcomes. This would still be part of phase I, simply ensuring that the product works as intended. Once the key functions are confirmed, additional qualitative studies should be undertaken to ensure that the product meets the needs of users. That process should lead to additional customization based on real-time healthcare system feedback. Once that process is complete, the company can move to phase II testing. For the first actual use in the intended clinical environment, health system leadership might identify a local clinical champion who would be thoughtful and astute about how to integrate the product into the clinical workflow. Health system Information Technology staff should be closely involved to ensure a smooth flow of data for the initial deployment. From this stage, utilization data should be examined closely, and staff should be debriefed to determine whether and how the product was used. Some initial signals that the product produces the desired outcome should be seen before proceeding to a phase III efficacy trial.
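A minimal sketch of the retrospective validation step described above, using entirely made-up data: score the model's historical risk flags against known readmission outcomes to estimate sensitivity and specificity before any prospective testing.

```python
def confusion_counts(predicted, actual):
    """Tally a 2x2 confusion matrix for binary readmission predictions
    scored against known historical outcomes."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return tp, fp, fn, tn

def sensitivity_specificity(predicted, actual):
    tp, fp, fn, tn = confusion_counts(predicted, actual)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical retrospective check: model flags vs. observed readmissions
flags    = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
observed = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(flags, observed)
print(round(sens, 2), round(spec, 2))
```

A retrospective check like this stays within phase I: it confirms that the product works as intended against data the institution already holds, before any workflow or outcome questions are asked.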
Example 2: Patient-Facing Mobile Application to Improve Chronic Disease Tracking and Management
An independent medical practice is approached by a start-up company that has developed a mobile application to simplify management of a particular condition. As no clinical advice is involved, the application is not regulated by the FDA. The product has been used for 1 year in three other small- to medium-size practices. Feedback from the practices and patients was used to tweak the product. A pilot study with the improved product showed that satisfaction and utilization were very high among patients who were able to download the app onto their phones. Hospitalizations were lower and disease control better among patients who used the application compared with nonusers, but the results did not reach statistical significance. The start-up company has a Spanish language version of the product that was developed with input from native Spanish-speaking employees at the start-up company. They would like to conduct a phase III efficacy trial with a randomized design, using both English and Spanish language versions of the tool. Is that the right next step?
This product has completed phase II testing that showed efficacy signals among those who used the tool heavily. Before proceeding to an efficacy test in a comparable population, one must look carefully at how patients who used the product differed from those who used it sparingly or not at all. If users were healthier and better educated, they may have been better able to manage their conditions even without the application. Qualitative assessments from staff and patients should be undertaken to ascertain why others did not use the product. If new functions are needed, they should be developed and tested in phase I studies. If operational barriers can be identified and addressed, rapid cycle testing could be undertaken, followed by a repeat phase II study with the population that initially failed to use the product. In considering initiation of a trial in the new clinic, phase I studies must be undertaken to ensure that the product meets the needs of very different populations, including lower-income patients, non-English speakers, and patients with only rudimentary smartphones and limited skill in using apps. Then, a phase II study should be initiated to see whether the target population will use the app and whether efficacy signals emerge.
Example 3: Scale-Up of Disease Management System
A mid-size faith-based community healthcare system serving a largely at-risk population has implemented a disease-specific digital chronic care management and patient engagement program at two of its larger facilities. The deployment went well, and data strongly show that the application has positive outcomes. The system would like to roll the product out at all 20 of its facilities, using a pragmatic design to capture data on the efficacy of the product in real-world use. Should they move straight to a pragmatic trial?
In this case, 4 of the 20 additional clinics have newly joined this health system; various administrative systems are still being transitioned. If the health system has an excellent understanding of the contextual factors associated with successful implementation, then they may proceed with a pragmatic trial. For example, they may see that significant additional staffing is required to manage patient calls and messages that come through the app. They might then incorporate a call center to efficiently manage patient interactions at scale. However, if such factors are unknown, they should undertake rapid cycle testing, perhaps deploying the system in just one of the new clinics, and using qualitative research methods to understand factors associated with successful adoption. The focus of the rapid cycle tests and the pragmatic trial is on testing methods of scaling the product, rather than on testing the product itself.
Digital solutions developed by technology companies and start-ups pose unique challenges for healthcare settings due to the potential for misaligned goals between a health system and a technology company, the potential for unmeasured heterogeneity of effectiveness, and the need to understand institutional factors that may be crucial for successful adoption. Taxonomies and frameworks from public health, and from efforts to improve the quality of clinical research publications, lead to a set of questions that can be used to assess existing data and choose a testing pathway designed to ensure that products will be effective with the target populations and in target settings.
Some real-world examples [86, 87] illustrate the need for attention to the fit of a technology to the needed tasks, the user population, and the context of adoption.
Thies et al. describe a “failed effectiveness trial” of a commercially available mHealth app designed to improve clinical outcomes for adult patients with uncontrolled diabetes and/or hypertension. Although tested very successfully in “the world’s leading diabetes research and clinical care organization” and at one of the nation’s leading teaching and research hospitals, use in a Federally Qualified Health Center failed for reasons clearly related to the “context” and the “users.” Patients reported a lack of interest in the app, as “getting their chronic condition in better control did not rise to the level of urgency” for patients that it held with the clinic staff. Lack of access to and comfort with technology “may have reduced the usability of the app for patients who enrolled and dissuaded patients who were not interested.” Contextual factors included difficulty downloading the app due to limited internet access in the clinic, lack of clinic staff time to explain app use to patients, and lack of integration into the clinic workflow, since the app was not connected with the EHR.
In a pragmatic trial, 394 primary care patients living in West Philadelphia were called 2 days before their appointments with offers of Lyft rides. There were no differences in missed visits or in 7-day ED visits between those offered rides and those whose appointments fell on days when rides were not offered. Investigators were able to reach only 73% of patients by phone to offer rides, and of those, only 36.1% were interested in receiving the offered Lyft ride. Hundreds of health systems have now deployed ride sharing services, but evidence of their impact has not been reported. A considerable body of qualitative research may be needed to determine the best ways to deploy a resource that makes great sense intuitively but may not be received by users as envisioned.
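The trial's own numbers show why the null result is hard to interpret; a quick back-of-the-envelope calculation of effective exposure:

```python
# Figures from the ride-offer trial described above
offered = 394                      # patients called with a ride offer
reached = round(offered * 0.73)    # share actually reached by phone
accepted = round(reached * 0.361)  # share of those who wanted the ride
print(reached, accepted)
```

Only about a quarter of the nominal sample was ever exposed to the intervention, so the study's power to detect an effect of the rides themselves was far lower than the enrollment figure suggests.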
Although far from complete, the proposed algorithm attempts to provide some structure and common terminology to help health systems efficiently and effectively test and adopt digital health solutions. In addition, a framework analogous to ClinicalTrials.gov is needed to track and learn from the evaluation of digital health solutions that do not require FDA oversight.
The authors would like to thank Hannah Johnson, Kellie Breuning, Stephanie Muci, Rishab Shah, Connor Mauriello and Kelsey Krach for research assistance, and Nitin Vaswani, MD, MBA, the NODE.Health Program Director for helpful comments. NODE.Health is a consortium of healthcare systems and academic institutions that aims to build and promote evidence for digital medicine and care transformation.
Brian Van Winkle is Executive Director of NODE.Health. Amy R. Sheon and Yauheni Solad are volunteer members of the NODE.Health Board of Directors. Ashish Atreja is the founder of NODE.Health.
Amy R. Sheon’s work was supported by the Clinical and Translational Science Collaborative of Cleveland (CTSC) under grant number UL1TR000439 from the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health and NIH roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. NODE.Health receives industry funding to evaluate the clinical effectiveness of digital solutions. NODE.Health is also developing a registry to track the reporting of digital health solutions to encourage greater transparency and better knowledge.
A.R.S. is the lead author and B.W., Y.S., and A.A. are co-authors.