Introduction
We are in the midst of a global opioid epidemic, which is driven by the misuse of both prescription and illicit opioids. The metrics are overwhelming; e.g., opioid overdose deaths have climbed from 137 per day in 2019 to 255 per day in 2021 [1-3], which is the highest daily rate of opioid overdoses ever reported. The path toward opioid use disorders (OUDs) has fluctuated over the years. In the early 1960s, 80% of individuals reporting opioid misuse initiated with heroin, while 75% of those who reported opioid misuse in the 2000s initiated with prescription opioid pain relievers [4]. Since 2013, there has been a spike in heroin and illicit fentanyl use, perhaps due to more stringent regulations governing opioid prescriptions [5]. Therefore, studies focused on elucidating the risk factors for OUD are urgently needed. However, the nature of OUD is extremely complex. Comorbidity with other psychiatric and medical conditions is the rule rather than the exception [6]. In addition, OUD is the progression of a dynamic series of transitions from first use (of either prescription or illegal opioids) to regular use, chronic use, dependence, misuse, and OUD. Furthermore, multiple risk factors have been documented, each of which can impact several of these transitions, as previously reviewed ([7-9], Box 1). Understanding the factors that put people at risk for each of the transitions would yield a rich profile of risk for opioid-related behaviors and OUD, which could aid diagnosis, prevention, and treatments. However, studying every transition in this way is expensive to conduct in thousands to millions of patients. What should we be measuring, and where can we capture relevant OUD behaviors, with the tools and data sources currently available?
In this editorial, we describe the opportunities and challenges of current data modalities (e.g., questionnaires, clinical data, genetic data) and study designs (from laboratory to health system to population-based cohorts) and how the integration of these sources could help accelerate OUD research. We also discuss future goals to fully engage patients and communities for a significant clinical impact.
Historically, one method used to characterize specific aspects of OUD risk has been in-depth examination in laboratory settings (i.e., in a highly controlled research environment) [24-26]. Such laboratory-based studies lend themselves to collecting large volumes of data, in either ascertained samples of OUD (e.g., [10, 27, 28]) or healthy volunteers to measure specific aspects of OUD vulnerability (e.g., impulsivity, subjective reaction to opioids) [14, 19], via well-established questionnaires and interviews. For example, studies have shown that high ratings of opiate liking, as well as other subjective effects at first exposure to opioids (e.g., good mood, happy ratings), increase liability to dependence and misuse, compared to dysphoria and sedation [19, 29]. Laboratory studies also allow us to capture hard-to-measure characteristics associated with OUD, such as impulsivity [30], adverse childhood events [31], and trauma [15, 16]. Notably, laboratory studies allow for the direct control of environmental variables, including the ability to administer certain doses of opioids under controlled settings [32], yielding data that are akin, or generalizable, to those collected from “real-world” settings, such as hospital cohorts [33, 34]. However, these laboratory-based approaches are limited by their small sample sizes and by the limited generalizability of their findings to individuals and circumstances beyond those studied (e.g., younger participants; volunteers, often compensated for their participation).
Consortia were formed in response to these challenges, assembling experts to offer clinical diagnosis and rich evaluations of opioid-related behaviors to gather large and highly accurate study samples. For example, the Psychiatric Genomics Consortium (PGC), which was able to characterize 41,176 individuals (4,503 cases, 36,673 controls) via clinical Diagnostic and Statistical Manual (DSM)-based interviews, performed one of the first large-scale genetic analyses of opioid dependence via a genome-wide association study (GWAS) [35]. This study was instrumental in demonstrating that opioid dependence was heritable, corroborating earlier twin studies [36], and also identified significant genetic correlations between opioid dependence and multiple other psychiatric conditions. In addition, it revealed the importance of assessing opioid exposure, particularly in control populations. The most general control definitions that have been considered to date include (a) individuals exposed to opioids (either legal or illegal) who do not develop OUD or (b) individuals who do not meet criteria for OUD but whose prior exposure to opioids is not examined. Although using an unscreened control group can lead to substantial increases in sample size, it can also result in considerable phenotypic heterogeneity across samples and can bias findings [10, 37], because it is impossible to know whether an unexposed individual would have developed OUD had they been exposed to opioids. The amount of exposure required for such groups is not yet clear, but using opioid-exposed control groups is vital for future studies [38].
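To make the two control definitions concrete, the following minimal Python sketch assigns participants to groups under definition (a), opioid-exposed controls, and definition (b), unscreened controls. It is an illustration only and not the procedure used by any of the cited studies; the `Participant` fields, the function names, and the toy cohort are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    pid: str
    has_oud: bool
    # True/False if exposure was assessed; None if it was never examined.
    opioid_exposed: Optional[bool]


def assign_group(p: Participant, require_exposure: bool) -> str:
    """Return 'case', 'control', or 'excluded' for one participant."""
    if p.has_oud:
        return "case"
    if not require_exposure:
        # Definition (b): any non-OUD individual counts as a control.
        return "control"
    # Definition (a): only non-OUD individuals with known exposure qualify.
    return "control" if p.opioid_exposed is True else "excluded"


def group_counts(cohort, require_exposure: bool) -> Counter:
    """Tally group labels across a cohort under one control definition."""
    return Counter(assign_group(p, require_exposure) for p in cohort)


# Hypothetical toy cohort: one case and three non-OUD individuals.
toy_cohort = [
    Participant("p1", has_oud=True, opioid_exposed=True),
    Participant("p2", has_oud=False, opioid_exposed=True),
    Participant("p3", has_oud=False, opioid_exposed=False),
    Participant("p4", has_oud=False, opioid_exposed=None),
]

print(group_counts(toy_cohort, require_exposure=False))
print(group_counts(toy_cohort, require_exposure=True))
```

In this toy cohort, requiring exposure shrinks the control group from three individuals to one, which mirrors the trade-off between sample size and phenotypic homogeneity described above.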
Although consortium-based studies like this one are extremely valuable, they require major investments of time and money, and their headline sample sizes conceal the enormous cost and effort required to reach them while nevertheless remaining underpowered. Instead, the majority of the recent progress has come from newer “big data” sources that were not explicitly intended to be used for studies of OUD. These data sources include (a) health systems-based cohorts, where clinical information is available in a longitudinal and retrospective fashion, and (b) population-based cohorts, where new data can be collected in millions of subjects in a cost-effective manner. While these data sources have been used for decades [39, 40], there has been an explosion of studies in these areas for psychiatric and medical conditions, including OUD, in more recent years. Additionally, such sources carry great potential for boosting patient engagement efforts with their larger and increasingly diverse cohorts.
Health Systems-Based Cohorts
Opioids, unlike other drugs subject to misuse, are often provided by the health system, meaning that health systems-based cohorts are especially useful for studying OUD. Within health-based cohorts, we can capture relevant OUD clinical behaviors from electronic health records (EHRs). For example, OUD is frequently captured via diagnostic codes, commonly known as International Classification of Diseases (ICD) codes. Although imperfect, ICD codes have been able to detect OUD and several other OUD-related behaviors. For instance, cause-of-death intent ICD codes were used to detect the recent sharp increase in drug overdose deaths among adolescents during the COVID-19 pandemic, with the majority of deaths later categorized as unintentional while a minority were categorized as suicides [41]. EHR data allow us to examine ICD codes that tend to co-occur with diseases and characteristics associated with OUD. OUD ICD codes tend to be associated with other psychiatric conditions, such as presence of other substance use disorders (i.e., alcohol and cocaine [41]), mood disorders and suicidality [42-44], and medical comorbidities, such as HIV and hepatitis C [42, 45], as well as chronic pain, which has historically been one of the leading risks for initiating opioid misuse and is being addressed by initiatives such as the NIH HEAL Initiative [42, 46]. However, ICD codes are assigned primarily for billing and administration purposes rather than research and are often not confirmed by a trained clinician [47]. Diagnostic codes are also binary and may not convey the severity of OUD phenotypes [48, 49].
OUD diagnoses via ICD codes also tend to be underreported. Providers may be reluctant to assign them, partly because many lack specialized training in substance use disorder medicine [21, 50] and partly because OUD is highly stigmatized. Patients, in turn, may downplay their concerns in an effort to continue receiving prescriptions for opioids, and stigma leaves individuals using opioids hesitant to engage with the health care system [51].
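As a simple illustration of code-based case finding, the Python sketch below flags a patient when their record contains diagnoses in the ICD-10-CM F11 category (opioid related disorders). This is a deliberately minimal rule under stated assumptions, not the phenotyping algorithm of any cited study: the function name, the `min_count` parameter, and the example records are hypothetical, and real algorithms typically combine several code systems with other clinical data.

```python
def has_oud_code(codes, prefix="F11", min_count=1):
    """Return True if at least `min_count` diagnosis codes start with `prefix`.

    `codes` is an iterable of ICD-10-CM strings such as "F11.20".
    Requiring more than one code (min_count > 1) is one common way to
    reduce false positives from a single billing entry; the right
    threshold is an assumption to validate in each health system.
    """
    normalized = (c.strip().upper() for c in codes)
    hits = sum(1 for c in normalized if c.startswith(prefix))
    return hits >= min_count


# Hypothetical patient records (lists of diagnosis codes).
record_a = ["F11.20", "E11.9"]          # one opioid-related code
record_b = ["F10.20", "I10"]            # alcohol-related code only
record_c = ["f11.10", "F11.20", "B18.2"]  # two opioid-related codes

print(has_oud_code(record_a))
print(has_oud_code(record_b))
print(has_oud_code(record_c, min_count=2))
```

Because such a rule returns only a yes/no label, it inherits the limitations noted above: it cannot express severity, and it misses patients whose OUD was never coded.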
The breadth of EHRs allows us to augment our use of ICD codes with other clinical data, such as data about opioid prescriptions, diagnostic tests, and sociodemographic information. Opioid prescription data can serve as an additional tool to identify initial use, problematic opioid use [21], and OUD [21, 38, 43, 52]. For example, while recreational use of illicitly obtained opioids is rarely captured in the EHR, we can characterize forms of prescription opioid misuse via prescription counts or clinical notes, rather than querying a participant about off-label use (e.g., [38, 53]). Furthermore, by using longitudinal patterns of opioid prescriptions, we can uncover trajectories suggestive of misuse. Rentsch et al. [54] showed that rapidly escalating dose trajectories carry a greater risk for OUD than trajectories that escalate at lower rates. OUD treatment medications (e.g., buprenorphine) are useful for examining prescription patterns associated with treatment attrition [55]. Other data modalities are readily available for use. Lab tests and vital signs, including alterations in blood gases and white blood cell markers, when in combination with other variables, can be used as indicators of opioid dependence [34]. Additionally, as part of integrated medical systems, demographic factors related to social determinants of health can now be extracted, which can assist in evaluating OUD risk. EHRs can also be combined with external sources, including prescription drug monitoring programs [56, 57], criminal justice data (e.g., records on dates of incarceration, criminal courts’ offense records among Medicaid beneficiaries from the Allegheny County Data Warehouse in Pennsylvania [58]), and even neuroimaging data [37].
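One simple way to summarize a longitudinal prescription pattern is the rate of change in daily dose over time. The Python sketch below estimates that rate with an ordinary least-squares slope. It is a simplified stand-in and not the trajectory model used by Rentsch et al. [54]; the function name, the dose units, and the example values are hypothetical.

```python
def dose_slope(days, doses):
    """Least-squares slope of daily dose versus time (dose units per day).

    `days` are days since the first prescription; `doses` are the daily
    doses recorded at those times (e.g., in morphine milligram
    equivalents; the unit is an assumption of this sketch).
    """
    n = len(days)
    if n < 2 or n != len(doses):
        raise ValueError("need two or more paired observations")
    mean_x = sum(days) / n
    mean_y = sum(doses) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all observations share the same day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, doses))
    return sxy / sxx


# Hypothetical patients observed at four time points over 90 days.
days = [0, 30, 60, 90]
escalating = [20, 40, 60, 80]   # dose rises at every visit
stable = [30, 30, 30, 30]       # dose never changes

print(dose_slope(days, escalating))
print(dose_slope(days, stable))
```

A slope alone ignores the shape of the trajectory (e.g., a late spike versus steady growth), so any cutoff used to label a patient as "rapidly escalating" would need to be chosen and validated against clinical outcomes.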
EHRs are also valuable resources for examining all major risk factors (Box 1) for OUD, including genetics. For example, we can use EHRs paired with genotype data to perform genetic studies of unprecedented size and scope. The genetic factors revealed through such large-scale efforts, specifically from GWAS, have led to a remarkable range of discoveries. The cumulative effects of GWAS can identify novel biological pathways and assist in risk prediction and disease stratification, potentially enabling personalized medicine. Furthermore, GWAS has repeatedly identified associations near genes that are the targets of existing and highly effective pharmacologic agents (e.g., OPRM1 for OUD), suggesting that other GWAS hits may be pointing us toward novel targets for pharmacological interventions. These efforts are being made by multiple academic initiatives, such as eMERGE [59] and PsycheMERGE (https://grantome.com/grant/NIH/R01-MH118233-01), the UK Biobank (https://www.ukbiobank.ac.uk/), and the Million Veteran Program (MVP) (https://www.research.va.gov/mvp/). For example, using EHR data along with DSM-IV opioid dependence criteria, a GWAS by Zhou and colleagues [10] identified an association with the μ-opioid receptor gene (rs1799971, OPRM1), the main biological target for opioid drugs. This study included a cohort drawn from MVP (N = 109,790, 10,544 cases, 72,163 controls), with individuals of European and African American ancestries. A later study by Gaddis et al. [11], comprising 111,481 individuals (23,367 cases, 88,114 controls) of European ancestry and using EHR data as well as other OUD-related behaviors, identified a locus (rs9478500) within OPRM1 that is apparently distinct from the one identified by Zhou et al. [10]. Recently, two other GWASs of OUD using a combination of EHR-derived and clinically defined OUD were performed in multi-ancestral cohorts, both increasing sample size (N = 425,944, N = 639,709) and identifying several novel loci beyond OPRM1 [60, 61].
These studies show the ongoing need to include individuals of multiple ancestries in OUD research. Major efforts prioritizing diversity include All of Us (https://allofus.nih.gov/) and Global Biobank [62, 63], among others [64], which could close the ancestry gap in the near future. While proven useful, GWAS does come at a cost: it requires large sample sizes and hence must rely on EHR or population-based cohorts. Nongenetic risk factors may have large effect sizes on OUD risk and be better collected in a laboratory or other small-scale settings.
This plethora of data can aid with in-depth phenotyping, including developing predictive models to identify at-risk individuals [21, 33, 34, 58, 65-71]. The scalability of these big data efforts can allow for the identification of novel clusters and etiologies for OUD [68]. This approach was already established in laboratory settings, such as with the use of latent profile analysis to define different subgroups of OUD-diagnosed individuals at high risk for HIV [72]. We can now extend those studies by including a greater breadth and depth of data and samples with relevant OUD phenotypes.
Population-Based Cohorts
Whereas health systems-based cohorts represent sources of passive clinical data collection, population-based cohorts can serve to collect additional prospective or retrospective data. For example, many population-based cohorts include answers to standardized questionnaires, such as the Aberrant Drug Behavior Index [73] and the Prescription Opioid Misuse Index [74], which can facilitate the dissection of the etiology of OUD into multiple dimensions (e.g., initiation, impulsivity), rather than simply a binary diagnosis. These questionnaires can potentially provide valuable complementary phenotypes at a large scale, even in individuals who do not suffer from OUD, and in populations where the incidence of OUD is low [75].
Beyond well-established questionnaires, population-based cohorts also often contain self-reported information, which can provide information on dimensional phenotypes (i.e., intermediate phenotypes or endophenotypes). For example, responses to a simple question, such as “Have you ever in your life used prescription painkillers taken not as prescribed (e.g., Vicodin, OxyContin)?”, collected in 132,113 23andMe research participants, showed strong genetic overlap (rg = 0.64) with clinically ascertained OUD [10], yet showed no association with rs1799971 in OPRM1, perhaps reflecting that this locus influences other stages that lead to an OUD diagnosis [12].
Dimensional phenotyping methods can be used as a complementary approach to the traditional ascertainment of clinically diagnosed cohorts. Because dimensional phenotypes (e.g., subjective experience of first opioid use) do not focus on clinical endpoints, they can be rapidly measured in already genotyped population-based cohorts at a very low cost. By studying dimensional phenotypes, it is possible to boost power and to dissect OUD into components, which are not differentiated in a case-control framework. GWAS of dimensional phenotypes, when combined with datasets using well-ascertained OUD samples within a multivariate framework, could help to elucidate the genetic basis of OUD behaviors and provide a more granular understanding of the processes impacted by specific loci, which could have implications for diagnosis, prevention, and treatments.
Other relevant phenotypes can be measured using wearables to capture physiological data such as electrodermal activity, accelerometry, and skin temperature [76]. Additionally, pre-existing imaging data can be used to identify neural features, such as reduction in subcortical integrity, associated with OUD [77]. Patterns and motives of use can even be extracted from social media [78-80]. One study identified comments and conversations on suicide among opioid users on Instagram, representing a key health risk factor for OUD on a platform heavily utilized by young adults [80].
Limitations of Current Approaches
While health systems-based and population-based cohorts are promising data sources, limitations do exist. EHRs are heterogeneous, and data may be fragmented across different health systems or lost from lack of data collection (e.g., patient was never asked about opioid use) or lack of documentation (e.g., patient was asked about opioid use but the information was not recorded) [81]. However, some of these issues can be addressed by imputation methods [34, 47]. Another limitation is that results derived from particular health care systems are not necessarily generalizable (e.g., between cohorts that enroll participants based on disease status and those recruited from the general population [82]). Additionally, while health system-based and population-based cohorts allow for the collection of data from large sample sizes, they generally lack phenotypic depth as compared to traditional laboratory studies [75]. Furthermore, population-based cohorts often undersample specific ancestry groups (https://www.ukbiobank.ac.uk/, https://www.23andme.com/), although cohorts like MVP and All of Us more closely represent the ancestral diversity found in the USA. Ascertainment bias causes voluntary cohorts to differ from the general population, with participants tending to have higher socioeconomic status, higher education, better health, and lower drug use [75]. This poses a particularly major problem when examining illicit drugs. Socioeconomic status, which tends to be higher in voluntary population-based cohorts such as UK Biobank, has been shown to influence genetic studies of other commonly used substances (e.g., alcohol [83]). There also exists sampling bias; e.g., individuals may be more inclined to agree to participate in a study because a family member was also involved [84]. Some population-based cohorts, such as 23andMe, also involve privacy and intellectual property concerns that restrict access to and shareability of raw data [75].
Even with such limitations, both types of cohorts have allowed for larger sample sizes than were ever before available for OUD research.
The Path Forward
Because of the dangers of restricting studies to a subset of the population, multiple ongoing efforts are seeking to expand participation in OUD research. Not only will this enrich the breadth and depth of the data collected, but it will also help ensure that the findings are used in nonstigmatizing ways. This is important because there are known implicit racial biases within health care systems in overall opioid prescription receipt [85], lack of health insurance [86], and access to OUD treatment [87]. To increase clinical utility, models such as learning health systems, which bridge hospital and community-based approaches, have been implemented across health care settings to open more channels of communication between researchers, clinicians, and the communities they serve [88, 89]. Learning health systems are highly valuable resources for research on the opioid crisis to identify and reduce the risks of OUD, as demonstrated by the Veterans Health Administration (VA) and Group Health, a Seattle-based nonprofit health care system [85, 90-92]. As another example, focus groups for parents with OUD (who may be at risk of losing parental rights) have been established in communities across Washington State, allowing case managers to work directly in this community using the validated Research Prioritization by Affected Communities protocol [93]. Efforts such as these have already impacted policy creation, including the development of prescription guidelines and ensuring support models for providers prescribing OUD medications [94]. Building upon existing health care systems, community participation-based research combined with publicly available and self-reported questionnaire data will allow us to build a data ecosystem to help understand the nuances of OUD. We need to sustain and grow these integrative efforts to improve OUD outcomes.
Conclusion
In closing, in this editorial we have reviewed the development of OUD research, from studies in which individuals are ascertained based on diagnostic criteria to more recent health care system-, population-, and community-based studies. While further expansions to better represent the broader community are still needed, it is abundantly clear that the continued combination of varied data sources and data modalities has the potential to better delineate risk for OUD in the years to come. Such studies will impact research efforts across multiple domains, with the ultimate goal of transforming the research findings into actionable outcomes for patients and communities.
Conflict of Interest Statement
Sandra Sanchez-Roige is an editorial board member for Complex Psychiatry but has no other conflicts of interest to declare. The other authors have no conflicts of interest to declare.
Funding Sources
Sandra Sanchez-Roige and Sevim B. Bianchi were supported by funds from the California Tobacco-Related Disease Research Program (TRDRP; Grant No. T29KT0526). Sandra Sanchez-Roige was also supported by NIDA DP1DA054394. A.D.J. received support for this work from the Agency for Healthcare Research and Quality (AHRQ) and the Patient-Centered Outcomes Research Institute (PCORI) under Award Number K12 HS026395. The content is solely the responsibility of the authors and does not necessarily represent the official views of AHRQ, PCORI, the US Government, or the National Institutes of Health.
Author Contributions
Sandra Sanchez-Roige conceived the idea. Sevim B. Bianchi and Sandra Sanchez-Roige wrote the first draft of the article. Sevim B. Bianchi, Alvin D. Jeffery, David C. Samuels, Lori Schirle, Abraham A. Palmer, and Sandra Sanchez-Roige edited the article.