Background: The 5-item World Health Organization Well-Being Index (WHO-5) is among the most widely used questionnaires assessing subjective psychological well-being. Since its first publication in 1998, the WHO-5 has been translated into more than 30 languages and has been used in research studies all over the world. We now provide a systematic review of the literature on the WHO-5. Methods: We conducted a systematic search for literature on the WHO-5 in PubMed and PsycINFO in accordance with the PRISMA guidelines. In our review of the identified articles, we focused particularly on the following aspects: (1) the clinimetric validity of the WHO-5; (2) the responsiveness/sensitivity of the WHO-5 in controlled clinical trials; (3) the potential of the WHO-5 as a screening tool for depression, and (4) the applicability of the WHO-5 across study fields. Results: A total of 213 articles met the predefined criteria for inclusion in the review. The review demonstrated that the WHO-5 has high clinimetric validity, can be used as an outcome measure balancing the wanted and unwanted effects of treatments, is a sensitive and specific screening tool for depression and its applicability across study fields is very high. Conclusions: The WHO-5 is a short questionnaire consisting of 5 simple and non-invasive questions, which tap into the subjective well-being of the respondents. The scale has adequate validity both as a screening tool for depression and as an outcome measure in clinical trials and has been applied successfully across a wide range of study fields.
In his monograph on clinimetrics, Feinstein  used the term ‘improvement after treatment' to describe the patient's own assessment of change in his or her well-being during treatment. This is a very subjective index, which no biological marker can capture [1,2,3]. When reviewing 75 scientific articles covering more than 100 different scales or questionnaires, Gill and Feinstein  demonstrated that a clinimetric definition of subjective well-being was lacking. They therefore advocated for the development of short global rating scales of subjective well-being, which would reflect a single dimension with high clinical face validity. Another important issue in relation to the measurement of subjective well-being was raised by Ware , who suggested that the rating scales dedicated to this purpose should be disease anonymous (generic) because such scales provide information regarding the overall effect (balancing wanted clinical effects against unwanted adverse effects) of a clinical intervention. Furthermore, a generic scale enables a comparison with mean values from the general population (which can be used as a criterion of remission) or with mean values from other clinical populations irrespective of the disease entity or condition under examination.
The 5-item World Health Organization Well-Being Index (WHO-5) is a short and generic global rating scale measuring subjective well-being. The WHO-5 was derived from the WHO-10 , which in turn was derived from a 28-item rating scale  used in a WHO multicentre study in 8 different European countries . The 10 items making up the WHO-10 were selected from among these 28 items on the basis of a non-parametric item response theory analysis , which identified the 10 most valid items from the original 28-item scale . The items for the 28-item scale were selected from the Zung scales for depression, distress and anxiety as well as from the General Health Questionnaire and the Psychological General Well-Being Scale . Therefore, both the 28-item scale and the WHO-10 include items phrased negatively to reflect symptoms of distress (‘Feeling downhearted and blue') and items phrased positively, reflecting well-being (‘Waking up feeling fresh and rested'). Because the WHO considers positive well-being to be another term for mental health , the WHO-5 only contains positively phrased items . The WHO-5 items (fig. 1) are: (1) ‘I have felt cheerful and in good spirits', (2) ‘I have felt calm and relaxed', (3) ‘I have felt active and vigorous', (4) ‘I woke up feeling fresh and rested' and (5) ‘My daily life has been filled with things that interest me'. The respondent is asked to rate how well each of the 5 statements applies to him or her when considering the last 14 days. Each of the 5 items is scored from 5 (all of the time) to 0 (none of the time). The raw score therefore theoretically ranges from 0 (absence of well-being) to 25 (maximal well-being). Because scales measuring health-related quality of life are conventionally translated to a percentage scale from 0 (absent) to 100 (maximal), it is recommended to multiply the raw score by 4 (fig. 1). Notably, the layout of the WHO-5 follows that of the Major Depression Inventory which measures the WHO/ICD-10 symptoms of depression . 
The correspondence holds both for the Likert scaling of each item from 0 to 5 and for the period of time considered (the past 2 weeks) [14,15].
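The scoring rule described above can be written out in a few lines. The following Python sketch sums the 5 item scores (each from 0 to 5) and multiplies the raw score by 4 to obtain the percentage score. The function name and the input checks are illustrative assumptions and are not part of the WHO-5 materials.

```python
from typing import Sequence


def who5_percentage_score(item_scores: Sequence[int]) -> int:
    """Return the WHO-5 percentage score (0-100) from the 5 item scores.

    Each item is scored from 0 (none of the time) to 5 (all of the time).
    The function name and the validation below are illustrative assumptions.
    """
    if len(item_scores) != 5:
        raise ValueError("the WHO-5 consists of exactly 5 items")
    if any(not 0 <= score <= 5 for score in item_scores):
        raise ValueError("each item score must lie between 0 and 5")
    raw_score = sum(item_scores)  # theoretical range 0 to 25
    return raw_score * 4          # percentage scale 0 to 100
```

For example, item scores of 3, 2, 3, 2 and 3 give a raw score of 13 and a percentage score of 52.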
The WHO-5 was originally presented at a WHO meeting in Stockholm in February 1998 as part of a project on the measurement of well-being in primary health care patients . Subsequent to this project, the WHO Regional Office in Europe initiated translations of the original English version of the WHO-5 into a number of other languages. At present, the WHO-5 has been translated into over 30 languages and has been used in research projects all over the world. The objective of this study was to provide a systematic review of the extensive body of literature on the WHO-5, with particular emphasis on the following aspects: (1) the clinimetric validity of the WHO-5; (2) the responsiveness/sensitivity of the WHO-5 in controlled clinical trials; (3) the potential of the WHO-5 as a screening tool for depression, and (4) the applicability of the WHO-5 across study fields.
A search for ‘WHO (Five)' OR ‘WHO-5' OR ‘WHO-five' OR ‘WHO well-being' OR ‘WHO well being' OR ‘WHO 5' OR ‘WHO five' OR ‘World health organization 5' OR ‘World health organization five' was carried out in PubMed. The same search was then carried out in PsycINFO. All abstracts published prior to or on March 31, 2014 were considered. One author (S.S.) screened the search results to exclude books, conference abstracts/posters, non-English articles and papers that were clearly irrelevant. The remaining papers were evaluated by C.W.T. and S.S. and were included for full text review if they contained information on the use of the WHO-5 other than as a pre-study screening instrument. C.W.T., S.S. and P.B. then evaluated the full papers.
The PRISMA flow chart  (fig. 2) shows the number of articles found and later kept or excluded in the different phases of the screening. The database search identified 964 titles, which were reduced to 501 after removal of duplicates. A total of 214 of these titles were either not full-text articles, were written in a non-English language or were obviously not on a WHO-5-related subject and therefore excluded. The remaining 287 full-text articles were screened, and 213 were found eligible for inclusion in the review. In online supplementary table 1 (for all online suppl. material, see www.karger.com/doi/10.1159/000376585), these 213 published studies are listed, and the main WHO-5 results are described. Besides the trials listed in online supplementary table 1, we consulted the European Quality of Life Survey 2012  to obtain the European general population norm values on the WHO-5 (online suppl. table 2).
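The counts reported for the screening phases can be checked with simple arithmetic. The Python sketch below uses only the four figures stated in the text; the number of duplicates removed and the number of articles excluded at the full-text stage are derived here by subtraction and are not reported explicitly above. All variable names are illustrative.

```python
# Figures stated in the text
titles_identified = 964       # titles found by the database search
titles_after_dedup = 501      # titles left after removal of duplicates
excluded_at_screening = 214   # not full text, non-English or not WHO-5 related
articles_included = 213       # articles eligible for the review

# Figures derived by subtraction (assumption: no other exclusion steps)
duplicates_removed = titles_identified - titles_after_dedup
full_text_screened = titles_after_dedup - excluded_at_screening
excluded_at_full_text = full_text_screened - articles_included

# The derived full-text count should match the 287 reported in the text
assert full_text_screened == 287
```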
The Clinimetric Validity of the WHO-5
The most adequate clinimetric evaluation of the WHO-5 was performed by a panel of experts in the field of health-related quality of life . This group evaluated 85 different questionnaires and found that 20 of these were ‘acceptable'. In terms of clinimetric validity, the WHO-5 was listed at the top among the 20 scales since any major overlap with specific disease-related aspects and side effects of pharmacological treatment is absent on this scale. In other words, the WHO-5 is a pure generic scale for the measurement of general well-being .
The construct validity of a scale describes its properties as a coherent measure of a dimension of interest (in this case well-being). Construct validity is evaluated by determining whether each item on the scale contributes unique information regarding the dimension. If this is the case, the scale covers the theoretical range from the complete absence of well-being to the highest imaginable level of well-being. The WHO-5 has been analysed with the item response theory model formulated by Rasch in both younger persons and elderly persons, confirming that the 5 items constitute a unidimensional scale, where each item adds unique information regarding the level of well-being. This implies that the individual item scores can be added to a ‘meaningful' total score and that the range of scores from 0 to 100 covers the entire dimension from the complete absence of well-being to the highest imaginable level of well-being.
The mean WHO-5 score in the general population has been measured in different European countries (online suppl. table 2). Thus, when using the WHO-5 as an outcome measure in clinical trials, the ideal goal should be to reach the general population mean score. In Danish general population studies [23,24], the mean WHO-5 score is 70.
The predictive validity of a rating scale is of great importance. The predictive validity of the WHO-5 has been investigated in a study in which patients with cardiac disease were followed over a period of 6 years . Patients who scored <50 on the WHO-5 at baseline proved to have significantly higher mortality rates compared to those scoring ≥50.
The Responsiveness/Sensitivity of the WHO-5 in Controlled Clinical Trials
Table 1 describes the 6 controlled clinical trials in which the WHO-5 has been used as an outcome measure. Wade et al. conducted a placebo-controlled study of melatonin for the treatment of insomnia. At the end point, the participants in the active group had obtained a level of 69. However, the difference at the end point between the active group and the control group was approximately 3 points on the WHO-5, which is statistically significant but not clinically significant, since the threshold for a clinically relevant change is considered to be 10 points on the WHO-5.
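The distinction between statistical and clinical significance drawn above rests on a fixed threshold of 10 points on the percentage scale. A minimal Python sketch of that check follows. The function and constant names are illustrative, and treating a difference of exactly 10 points as clinically relevant is an assumption. Statistical significance is not assessed here.

```python
# Threshold for a clinically relevant change on the 0-100 scale, as stated in the text
CLINICALLY_RELEVANT_CHANGE = 10.0


def is_clinically_relevant(difference: float,
                           threshold: float = CLINICALLY_RELEVANT_CHANGE) -> bool:
    """Return True if a WHO-5 difference reaches the clinical threshold.

    The sign of the difference is ignored; counting exactly 10 points as
    relevant is an assumption made for this sketch.
    """
    return abs(difference) >= threshold
```

Under these assumptions, the difference of approximately 3 points between the active group and the control group falls below the threshold.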
Hoffman et al.  tested the effect of mindfulness-based therapy versus a wait-list group among patients with breast cancer. The WHO-5 baseline score in each of the groups was approximately 50, which is indicative of reduced well-being (when WHO-5 is used for the screening of depression, a cut-off score of ≤50 is used; table 2). The difference between the effect of the mindfulness-based therapy and the control group was approximately 10 points on the WHO-5, i.e. just barely clinically significant, but the patients in the active group still had mean WHO-5 values below the general population norm at the end point .
Logtenberg et al.  compared the effects of two different forms of insulin administration in patients with type 1 diabetes. At baseline, the participants had a WHO-5 mean score just below 50, and those in the active group (intraperitoneal administration) improved their WHO-5 scores, reaching the general population norm, in contrast to the control group (subcutaneous administration).
Robinson et al.  tested the effect of paroxetine in comparison to placebo among individuals with tinnitus. The baseline WHO-5 mean score was below the cut-off score for clinical depression (≤28). According to the WHO-5 results, neither paroxetine nor placebo had any effect on the well-being of those affected by tinnitus .
The WHO-5 has also been used as an outcome measure in controlled clinical trials involving patients with major depression. In these trials, the WHO-5 can be viewed as a measure which taps into the balance between the desired clinical effects and the unwanted side effects. The study by Guico-Pabia et al. compared desvenlafaxine with placebo by pooling the results from 8 different trials. The baseline score was representative of major depression in the primary care setting (mean score ≤28). The difference between desvenlafaxine and placebo after 8 weeks was statistically significant but not clinically significant. Also, the desvenlafaxine group only reached a WHO-5 mean of 50, i.e. the group had still not recovered if the general population norm is used as the goal of treatment.
Another study of depression assessed the effect of wake therapy and exercise among patients with treatment-resistant depression, i.e. those who had not responded to at least two different antidepressants in their current major depressive episode . During the 9-week trial, the increase in the WHO-5 score in the wake therapy group was statistically superior to that of the exercise group. However, the end point WHO-5 mean score in both groups was still far below that of the general population norm .
The Potential of the WHO-5 as a Screening Tool for Depression
Table 2 describes the performance (sensitivity and specificity) of the WHO-5 when used as a screening tool for depression. Most of the studies used DSM-IV depression assessed by various structured interviews as their gold standard reference. In the first 8 studies listed in table 2, a cut-off score of ≤50 on the WHO-5 was used to assign a ‘screening diagnosis' of depression. For these 8 studies, the mean sensitivity for DSM-IV depression was 0.87, and the mean specificity for DSM major depression was 0.76. For all 18 studies, the sensitivity of the WHO-5 was 0.86, and the specificity was 0.81 (table 2). In the study by Lowe et al. , the WHO-5 cut-off score was ≤28, which more restrictively equals the level of well-being among patients with DSM-IV major depression. Despite this restrictive threshold, the WHO-5 had a sensitivity of 0.93 and a specificity of 0.83 in the detection of depression.
For a screening instrument such as the WHO-5, having sufficiently high sensitivity (i.e. a very high proportion of depressed individuals screen positive) is a key factor, whereas high specificity is less important. This is due to the fact that the second step of the diagnostic process, after an initial positive screening with the WHO-5, consists of a diagnostic interview performed by a trained clinician, during which ‘false positives' (patients screening positive on the WHO-5 but not meeting criteria for depression) will be detected .
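Sensitivity and specificity for a given cut-off can be computed directly from WHO-5 scores and reference diagnoses. The Python sketch below applies the cut-off score of ≤50 and counts true and false positives and negatives. The function names are illustrative, and any data passed to the function would have to come from a study with a gold standard diagnosis; none of the studies in table 2 are reproduced here.

```python
from typing import Sequence, Tuple


def screens_positive(score: float, cutoff: float = 50) -> bool:
    """A respondent screens positive when the WHO-5 score is at or below the cut-off."""
    return score <= cutoff


def sensitivity_specificity(scores: Sequence[float],
                            has_depression: Sequence[bool],
                            cutoff: float = 50) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the WHO-5 against a reference diagnosis."""
    tp = fn = tn = fp = 0
    for score, depressed in zip(scores, has_depression):
        positive = screens_positive(score, cutoff)
        if depressed and positive:
            tp += 1   # depressed and screened positive
        elif depressed:
            fn += 1   # depressed but missed by the screening
        elif positive:
            fp += 1   # false positive, to be detected at the diagnostic interview
        else:
            tn += 1   # not depressed and screened negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both depressed and non-depressed respondents are required")
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the `cutoff` argument to 28 applies the more restrictive threshold mentioned above.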
The Applicability of the WHO-5 across Study Fields
Table 3 lists the number of WHO-5 publications stratified by field of study. The scale has been used most extensively in endocrinology, which is explained by the fact that the WHO-5 was developed in a Pan-European study of patients with diabetes . Within the field of diabetes, the most comprehensive study employing the WHO-5 is the multinational study of psychosocial issues in diabetes by Nicolucci et al. . The participants were patients with diabetes from 17 countries. Using a cut-off score of ≤28 on the WHO-5, approximately 14% of the patients were screened as having a depression. As shown by Nicolucci et al. , depression may have a substantial negative impact on diabetic control.
Among the WHO-5 studies on depression, the work by Krieger et al.  is highly important because both the Hamilton Depression Scale (HAM-D) and the Hamilton Anxiety Scale (HAM-A) were included among the indices of validity. A score of 18 on the HAM-D, which indicates the level of depression at which treatment is needed (major depression), corresponded to a WHO-5 score of 20, and an HAM-D score of 13 (minor depression) corresponded to a WHO-5 score of 32 .
Suicidology is also among the fields where the WHO-5 has been applied. When health-related quality of life was accepted as an important outcome in clinical trials, it was found that psychological well-being was a major element of this dimension. As discussed by Andrews and Withey , psychological well-being can be considered as the sum of satisfactions that makes life worth living, constituting the opposite pole to psychological pain with suicidal thoughts . In the WHO Multisite Intervention Study on Suicidal Behaviours (SUPRE-MISS), the WHO-5 was used to identify subjects who attempted suicide. The main finding of the SUPRE-MISS was that those attempting suicide generally had extremely low scores on the WHO-5 [37,38]. Furthermore, Vijayakumar et al.  found that subjects with repeated suicide attempts scored lower on the WHO-5 than subjects with a first attempt (49.6 vs. 58.6, p = 0.01). Finally, Awata et al.  showed that subjects with suicidal ideation scored significantly lower on the WHO-5 than subjects without suicidal ideation (45.6 vs. 67.6, p = 0.01).
For illnesses other than depression within the psychiatric field, the WHO-5 has been used most extensively in relation to alcoholism and other substance use disorders. Hensing et al. found a negative correlation between well-being and harmful alcohol use in females but not in males. Elholm et al. focused on the alcohol withdrawal syndrome in outpatients treated with chlordiazepoxide as an anti-abstinence drug. Most individuals scored <50 on the WHO-5 at inclusion. After 2 weeks of therapy, the WHO-5 score increased significantly. Hoxmark et al. found that the mean scores were approximately 38 on the WHO-5 at baseline and that 47% scored <50. There was a negative correlation between the severity of abuse and the WHO-5 scores.
Cardiovascular disease has previously been mentioned in the section on the clinimetric properties of the WHO-5. The finding by Birket-Smith et al., i.e. that myocardial infarction patients with WHO-5 scores of >50 are well functioning, was supported by the study by Bergmann et al. among patients who had survived myocardial infarction and were followed for >2 years. The fact that patients who have survived myocardial infarction have rather high WHO-5 scores has been considered by Garnefski et al. as an expression of a positive coping style. They referred to this effect as a form of ‘post-traumatic growth'.
In the field of neurology, the WHO-5 has been found valid in screening for depression in patients with Parkinson's disease. Furthermore, the WHO-5 has been used as an outcome scale in a placebo-controlled study on a new pain-reducing wound dressing in patients with venous leg ulcers. The results showed a significant superiority of the active treatment over placebo. The applicability of the WHO-5 in back pain disorder has also been evaluated as being acceptable by both Volinn et al. and Vereckei et al.
In the field of stress research, the WHO-5 has been used to assess a wide variety of aspects including coping strategies, well-being in occupational health settings, the association between workplace stress and well-being, the links between working conditions and well-being as well as the association between psychosocial conditions and well-being. In the study by Gao et al., it was found that approximately 35% of a total of 2,796 employees had low well-being (cut-off score on the WHO-5 of <50) and that low social capital at the workplace was associated with poor well-being. This finding was confirmed by Jung et al. Finally, in the study by Schutte et al. it was reported that the prevalence of poor well-being was highest in the low education group.
In accordance with the suggestion from Feinstein  and Ware , the WHO-5 was developed as a generic scale without any diagnostic specificity . The WHO-5 was therefore recommended as the ultimate patient-related measure within the international WHO classification system for chronic medical conditions, integrating biological impairments, social disabilities and subjective handicaps . In this systematic review of the WHO-5 as a generic well-being scale, we focused on four aspects: (1) the clinimetric validity of the WHO-5; (2) the responsiveness/sensitivity of the WHO-5 in controlled clinical trials; (3) the potential of the WHO-5 as a screening tool for depression, and (4) the applicability of the WHO-5 across study fields. It is our impression that the WHO-5 has performed well with regard to all these aspects.
The clinimetric validity of the WHO-5 has been evaluated in terms of construct validity (total score being a sufficient statistic). Item response theory analyses have shown that the WHO-5 covers the dimension of subjective well-being from 0 (worst imaginable well-being) to 100 (best imaginable well-being). In the comprehensive review by Hall et al. , the clinical validity of the WHO-5 was evaluated to be very high as the scale can be used irrespective of underlying illness (or lack of illness) and across many different settings.
When used as an outcome measure, the WHO-5 has been able to capture improvement in well-being caused by various pharmacological and non-pharmacological interventions in clinical trials across many different branches of medicine [28,30,32]. Furthermore, by comparing end point ratings on the WHO-5 to general population mean scores, remission rates can be calculated. When used as an outcome measure in clinical trials, the WHO-5 can be viewed as a measure which taps into the balance between the desired clinical effects and the unwanted side effects of a given intervention .
Feinstein suggested that the validity of a clinimetric scale like the WHO-5 should be tested in terms of sensitivity and specificity, analogously to erythrocyte sedimentation rates as a diagnostic test in general medicine. When summarizing all WHO-5 studies for the screening of depression, the weighted sensitivity was 0.86 and the specificity 0.81 (table 2), which is acceptable. A WHO-5 cut-off score of ≤50 is recommended when screening for clinical depression.
In the late 1990s, the WHO-5 was introduced in the medical field of diabetes. Our review has shown that diabetes remains the condition in which the WHO-5 has been used most extensively. The WHO-5 studies in the field of diabetes indicate that a low WHO-5 score may have a substantial negative impact upon diabetic control, thereby validating the use of the WHO-5 within the international WHO classification system for chronic medical conditions. In this context, the biological impairment is the metabolic disturbance caused by diabetes, and the decreased well-being is an indication of problems in the diabetic care of the patient. This is in contrast to some cardiovascular disorders or cancer disorders, where a low WHO-5 score might indicate a clinical depression where the impairment is depression-related, for example poststroke depression or cancer depression. Within diabetes, obesity and metabolic syndrome have been shown to be related to depression. In a recent review on the metabolic syndrome, Bergmann et al. found that scales focusing on stressors rather than on distress or low well-being had been used. Many stress-related studies have been performed using the WHO-5 to measure distress or poor well-being. Also, within the WHO SUPRE-MISS, the WHO-5 has been found to be highly applicable.
The WHO-5 has been used extensively worldwide. Online supplementary table 1 shows the diversity of its application across different regions: Africa (Algeria, South Africa), Asia (Bangladesh, China, India, Japan, South Korea, Sri Lanka, Taiwan, Thailand), Europe (Northern, Southern, Eastern, Western and Central Europe), the Americas (Canada, the US, Brazil, Mexico), the Middle East (Israel, Iran, Lebanon) and Oceania (Australia, New Zealand). This very successful dissemination of the scale is probably due to its straightforward language, which poses few translation problems, and to the fact that the questions do not seem to transgress any cultural norms in the individual countries.
Recently, Ryff  has reviewed publications on the eudaemonic scales of well-being, which capture the core aspects of what it means to be human, including existential and developmental factors. Based on this eudaemonic approach , Fava  developed his ‘well-being therapy'. It should be stated that the WHO-5 is a clinimetric outcome scale at the description level, parallel to symptom-related scales and side effect scales. Therefore, the WHO-5 should be considered as an outcome scale in Fava's well-being therapies.
In conclusion, the WHO-5 is a short questionnaire consisting of 5 simple and non-invasive questions, which tap into the subjective well-being of the respondents. The scale has adequate validity both as a screening tool for depression and as an outcome measure in clinical trials and has been applied successfully as a generic scale for well-being across a wide range of study fields. In our opinion, the findings of this review show that the WHO-5 is a highly useful tool that can be applied both in clinical practice (for instance to screen for depression) and in research studies in order to assess well-being over time or to compare well-being between groups.
This project was funded by the Psychiatric Research Unit, Psychiatric Centre North Zealand, Copenhagen University Hospital, Hillerød, Denmark.
The authors declare no conflicts of interest.