Abstract
Background/Aims: There are many cognitive screening instruments available to clinicians when assessing patients' cognitive function, but the best way to compare the diagnostic utility of these tests is uncertain. One method is to undertake a weighted comparison which takes into account the difference in sensitivity and specificity of two tests, the relative clinical misclassification costs of true- and false-positive diagnosis, and also disease prevalence. Methods: Data were examined from four pragmatic diagnostic accuracy studies from one clinic which compared the Mini-Mental State Examination (MMSE) with the Addenbrooke's Cognitive Examination-Revised (ACE-R), the Montreal Cognitive Assessment (MoCA), the Test Your Memory (TYM) test, and the Mini-Mental Parkinson (MMP), respectively. Results: Weighted comparison calculations suggested a net benefit for ACE-R, MoCA, and MMP compared to MMSE, but a net loss for TYM test compared to MMSE. Conclusion: Routine incorporation of weighted comparison or other similar net benefit measures into diagnostic accuracy studies merits consideration to better inform clinicians of the relative value of cognitive screening instruments.
Introduction
A large number of cognitive screening instruments (CSI) are available for the assessment of patient complaints of poor memory or cognitive impairment [1,2,3]. Although criteria for the optimal CSI have been suggested [4], in practice many different approaches to comparing tests may be undertaken, essentially balancing test speed against accuracy.
When assessing the diagnostic utility of CSI, a number of summary measures are available, which may help to guide clinicians in selecting the test most appropriate for the purpose. In addition to test sensitivity and specificity, diagnostic utility may be expressed in terms of predictive values, likelihood ratios, clinical utility index, agreement between tests (kappa statistic), and the area under the receiver-operating characteristic curve (AUC). All of these measures have potential shortcomings.
AUC is commonly used as an overall measure of diagnostic test accuracy, but the shortcomings of this measure have been emphasized [5], specifically the fact that it combines test accuracy over a range of thresholds, some of which may be clinically relevant and others clinically nonsensical. It has been argued that the most relevant and applicable presentation of diagnostic accuracy test results should include interpretation in terms of patients, clinically relevant values for test thresholds, disease prevalence, and clinically relevant relative gains and losses [5]. One such index is the weighted comparison (WC) measure, described by Moons et al. [6], which gives weighting to the difference in sensitivity and specificity of two tests and takes into account the relative clinical misclassification costs of true-positive (TP) and false-positive (FP) diagnosis and also disease prevalence.
The aim of this study was to reinterpret data from a number of pragmatic prospective diagnostic accuracy studies performed in this clinic [7] in terms of the WC measure of Moons et al. [6] in order to compare a number of CSI, specifically the Mini-Mental State Examination (MMSE) [8] with the Addenbrooke's Cognitive Examination-Revised (ACE-R) [9,10], the Montreal Cognitive Assessment (MoCA) [11,12], the Test Your Memory (TYM) test [13,14], and the Mini-Mental Parkinson (MMP) [15,16].
Materials and Methods
Data from four previous pragmatic diagnostic accuracy studies undertaken in this clinic, which compared the MMSE with the ACE-R [10], MoCA [12], TYM test [14], and MMP [16], were reanalyzed. Study details (setting, sample size, dementia prevalence, sex ratio, and age range) are given in table 1. Test cutoffs were determined empirically by examining sensitivity and specificity at all cutoff values with the optimal cutoff being defined by maximal test accuracy for diagnosis. In each of these studies, criterion diagnosis was made by the judgment of an experienced clinician based on diagnostic criteria.
Data on the prevalence of dementia [10,14,16] or of cognitive impairment (dementia and mild cognitive impairment) [12] were obtained from each study along with the change in test sensitivity and specificity, and these figures were applied to the WC equation [5,6]:
WC = Δ sensitivity + [((1 - π)/π) × relative cost (FP/TP) × Δ specificity],
where π = prevalence.
The relative misclassification cost (FP/TP) is a parameter which seeks to define how many FPs a TP is worth. Clearly, such a ‘cost' is very difficult to estimate. In the context of diagnostic accuracy studies for CSI, it may be argued that high test sensitivity in order to identify all TPs, with the accompanying risk of FPs (e.g. emotional consequences for a patient because of incorrect diagnosis or inappropriate treatment), is more acceptable than tests with low sensitivity but high specificity which risk false-negative diagnoses (i.e. missing TPs). This argument is of course moot in the current absence of disease-modifying therapies. For this study, FP/TP = 0.1 was therefore arbitrarily set, following previous authors [5], reflecting the desire for high test sensitivity. The WC equation does not take into account false-negative diagnoses, which have their own potential cost.
Positive WC values were taken to indicate a net test benefit, negative values a net loss [5,6]. To aid interpretation, another parameter may be calculated using WC, namely the equivalent increase in TP patients per 1,000, using the equation [5]:
WC × prevalence × 1,000.
Again, positive values were taken to indicate a net test benefit, negative values a net loss.
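The two calculations described above may be illustrated with a minimal sketch. The function names and the input values used here are hypothetical and chosen only to show the arithmetic; they are not taken from any of the four studies. The prevalence weighting is assumed to be (1 - π)/π, and the default relative cost of 0.1 follows the value set for this study.

```python
def weighted_comparison(delta_sens, delta_spec, prevalence, relative_cost=0.1):
    """WC = delta sensitivity + ((1 - pi)/pi) x relative cost (FP/TP) x delta specificity.

    delta_sens and delta_spec are differences between two tests, expressed as
    proportions (e.g. 0.10 for a 10 percentage point gain); prevalence is pi.
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    odds_against_disease = (1 - prevalence) / prevalence
    return delta_sens + odds_against_disease * relative_cost * delta_spec


def equivalent_tp_per_1000(wc, prevalence):
    """Equivalent increase in TP patients per 1,000 tested: WC x prevalence x 1,000."""
    return wc * prevalence * 1000


# Hypothetical inputs for illustration only (not study data):
# the index test gains 10 points of sensitivity and loses 5 points of specificity
# relative to the comparator, in a sample with 25% prevalence.
wc = weighted_comparison(delta_sens=0.10, delta_spec=-0.05, prevalence=0.25)
gain = equivalent_tp_per_1000(wc, prevalence=0.25)

print(f"WC = {wc:.3f}")                      # positive, hence a net benefit
print(f"Equivalent TP per 1,000 = {gain:.2f}")
```

With these hypothetical inputs the loss of specificity is outweighed by the gain in sensitivity, so WC remains positive.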
Results
The figures for sensitivity, specificity, prevalence of dementia (ACE-R, TYM, and MMP studies) or cognitive impairment (MoCA study), Δ sensitivity, Δ specificity, and the calculated WC are given for each of the four tests versus MMSE in tables 2, 3, 4, 5, respectively. AUC and positive and negative predictive values from each study are included for comparative purposes.
The WC calculations suggested a net benefit for ACE-R, MoCA, and MMP versus MMSE, but a net loss for the TYM test versus MMSE. All WC evaluations were in the same direction as the values for AUC (i.e. favoured ACE-R, MoCA, and MMP vs. MMSE, favoured MMSE vs. TYM test).
The equivalent increase in TP dementia patients identified per 1,000 tested was 61 for ACE-R, -26 for TYM test, and 13 for MMP. The equivalent increase in TP cognitively impaired patients identified per 1,000 tested was 121 for MoCA.
Discussion
WC measures may have advantages over more traditional parameters used in the assessment of test utility in diagnostic accuracy studies [6], particularly the AUC. Hence it has been suggested that such measures be incorporated into diagnostic accuracy studies [5]. Net benefit methods to measure test diagnostic performance other than the WC developed by Moons et al. [6] have been described [5].
In this study, data from four previous diagnostic accuracy studies of CSI were reanalyzed to calculate WC. These were pragmatic, observational studies involving unselected patient groups with cognitive complaints of unknown aetiology, rather than experimental studies involving patient groups selected by known diagnostic category, and hence the results should be broadly generalizable since they reflect the idiom of clinical practice [17]. The setting and sample characteristics were broadly equivalent for each of these four studies (table 1).
Overall the calculations suggest a net benefit for ACE-R and, to a lesser extent, MMP for the identification of dementia versus MMSE, and for MoCA for the identification of cognitively impaired patients, with a net loss for the TYM test versus MMSE for the identification of dementia. The equivalent increase for MoCA suggested that fewer than 10 patients needed to be evaluated with this test for 1 additional TP cognitively impaired patient to be identified compared to using the MMSE, concordant with the high sensitivity of this test [11]. Such figures may be easier for clinicians to interpret compared to AUC. All WC evaluations were in the same direction as the values for AUC (i.e. favoured ACE-R, MoCA, and MMP vs. MMSE, favoured MMSE vs. TYM test).
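The statement that fewer than 10 patients needed to be evaluated with MoCA follows from taking the reciprocal of the equivalent increase per 1,000 tested. A minimal sketch of that arithmetic, using the equivalent increases reported in the Results, is given here; the helper name is hypothetical, and returning None for a net loss is an assumption made for this illustration only.

```python
def patients_per_additional_tp(equivalent_tp_per_1000):
    """Number of patients tested per 1 additional TP identified versus MMSE.

    Returns None when the equivalent increase is zero or negative (a net loss),
    since no number of tests then yields an additional TP.
    """
    if equivalent_tp_per_1000 <= 0:
        return None
    return 1000 / equivalent_tp_per_1000


# Equivalent increases in TP patients per 1,000 tested, as reported in the Results
# (dementia for ACE-R, TYM and MMP; cognitive impairment for MoCA).
reported = {"ACE-R": 61, "MoCA": 121, "TYM": -26, "MMP": 13}

for test, increase in reported.items():
    needed = patients_per_additional_tp(increase)
    if needed is None:
        print(f"{test}: net loss versus MMSE")
    else:
        print(f"{test}: about {needed:.1f} patients tested per additional TP")
```

For MoCA this gives 1,000/121, or roughly 8 patients, consistent with the figure of fewer than 10 quoted above.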
Of course, WC values would be different if the case mix seen in these clinical studies had a different disease prevalence, and if a different relative misclassification cost was selected. WC values would fall with higher disease prevalence in the clinic samples. However, empirically a fall in the frequency of patients with dementia and cognitive impairment and an increase in individuals with subjective memory impairment have been observed over time in these clinics [7], perhaps related to governmental directives on dementia issued in the United Kingdom [18]. The setting of the relative misclassification cost (FP/TP) was arbitrary but stringent (10 TPs for 1 FP or 1 TP judged to be worth 0.1 FP). If one accepted more FPs and/or fewer TPs (i.e. a less sensitive, more specific test) the ratio would rise and the WC value would be higher. As previously noted, the WC equation does not take into account false-negative diagnoses, which have their own potential cost.
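The dependence of WC on prevalence and on the relative misclassification cost can be explored with a short sensitivity analysis. The sketch below uses hypothetical values for Δ sensitivity and Δ specificity, both assumed positive, and hypothetical grids of prevalence and cost; none of these figures comes from the studies. Under that assumption WC falls as prevalence rises and rises as the FP/TP ratio rises; with a negative Δ specificity both directions would be reversed.

```python
def weighted_comparison(delta_sens, delta_spec, prevalence, relative_cost=0.1):
    """WC = delta sensitivity + ((1 - pi)/pi) x relative cost (FP/TP) x delta specificity."""
    odds_against_disease = (1 - prevalence) / prevalence
    return delta_sens + odds_against_disease * relative_cost * delta_spec


# Hypothetical differences between two tests (illustration only, not study data).
DELTA_SENS = 0.05
DELTA_SPEC = 0.10  # assumed positive

# Vary prevalence with the relative cost held at 0.1.
wc_by_prevalence = {
    p: weighted_comparison(DELTA_SENS, DELTA_SPEC, prevalence=p)
    for p in (0.2, 0.35, 0.5)
}

# Vary the relative cost with prevalence held at 0.35.
wc_by_cost = {
    c: weighted_comparison(DELTA_SENS, DELTA_SPEC, prevalence=0.35, relative_cost=c)
    for c in (0.1, 0.5, 1.0)
}

for p, wc in wc_by_prevalence.items():
    print(f"prevalence {p:.2f}: WC = {wc:.3f}")
for c, wc in wc_by_cost.items():
    print(f"relative cost {c:.1f}: WC = {wc:.3f}")
```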
It might be argued that the sample sizes in these studies (range of n = 150-243; table 1) may mean that they were underpowered. Sample size calculations, which are sometimes recommended for diagnostic accuracy studies [19], were not performed, although a pragmatic approach to sample size estimates has suggested that normative ranges for sample sizes may be calculated for common research designs, with anything in the range of 25-400 being acceptable [20].
As shown in this study, application of the WC measure is straightforward, as well as theoretically attractive, for test interpretation [5,6]. No previous analyses of the diagnostic utility of CSI using this WC method have been identified. The study suggests that there is a case for the routine incorporation of WC or other similar net benefit methods to measure diagnostic test performance into diagnostic accuracy studies.
Disclosure Statement
The author declares no conflicts of interest.