Background/Aims: “Number needed to” metrics may hold more intuitive appeal for clinicians than standard diagnostic accuracy measures. The aim of this study was to calculate “number needed to diagnose” (NND), “number needed to predict” (NNP), and “number needed to misdiagnose” (NNM) for neurological signs of possible value in assessing cognitive status. Methods: Data sets from pragmatic diagnostic accuracy studies examining easily observed and dichotomised neurological signs (“attended alone” sign, “attended with” sign, head turning sign, applause sign, la maladie du petit papier) were analysed to calculate the NND, NNP, and NNM. Results: All measures of discrimination showed broad ranges. The range of NND and NNP suggested that these signs were, with a single exception, of value for correctly diagnosing or predicting cognitive status (presence or absence of cognitive impairment) when between 2 and 4 patients were examined. However, NNM showed similar values (range 1–5 patients), suggesting risk of misdiagnosis. Conclusion: NND, NNP, and NNM may be useful, intuitive metrics in assessing the utility of diagnostic tests in day-to-day clinical practice. A ratio of NNM to either NND or NNP, termed the “likelihood to be diagnosed or misdiagnosed” (LDM), may clarify the utility or inutility of diagnostic tests.
Many measures of discrimination have been used to describe the utility of diagnostic tests [1, 2]. Most commonly, diagnostic test accuracy studies report paired values of test sensitivity and specificity, and of positive and negative predictive values (PPV, NPV). Other single (global or unitary) indicators of diagnostic test performance have also been described, including:
correct classification accuracy, the total number of true positives and true negatives divided by the total number of patients assessed, and inaccuracy, the total number of false positives and false negatives divided by the total number of patients assessed (= 1 – accuracy);
Youden index (Y), a combination of sensitivity and specificity, given by (sensitivity + specificity – 1);
predictive summary index (PSI, or Ψ), a combination of positive and negative predictive values, given by (PPV + NPV – 1).
Accuracy and inaccuracy take values between 0 and 1, as do Y and PSI for any test performing no worse than chance (in principle both can fall as low as –1); all are sometimes expressed as percentages. It may be difficult for clinicians to relate these numeric outcomes to individual patients in day-to-day clinical practice.
Cook and Sackett introduced the “number needed to treat” (NNT) metric as a way to represent the “impact” of treatments. This measure is arguably more intuitive to clinicians and patients than more traditional measures of discrimination. Adaptations of NNT have been described (e.g., “number needed to harm” (NNH); “number needed to see” (NNS)). Analogous adaptations may be relevant to diagnostic test accuracy studies.
The inverse of the Youden index (1/Y) has been defined as the “number needed to diagnose” (NND), that is, the number of patients who need to be examined in order to correctly detect one person with the disease of interest in a study population of persons with and without the known disease. For diagnostic tests, low values of NND are desirable.
Linn and Grunau also suggested a new statistic, the inverse of PSI (1/PSI or 1/Ψ), which they termed the “number needed to predict” (NNP), interpreted as the number of patients who need to be examined in the patient population in order to correctly predict the diagnosis of one person. Whilst NND is insensitive to variation in disease prevalence, since it depends entirely on sensitivity and specificity, NNP is dependent on prevalence and may therefore be deemed a better descriptor of diagnostic tests in patient populations with different prevalence of disease. For diagnostic tests, low values of NNP are desirable.
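The contrast between NND and NNP can be illustrated with a minimal Python sketch. The sensitivity, specificity, and prevalence values used here are hypothetical, chosen only for illustration and not taken from any of the studies analysed; the function names are likewise assumptions. PPV and NPV are derived from sensitivity, specificity, and prevalence by Bayes' theorem.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem, working in population fractions."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def nnd(sensitivity, specificity):
    """Number needed to diagnose: 1 / Youden index."""
    return 1 / (sensitivity + specificity - 1)


def nnp(sensitivity, specificity, prevalence):
    """Number needed to predict: 1 / predictive summary index."""
    ppv, npv = predictive_values(sensitivity, specificity, prevalence)
    return 1 / (ppv + npv - 1)


# Hypothetical test characteristics (illustrative only, not study data).
SENS, SPEC = 0.80, 0.70

for prevalence in (0.10, 0.30, 0.50):
    print(
        f"prevalence={prevalence:.2f}  "
        f"NND={nnd(SENS, SPEC):.2f}  "
        f"NNP={nnp(SENS, SPEC, prevalence):.2f}"
    )
```

Under these assumed values NND is identical at every prevalence, whereas NNP rises as prevalence falls from 0.50 to 0.10.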
Habibzadeh and Yadollahie have proposed another index, the “number needed to misdiagnose” (NNM), as a measure of diagnostic test effectiveness, defined as the inverse of (1 – accuracy) = 1/inaccuracy. NNM is the number of patients who need to be tested in order for one to be misdiagnosed by the test. For diagnostic tests, high values of NNM are desirable.
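The three definitions above can be sketched together as a calculation from a standard 2 × 2 table. This is a minimal illustration, not the code used in the study: the class and function names are assumptions, the counts are hypothetical, and all marginal totals are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic accuracy study (index test vs reference standard)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self):
        return self.tp + self.fp + self.fn + self.tn


def reciprocal(value):
    """Return 1/value, or infinity when the underlying index is zero."""
    return float("inf") if value == 0 else 1 / value


def number_needed_metrics(t):
    """Return NND, NNP, and NNM for a 2 x 2 table (non-zero margins assumed)."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    ppv = t.tp / (t.tp + t.fp)
    npv = t.tn / (t.tn + t.fn)
    youden = sensitivity + specificity - 1   # Y
    psi = ppv + npv - 1                      # predictive summary index
    inaccuracy = (t.fp + t.fn) / t.total     # = 1 - accuracy
    return {
        "NND": reciprocal(youden),
        "NNP": reciprocal(psi),
        "NNM": reciprocal(inaccuracy),
    }


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=40, fp=10, fn=10, tn=40)
print(number_needed_metrics(example))
```

Returning infinity for a zero index is a design choice made here so that a test with no discriminating value is reported as needing an unbounded number of patients rather than raising an error.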
A number of simple, non-canonical, neurological signs of possible value in the diagnosis of cognitive status have been described, whose utility is based in part on their being easily observed and categorised as present or absent: the “attended alone” sign and its converse the “attended with” sign, the head turning sign, the applause sign, and la maladie du petit papier. The aim of the current study was to reanalyse data sets from diagnostic test accuracy studies of these signs in order to calculate and compare the parameters NND, NNP, and NNM.
Data from pragmatic prospective diagnostic accuracy studies undertaken in a dedicated cognitive disorders clinic, located in a secondary care setting (regional neuroscience centre) and using a standardised methodology [1, 9], were analysed.
These studies examined the following non-canonical neurological signs:
the attended alone sign: defined as the patient attending the clinic appointment without a knowledgeable informant, despite prior provision of written instructions to bring one;
the attended with sign: the converse of the attended alone sign, the patient attending the clinic appointment with an informant in accordance with prior provision of written instructions to do so;
the head turning sign [11-13]: the patient turning her/his head towards an accompanying informant when asked open questions about memory symptoms during the history taking phase of the clinical assessment;
the applause sign: in the clinical examination phase of the assessment the patient is asked to clap hands three times, and responds with more than three claps;
la maladie du petit papier [15, 16]: the patient presents a self-written list of symptoms (on paper or iPad) during the clinical assessment.
All these signs are easily observed and dichotomised as present/absent. The attended alone sign and la maladie du petit papier have been suggested to indicate absence of cognitive impairment, whereas the attended with, head turning, and applause signs have been suggested to indicate the presence of cognitive impairment.
Data from these studies, pooled where appropriate, were used to calculate the following parameters: sensitivity and specificity, Youden index, and NND; PPV and NPV, PSI, and NNP; accuracy, inaccuracy, and NNM.
Reference standard diagnoses were dementia, mild cognitive impairment, or subjective memory complaint, by judgment of an experienced clinician based on standard diagnostic criteria for dementia (DSM-IV) and mild cognitive impairment (Petersen).
A further, novel, metric was also derived, the “likelihood to be diagnosed or misdiagnosed” (LDM). Analogous to the previously described “likelihood to be helped or harmed” (LHH) metric, calculated as the ratio of NNH to NNT, LDM is given by the ratio of NNM to either NND or NNP. Since for diagnostic tests low values of NND and NNP and high values of NNM are desirable, higher values of LDM (> 1) would suggest a test more likely to diagnose than misdiagnose.
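Because each “number needed to” metric is a reciprocal, LDM can equivalently be written in terms of the underlying indices. The subscripts used here are labels introduced for illustration, not notation from the source studies:

```latex
\mathrm{LDM}_{\mathrm{NND}}
  = \frac{\mathrm{NNM}}{\mathrm{NND}}
  = \frac{1/\text{inaccuracy}}{1/Y}
  = \frac{Y}{\text{inaccuracy}},
\qquad
\mathrm{LDM}_{\mathrm{NNP}}
  = \frac{\mathrm{NNM}}{\mathrm{NNP}}
  = \frac{1/\text{inaccuracy}}{1/\Psi}
  = \frac{\Psi}{\text{inaccuracy}}
```

On this reading, and assuming a positive index and non-zero inaccuracy, LDM exceeds 1 exactly when Y (or Ψ) exceeds the inaccuracy.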
Prevalence (P) of cognitive impairment for each study was calculated as the number of patients receiving a criterion diagnosis of dementia or mild cognitive impairment (true positives and false negatives) divided by the total number of patients assessed. Level of the test (Q) was calculated as the number of patients with a positive test in the population studied (true positives and false positives) divided by the total number of patients assessed.
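As a minimal sketch of these two definitions, P and Q can be read directly off the 2 × 2 counts. The function name and the counts are hypothetical and serve only as illustration.

```python
def prevalence_and_level(tp, fp, fn, tn):
    """Return (P, Q): prevalence and level of the test from 2 x 2 counts."""
    total = tp + fp + fn + tn
    prevalence = (tp + fn) / total  # P: reference-standard positives / all assessed
    level = (tp + fp) / total       # Q: index-test positives / all assessed
    return prevalence, level


# Hypothetical counts, for illustration only.
p, q = prevalence_and_level(tp=30, fp=20, fn=10, tn=40)
print(f"P = {p:.2f}, Q = {q:.2f}")
```

P depends only on the reference standard diagnoses, whereas Q depends only on how often the sign is observed, so the two can differ widely for the same study.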
A summary of the different studies (Table 1) showed a broadly similar prevalence of patients with cognitive impairment (range 0.32–0.63), the outlier being the study of the head turning sign, which logically required exclusion of those who attended alone. Level of the test showed a broad range, from low frequency (la maladie du petit papier = 0.05) to high frequency (attended with = 0.66).
The sensitivity and specificity of the different signs varied (Table 2), from very sensitive (attended alone for diagnosis of no cognitive impairment = 0.93, or no dementia = 1.00; attended with for diagnosis of any cognitive impairment = 0.93) to insensitive (la maladie du petit papier for diagnosis of no cognitive impairment = 0.05). The expected trade-off between sensitivity and specificity was observed, with less sensitive signs being more specific. A range of values for the Youden index was observed (0.05–0.60), and hence also for NND (1/Y), ranging from 1.67 (head turning sign for any cognitive impairment) to 20 (la maladie du petit papier for no cognitive impairment).
The PPV and NPV of the different signs varied (Table 3), with a PPV range of 0.45–0.95 and NPV range of 0.43–1.00. A range of values for PSI was observed (0.28–0.56) and hence for NNP (1/PSI), ranging from 1.79 (head turning sign for any cognitive impairment) to 3.57 (la maladie du petit papier for no cognitive impairment).
The accuracy (range 0.45–0.79) and inaccuracy (range 0.21–0.55) of the different signs varied (Table 4), with NNM ranging from 1.82 (la maladie du petit papier for no cognitive impairment) to 4.76 (applause sign for dementia).
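The reported ranges of NND, NNP, and NNM follow arithmetically from the reported ranges of Y, PSI, and inaccuracy. The short check below, a sketch using only the rounded values quoted in the text (the tables themselves are not reproduced here), takes the reciprocal of each range endpoint; the variable and function names are assumptions.

```python
# ((index low, index high), (metric low, metric high)), as quoted in the text.
REPORTED = {
    "NND = 1/Y": ((0.05, 0.60), (1.67, 20.0)),
    "NNP = 1/PSI": ((0.28, 0.56), (1.79, 3.57)),
    "NNM = 1/inaccuracy": ((0.21, 0.55), (1.82, 4.76)),
}


def reciprocal_range(index_low, index_high):
    """The largest index value gives the smallest reciprocal, and vice versa."""
    return round(1 / index_high, 2), round(1 / index_low, 2)


for label, ((index_low, index_high), reported_range) in REPORTED.items():
    computed = reciprocal_range(index_low, index_high)
    print(label, computed, "matches reported:", computed == reported_range)
```

Note the inversion: the sign with the highest Youden index has the lowest NND, while the sign with the highest inaccuracy has the lowest NNM.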
Values for the LDM (Table 5) were high for some signs (> 1), suggesting balance in favour of diagnosis over misdiagnosis (e.g., head turning sign for any cognitive impairment), and low (< 1) for others, suggesting balance in favour of misdiagnosis over diagnosis (e.g., la maladie du petit papier for no cognitive impairment).
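As a worked illustration, LDM can be approximated for la maladie du petit papier (for no cognitive impairment) from the rounded values quoted in the preceding paragraphs: NND = 20, NNP = 3.57, NNM = 1.82. This is a sketch only; the function name is an assumption, and values computed from unrounded data, as in Table 5, may differ slightly.

```python
def ldm(nnm, number_needed_correct):
    """Likelihood to be diagnosed or misdiagnosed: NNM / NND or NNM / NNP."""
    return nnm / number_needed_correct


# Rounded values quoted in the text for la maladie du petit papier.
NND, NNP, NNM = 20.0, 3.57, 1.82

print(f"LDM (NNM/NND) = {ldm(NNM, NND):.3f}")
print(f"LDM (NNM/NNP) = {ldm(NNM, NNP):.2f}")
```

Both ratios fall below 1, consistent with the balance in favour of misdiagnosis over diagnosis described for this sign.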
Clinicians generally think in terms of patients, rather than probabilities. Thus, “number needed to” parameters may hold particular intuitive appeal for clinicians. To the author’s knowledge, this study represents a first attempt to characterise neurological signs in terms of the number needed to diagnose, predict, and misdiagnose metrics suggested by Linn and Grunau and Habibzadeh and Yadollahie.
Values for NNP for all the signs examined suggested that between 2 and 4 patients need to be examined in the patient population for correct prediction of either the diagnosis of cognitive impairment in someone with a positive test result or absence of cognitive impairment in someone with a negative test result. These numbers suggest that these signs may be of clinical use in day-to-day practice, an observation which might influence clinician uptake.
Conversely, values for NNM suggested that similar numbers, between 2 and 5 patients, need to be examined in order for one to be misdiagnosed by the test. Generally, tests with low NND or NNP had higher NNM (e.g., head turning sign for diagnosis of any cognitive impairment) whilst those with high NND or NNP had low NNM (e.g., la maladie du petit papier for diagnosis of no cognitive impairment). Other signs had similar values for NND, NNP, and NNM (e.g., attended alone, attended with, applause).
The study has a number of limitations. All index studies were undertaken in the same clinic, with the risks of patient-based (selection, spectrum) and test performance biases, and all were cross-sectional studies with risk of diagnostic error. Studies of these signs in settings with different disease prevalence (e.g., primary care, community), and with follow-up for delayed verification of diagnosis, would be of interest.
The neurological signs examined are non-canonical, and currently not widely used (with the possible exception of the applause sign, particularly in the context of movement disorder clinics), although potentially widely applicable, since they are quick to perform, cost free, and easily interpreted and categorised. Some validation studies in independent patient cohorts have been reported for some of these signs [19, 20], but studies of possible relationships to disease biomarkers are in their infancy. The signs examined are easily dichotomised, thus facilitating calculation of NND, NNP, and NNM, which may not be the case for cognitive screening instruments which require the application of test cut-offs. Nevertheless, calculation of these metrics may help clinicians to decide on the possible value of specific signs and tests in the clinical setting.
The utility or inutility of these “numbers needed to” parameters will, as for measures of discrimination, depend on the clinician’s purpose in doing the test. If the clinician wishes to identify all cases (no false negatives), a highly sensitive test with low NND or NNP, with consequent risk of false positives, may be acceptable despite low NNM. If the clinician’s purpose is to exclude all non-cases (no false positives), for example in a treatment trial, a low NNM may outweigh low NND or NNP. LDM may give a more global measure of diagnostic gain.
The author declares no conflicts of interest.