Abstract
Background: The Mini-Mental State Examination (MMSE) is widely used in population-based longitudinal studies to quantify cognitive change. However, its poor metrological properties, mainly ceiling/floor effects and varying sensitivity to change, have largely restricted its usefulness. We propose a normalizing transformation that corrects these properties and makes possible the use of standard statistical methods to analyze change in MMSE scores. Methods: The normalizing transformation, designed to best correct the metrological properties of the MMSE, was estimated and validated by cross-validation on two population-based studies (n = 4,889, 20-year follow-up). The transformation was also validated on two external studies comprising heterogeneous samples mixing normal and pathological aging as well as samples including only demented subjects. Results: The normalizing transformation provided correct inference, in contrast with models analyzing change in the crude MMSE, which most often led to biased estimates of risk factor effects and incorrect conclusions. Conclusions: Cognitive change can be easily and properly assessed with the normalized MMSE using standard statistical methods such as linear (mixed) models.
Introduction
The Mini-Mental State Examination (MMSE) [1] is one of the most popular psychometric tests used to quantify global cognitive functioning and cognitive change in population-based longitudinal studies. The MMSE consists of a series of questions quantifying global cognitive functioning on a 0-30 scale. In clinical practice, the MMSE is widely used as a screening test or as part of the diagnosis of dementia [2], but it is also useful in patients' clinical follow-up to evaluate the severity of dementia, to decide whether to initiate or stop antidementia treatment, to estimate the prognosis, and to characterize the burden of dementia at the population level.
Yet, when specifically exploring cognitive change in aging populations that mix normal and cognitively impaired individuals, the MMSE poses problems in statistical analyses because of its relatively poor metrological properties. The maximum MMSE score is easily reached by cognitively intact individuals, whose actual cognitive performance is thus not measurable [2]. This is particularly frequent among individuals with a high educational level (EL) [3]. Conversely, in severely impaired individuals, the minimum MMSE score may also be reached, again making it impossible to measure the actual cognitive performance. These well-known metric properties define a ceiling effect and a floor effect, respectively. From a psychometric point of view, another closely related limitation is that a 1-point change in the score does not have the same clinical meaning depending on the initial score. This nonstable sensitivity to change defines curvilinearity. The MMSE has been shown to be a highly curvilinear psychometric test [4]: its sensitivity to change varies strongly, with very poor sensitivity to change at high scores (27-30) and relatively good sensitivity to change in the medium range of scores (10-20).
Curvilinearity does not necessarily impair the discriminative ability of the MMSE for dementia prediction. However, it implies that standard statistical modelling of predictors of MMSE scores is no longer appropriate, as such modelling mostly relies on a gaussian assumption and a stable sensitivity to change. For instance, a previous paper showed that the linear mixed model used to describe change over time in the MMSE and its predictors could yield spurious associations with risk factors [5]. To handle this curvilinearity, we previously developed a latent process model that proved very effective at correcting these biases [5,6]. However, this model is rather complicated to implement and only available in specific software.
In this context, our objective was to provide and validate a simple normalizing transformation of the MMSE using the latent process model. Such a normalizing transformation, by correcting the poor metrological properties of the MMSE, aims to enable the use of standard statistical methods to assess predictors of MMSE scores in population-based longitudinal studies.
Methods
Populations
Estimation and validation of the normalizing transformation of the MMSE were based on data from two large population-based prospective cohorts of cognitive aging (PAQUID and the Three-City Study). The PAQUID study was established in 1988 to study cerebral aging and the incidence of dementia [7]. Community-dwelling subjects lived in the south-west of France and were at least 65 years old at the initial visit. They were followed up after 1 year and then every 2-3 years for 20 years. The multicenter Three-City (3C) study began in 1999 and aimed at evaluating the relationship between vascular factors and the risk of dementia [8]. Subjects aged at least 65 years were recruited in 3 French cities: Bordeaux, Dijon and Montpellier. They were followed up every 2-3 years for 10 years. For both cohorts, psychometric tests and dementia diagnosis based on DSM-III-R criteria were assessed at each visit.
MMSE scores at baseline were excluded from the original cohorts because of a 'first-passing effect' previously evidenced [9]. This effect, possibly explained by apprehension of the test situation at the inclusion visit, translates into an improvement between the first two visits. All subjects (including subjects with dementia) who had at least one MMSE measure from the 1-year follow-up onwards in PAQUID and from the 2-year follow-up onwards in 3C were included in the analysis.
Estimation and Validation Samples
In order to maximize the information, the PAQUID sample and the Bordeaux sample of the 3C study were pooled into an 'estimation sample' used to define the normalizing transformation. Validation of the normalizing transformation was carried out on external 'validation samples', namely the 3C Montpellier and 3C Dijon samples. These samples also consisted of heterogeneous populations mixing normal and cognitively impaired individuals. To further evaluate whether the transformation could also be applied in prodromal AD, we considered two subsamples composed of all the subjects diagnosed with dementia (prevalent and incident) from the 3C Montpellier and 3C Dijon samples (called 'demented 3C') or from the PAQUID sample (called 'demented PAQUID').
Covariates
Numerous covariates were included in the statistical models described below: gender, EL defined in three classes (subjects who graduated from secondary school, subjects who graduated only from primary school, and subjects who did not graduate from primary school), age at baseline, a cohort indicator, and, when available, an indicator of the presence of one or two ε4 alleles of apolipoprotein E (ApoE4). Several timescales were also investigated: time from entry into the cohort, age (coded in decades from age 65) and, in the dementia samples, time preceding and following the diagnosis of dementia.
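For illustration, a minimal sketch of how such covariates and timescales might be coded in R, assuming a hypothetical long-format data frame `d` with one row per MMSE measurement (variable names are ours, not those of the original datasets):

```r
# Assumed columns: id, mmse (0-30), age (at the visit), age0 (age at entry),
# sex, educ ("none"/"primary"/"secondary"), cohort, apoe4 (0/1),
# age_dem (age at dementia diagnosis, NA if never diagnosed)

d$educ3    <- factor(d$educ, levels = c("none", "primary", "secondary"))
d$age65    <- (d$age  - 65) / 10   # current age, in decades from age 65
d$age0_65  <- (d$age0 - 65) / 10   # age at entry, in decades from age 65
d$time     <- d$age - d$age0       # time since entry into the cohort (years)
d$time_dem <- d$age - d$age_dem    # time to/from dementia diagnosis (demented samples only)
```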
Statistical Models
In studies of cognitive change in which cognitive ability is measured by the MMSE, the metrological properties of the MMSE can be corrected by applying a normalizing transformation to the score. This was proposed in the longitudinal setting with a latent process mixed model that simultaneously transformed the scores and modelled the transformed scores according to covariates in a linear mixed model [6,10]. In this approach, the quantity of interest is the actual unobserved cognitive level that generated the scores. This actual cognitive level defines a latent process whose trajectory over time is described by a standard linear mixed model, including random effects to model the between-subject variability and covariates to evaluate their impact on the cognitive trajectory. The link between the actual cognitive level and the observed scores consists of a parameterized transformation that normalizes the score by capturing its metrological properties (including curvilinearity). Originally, all the parameters were estimated simultaneously, so that the estimated transformation could differ slightly from one regression structure to another (and/or between populations), and parameter estimates of investigated risk factors (expressed in the transformed scale) could not be quantitatively compared between studies.
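In simplified notation (ours, adapted from [6,10]), the model combines a linear mixed model for the latent cognitive level \(\Lambda_i(t)\) of subject \(i\) with a monotone increasing link function \(H(\cdot;\eta)\) relating this level to the observed score \(Y_{ij}\) at time \(t_{ij}\):

\[
\Lambda_i(t) = X_i(t)^\top \beta + Z_i(t)^\top u_i, \qquad u_i \sim \mathcal{N}(0, B),
\]
\[
H(Y_{ij}; \eta) = \Lambda_i(t_{ij}) + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\]

Once \(H\) has been estimated and fixed, the second equation shows that applying \(H\) to the observed scores reduces the analysis to a standard linear mixed model on the transformed scale.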
Yet, as the normalizing transformation corrects curvilinearity, which is an intrinsic property of the psychometric test, the transformation should be relatively stable from one study to another, and a unique transformation could be defined once and for all. This work aimed at evaluating this assumption and at proposing and validating a unique transformation for the MMSE. Based on such a unique transformation, standard linear mixed models could be applied directly to the transformed data to analyze change in cognition measured by the MMSE, and regression parameters could be quantitatively compared even across different populations, as the transformed scale would remain the same.
Strategy of Analysis
In a first step, the stability of a normalizing transformation of the MMSE was investigated in a cross-validation analysis. For this, the estimation sample was repeatedly and randomly divided into two subsamples, a training set and a test set. The training set was used to estimate the transformation, called H*, with a latent process mixed model called M(train). The test set was used to assess the transformation H* on independent data. Specifically, we estimated two models on the test set: one latent process mixed model, called M(test), in which the transformation was re-estimated to best fit the test data, and one linear mixed model, called M(test)*, applied to the scores previously transformed by H*. These two models had exactly the same regression structure. By regression structure, we mean the covariate adjustment, the timescale and the shape of the trajectory (quadratic or linear) with the corresponding correlated random effects.
These steps of division and estimation were repeated 500 times, and for each repetition the two models M(test) and M(test)* were compared. Under our assumption of stability of the normalizing transformation, for a given regression structure, the results found by the linear model on transformed data [M(test)*] were expected to agree with those found by the more flexible latent process model [M(test)]. The regression parameters were compared using the percentage of variation between estimates, the variances of the estimates (obtained by the delta method in the latent process model), and the p values of the Wald tests of significance.
This cross-validation analysis was replicated several times by changing the regression structures within M(train) and M(test) and between M(train) and M(test), in order to evaluate the stability of the transformation across very different settings.
In a second step, the unique transformation of the MMSE was computed as the pointwise average of the transformed scores over the repetitions and was validated on the different validation samples using the same indicators as in the cross-validation analysis.
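As an illustration of this two-step strategy, a minimal R sketch is given below. The helper `fit_latent_process()` is hypothetical (a placeholder for the latent process mixed model estimation, implemented in dedicated software), the data frame `d` and covariates follow the illustrative coding given in the Covariates section, and lme4 is used as one standard linear mixed model implementation; none of this is the authors' original code.

```r
set.seed(123)
ids   <- unique(d$id)
n_rep <- 500
H_all <- matrix(NA, nrow = n_rep, ncol = 31)   # transformed values of MMSE 0-30, one row per repetition

for (r in 1:n_rep) {
  train_ids <- sample(ids, 3600)
  train <- d[d$id %in% train_ids, ]
  test  <- d[!(d$id %in% train_ids), ]

  # M(train): hypothetical helper fitting a latent process mixed model and returning
  # the estimated transformation H as a vector of 31 values (for MMSE = 0, ..., 30)
  m_train    <- fit_latent_process(train)
  H_all[r, ] <- m_train$H

  # M(test): latent process mixed model re-estimating the transformation on the test set
  m_test <- fit_latent_process(test)

  # M(test)*: standard linear mixed model on scores transformed by H* from the training set
  test$mmse_norm <- m_train$H[test$mmse + 1]
  m_test_star <- lme4::lmer(mmse_norm ~ age65 * cohort + (age65 | id), data = test)

  # ... compare fixed effects, their variances and Wald p values of m_test and m_test_star
}

# Unique transformation: pointwise mean of the 500 estimated transformations
H_final <- colMeans(H_all)
```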
Normalizing Transformation
The normalizing transformation of the MMSE preserves the original direction of the test, with lower values indicating lower cognitive functioning. The transformation was rescaled to 0-100 (as 100·[H*(Y) - H*(0)]/[H*(30) - H*(0)]), so that the minimal MMSE score of 0 corresponds to a normalized MMSE score of 0 and the maximal MMSE score of 30 corresponds to a normalized MMSE score of 100.
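As a worked example of this rescaling, assuming `H_final` holds the estimated transformation H* evaluated at the 31 possible MMSE values (as in the sketch above):

```r
# Rescale H* so that H*(0) maps to 0 and H*(30) maps to 100
rescale_0_100 <- function(H) 100 * (H - H[1]) / (H[31] - H[1])

norm_table <- data.frame(mmse = 0:30, mmse_norm = rescale_0_100(H_final))
head(norm_table)
```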
Results
Samples Description
The six samples used to create or validate the normalizing transformation are presented in table 1. The samples included 3,000, 1,889, 4,253 and 1,965 subjects in PAQUID, 3C Bordeaux, 3C Dijon and 3C Montpellier, respectively. Among them, 856 (28.5%) received a diagnosis of dementia in PAQUID, 308 (16.3%) in 3C Bordeaux, 343 (8.1%) in 3C Dijon and 165 (8.4%) in 3C Montpellier.
The mean age at inclusion varied little between cohorts, from 73 years (standard deviation, SD = 5.4) in 3C Montpellier to 74.9 years (SD = 6.6) in PAQUID. The percentage of women was also relatively stable across cohorts, varying from 58.5% in 3C Montpellier to 62.1% in 3C Bordeaux. The median MMSE score at inclusion was 27 (interquartile range, IQR, 24-28) in PAQUID and 28 (IQR 26-29) in the three other samples. The main difference concerned EL: while 64.8-79.3% of subjects had graduated from secondary school in the three 3C samples, only 22.4% had done so in PAQUID, which is most likely explained by the 10-year difference in the inclusion periods.
The 'demented 3C' subsample was composed of 508 subjects with a mean age at dementia diagnosis of 81.9 years (SD = 6.0); 58.5% were women and 64.4% had graduated from secondary school. The median MMSE score at inclusion was 26 (IQR 24-28). The 'demented PAQUID' subsample was composed of 856 subjects with a higher mean age at dementia diagnosis (86.1 years, SD = 5.9), more women (70.1%) and only 16% who had graduated from secondary school (57% had graduated at least from primary school). The median MMSE score at inclusion was the same (26, IQR 22-28).
MMSE score distributions are displayed in figure 1. They are relatively similar in PAQUID, 3C Bordeaux, 3C Dijon and 3C Montpellier samples, i.e. very asymmetric with a strong ceiling effect. In the demented samples, MMSE scores are lower, but the distributions remain substantially asymmetric and still contain high MMSE scores. Indeed, these samples include only demented subjects but mix pre- and postdiagnosis scores.
Variability of the Normalizing Transformation
Various latent process mixed models were considered in order to assess the variability of the MMSE normalizing transformation. The transformation was systematically approximated by quadratic I-splines [10] with the same 7 nodes (0, 10, 20, 23, 26, 28, 30), chosen according to the MMSE distributions in the PAQUID and 3C Bordeaux samples during follow-up (fig. 1). Figure 2 displays the transformations estimated on the 6 samples for different regression structures. In this figure only (to make the comparison possible), the transformations were rescaled using intermediate MMSE values (20, 26, 29), for which we had enough observations in all our samples, rather than the extreme values (0 and 30), which were relatively rare in some samples, especially the value 0. All the transformations have very similar shapes, whatever the sample and whatever the regression structure. Some differences may be seen at low values (probably due to the lack of information in this range) but remain moderate. The transformations exhibit a very low sensitivity to change at the highest MMSE scores and a higher sensitivity to change at intermediate scores, illustrating the curvilinearity issue. As an example, on the rescaled scale of figure 2, a 0.5-point loss between levels 0.8 and 0.3 corresponds to an observed loss of 2 points between MMSE 30 and 28, while the same 0.5-point loss between levels 0 and -0.5 corresponds to an observed loss of around 6 points between MMSE 26 and 20 (i.e. three times greater).
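For illustration only (this is not the authors' estimation code), a monotone transformation of this form can be constructed in R from a quadratic I-spline basis with the same nodes, for instance with the splines2 package, assuming its iSpline() interface; the non-negative weights w below are arbitrary, whereas in the latent process mixed model they are estimated from the data:

```r
library(splines2)

mmse <- 0:30
# Quadratic I-spline basis with 7 nodes: boundary nodes 0 and 30, interior nodes 10, 20, 23, 26, 28
B <- iSpline(mmse, knots = c(10, 20, 23, 26, 28), degree = 2,
             intercept = TRUE, Boundary.knots = c(0, 30))

set.seed(1)
w <- runif(ncol(B))               # arbitrary non-negative weights (illustrative only)
H_illustr <- as.vector(B %*% w)   # any such non-negative combination is monotone increasing in MMSE
```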
The cross-validation (with 500 repetitions) was applied to the 4,889 subjects of PAQUID and 3C Bordeaux taken together, with a training set of 3,600 subjects. To estimate the transformation, four models [M(train)] assuming a quadratic subject-specific trajectory with age and differing in their adjustment were estimated (no adjustment, adjustment for age at entry and/or the cohort indicator, or adjustment also for education and gender). We chose a quadratic subject-specific trajectory with age to capture the accelerated cognitive decline at older ages previously found in PAQUID [6,13]. For the models assessed on the test set [M(test)], the timescale was either age or time since entry into the cohort, the subject-specific trajectory was linear or quadratic, and again different adjustments were assumed. All the cross-validation analyses showed similar results: the percentages of variation of the parameter estimates were always below 10%, and the variances estimated when using the previously obtained transformation H* in M(test)* were very close to those obtained with the latent process mixed model M(test) (re-estimating the transformation). Finally, in most settings the significance tests agreed in over 95% of the cases. Table 2 presents two examples, with M(train) including the cohort variable and age with a quadratic trend, and M(test) including either a subject-specific linear trajectory with age, adjusted for age at inclusion, the cohort indicator and their interactions with age, or a subject-specific quadratic trajectory with time from entry, adjusted for gender, EL, age at inclusion and their interactions with time and quadratic time. All the other cross-validation analyses are available on request.
Normalized MMSE
Based on these results, we defined the normalizing transformation of the MMSE as the pointwise mean transformation over the 500 repetitions in a model assuming a quadratic subject-specific trajectory over age and no adjustment for covariates. Figure 3a provides the correspondence table between the raw MMSE scores and the final mean normalized MMSE scores (which define the final transformation H*). Figure 3b displays the 500 transformations and the 31 final normalized values to emphasize the stability of the transformation. As an example, after correction of the metrological properties, crude scores of 20, 24 and 28 become, respectively, 37.37, 51.44 and 74.61. This transformation underlines the nonstable sensitivity of the MMSE to change: a 4-point difference in crude MMSE represents an actual difference in the normalized MMSE score of roughly 14 points between MMSE scores of 20 and 24, and of more than 23 points between MMSE scores of 24 and 28.
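Using the values quoted above from the correspondence table, this can be verified directly:

```r
h <- c(`20` = 37.37, `24` = 51.44, `28` = 74.61)  # normalized values of crude MMSE 20, 24 and 28
h["24"] - h["20"]   # 14.07 points on the normalized scale
h["28"] - h["24"]   # 23.17 points on the normalized scale, for the same 4-point crude difference
```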
External Validation
This normalizing transformation was validated on external data using the 3C Montpellier and 3C Dijon samples, as well as the subsamples of demented subjects described in table 1. We compared the results obtained when re-estimating the normalizing transformation, when using the transformation provided in figure 3, and when using a standard linear mixed model on crude scores (as most frequently done in practice). Various regression structures were tested on each sample. For clarity, table 3 only presents the results of a single model for each validation sample; we deliberately did not always choose the same structure, in order to illustrate the consistency of the results. Globally, the proposed normalizing transformation provided results close to those obtained when re-estimating the transformation (which specifically corrects the metrological properties in the targeted sample), while the linear mixed model produced much more biased estimates. For example, in 3C Dijon, we investigated the effect of EL on cognitive decline from entry into the cohort. With the linear mixed model, the effect at baseline was underestimated while the effect on the slope with time was overestimated, leading to incorrect conclusions regarding EL as a risk factor for cognitive decline (p = 0.004 for the three-category EL covariate). In contrast, the two models that either used the proposed normalizing transformation or re-estimated it found similar estimates and conclusions, indicating no association between EL and cognitive change (p = 0.634 and p = 0.565, respectively, for the three-category EL covariate). In 3C Montpellier, the effect of ApoE4 on the cognitive trajectory with age was evaluated, and again the conclusions diverged between the naïve linear mixed model and the models correcting the metrological properties. Finally, in the dementia samples, we studied the pre- and postdiagnosis declines according to EL and age at diagnosis. Again, diverging conclusions regarding the effect of EL on the pre- and postdiagnosis slopes were found with the naïve linear mixed model in the PAQUID sample. In addition, the linear mixed model results suggested a postdiagnosis decline nearly twice as large as the prediagnosis decline in the reference group of individuals diagnosed at 80 years without a diploma (-5.13 points/year after vs. -2.77 points/year before diagnosis in the 3C sample), whereas the models correcting the metrological properties found relatively similar declines (-4.19 points/year after vs. -3.75 points/year before diagnosis in the 3C sample when re-estimating the transformation, and -3.65 vs. -3.77 when using the proposed transformation). Beyond providing correct inference, the transformed scores also yield covariate effects that can be compared between analyses. For example, with the same model estimated on the two demented samples, the mean cognitive level at diagnosis for subjects with no diploma (diagnosed at 80 years) was 8 points lower in PAQUID than in 3C (36.04 vs. 44.34), but only about 3 points lower for subjects who graduated from primary school (36.04 + 7.40 = 43.44 vs. 44.34 + 1.91 = 46.25) and about 2 points higher for those who graduated from secondary school (36.04 + 14.86 = 50.90 vs. 44.34 + 4.70 = 49.04), because of a more pronounced effect of EL in PAQUID. Rates of change according to EL could be compared similarly.
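A minimal sketch of the kind of comparison performed in this external validation, assuming a validation data frame `d` coded as in the Covariates section and the `norm_table` lookup for the proposed transformation (lme4 is used as one standard linear mixed model implementation; it is not necessarily the software used in the original analyses):

```r
library(lme4)

# Naive linear mixed model on crude MMSE scores
m_crude <- lmer(mmse ~ educ3 * time + (time | id), data = d)

# Same model on scores normalized with the proposed transformation
d$mmse_norm <- norm_table$mmse_norm[d$mmse + 1]
m_norm <- lmer(mmse_norm ~ educ3 * time + (time | id), data = d)

# Compare fixed-effect estimates and their standard errors between the two analyses
summary(m_crude)$coefficients
summary(m_norm)$coefficients
```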
Discussion
We proposed a normalizing transformation of the MMSE that corrects its weak metrological properties and makes possible the use of standard statistical methods for continuous variables, namely linear (mixed) models, to study MMSE scores as a dependent variable. In publications on cognitive change, MMSE scores have mostly been analyzed using linear models, although crude MMSE scores do not satisfy the underlying assumptions (gaussian dependent variable and constant sensitivity to change). Yet, an extensive simulation study recently showed that the estimated effects of risk factors on MMSE change over time could be strongly affected by this violation, with type I errors for tests of risk factors reaching up to 90% [5].
To correct the asymmetric distribution of MMSE scores, some authors proposed using z-scores. However, such standardization corrects neither the ceiling/floor effects nor the varying sensitivity to change [14]. Others applied Tobit regressions [15], but these do not correct the problematic varying sensitivity to change. More rarely, a transformation, the square root of the number of errors, was considered, which considerably reduced the biases [9]. In this work, we moved one step further by (a) using a transformation that was specifically estimated to best correct the metrological properties of the MMSE on a large population-based study of brain aging, and (b) validating this transformation on several external datasets. This transformation is derived from a latent process mixed model that originally aimed at simultaneously correcting the metrological properties of a scale and estimating a regression model on the underlying normalized version of the scale [10]. However, such a latent process mixed model remains a complex statistical method and, as the normalizing transformation was estimated along with the effects of the investigated risk factors (expressed in the transformed scale), these effects could not be quantitatively compared between studies. With a unique normalizing transformation defined once and for all, statistical analyses become straightforward using standard methods, and the effects of investigated risk factors become quantitatively comparable between studies, as illustrated in the demented external samples. We emphasize that the proposed transformation is dedicated to the statistical analysis of the MMSE as a dependent variable. By changing the interval between two successive values, we counteract the metrological problems of the MMSE and provide a score that carries the same information as the MMSE but with corrected intervals between successive scores. Since it preserves the rankings, the transformed score has exactly the same discriminative properties as the crude MMSE when used to predict dementia.
Validation of the normalizing transformation was done in two steps. First, in a cross-validation study, we found that applying a linear mixed model to data transformed by a previously defined normalizing transformation was an appropriate alternative to the more complex latent process mixed model when the population of interest was close to the population used to define the transformation. This remained true even when the two regression structures (used for defining and for validating the transformation) differed greatly. Although we investigated different model specifications for studying cognitive decline, our aim was not to compare their epidemiological value; we considered a wide range of regression structures to underline the stability of the proposed transformation across different settings. Second, in an external validation, we showed that the normalizing transformation of the MMSE also applied to cohorts not used for its estimation, even when the validation cohorts differed substantially from those used for the estimation (for instance, cohorts including only prevalent and incident cases of dementia).
Nevertheless, such a transformation is not universal. As the transformation is directly linked to the distribution of MMSE scores, it applies only to cohorts in which an asymmetry of the MMSE scores is observed. We intended to validate the normalizing transformation on a cohort of initially demented subjects [16]. However, the MMSE distribution already had a gaussian shape, with most values within the 10-25 range. On these data, curvilinearity was no longer an issue, as the main curvilinearity occurs above 25. In contrast, such asymmetry is always present in prospective population-based cohorts including very heterogeneous ages and mixtures of subjects with normal aging and subjects at the preclinical phase of dementia, or in clinical studies focusing on the progression from mild cognitive impairment to dementia. In these cohorts, which are central to the current effort for prevention and for the evaluation of interventions in prodromal AD, correcting the metrological properties of the MMSE is crucial to correctly analyze cognitive functioning and properly evaluate determinants of cognitive decline. The normalizing transformation we propose makes possible a correct analysis of MMSE scores and permits a direct quantitative comparison of risk factor effects obtained in different longitudinal studies. Although developed and validated on the French-language version, it should apply to any valid translation of the MMSE, since the asymmetric distribution of the MMSE in heterogeneous populations is observed whatever the language used for administering the test [3,17].
The transformed scores can be obtained from figure 3 or with the R package NormPsy, which computes the normalized transformation and provides a function for back-computing predictions on the crude MMSE scale.
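For example, with the NormPsy package (function names are given to the best of our knowledge and should be checked against the package documentation), normalized scores can be obtained for a vector of crude MMSE scores:

```r
# install.packages("NormPsy")
library(NormPsy)

mmse_raw  <- c(0, 10, 20, 24, 28, 30)
mmse_norm <- normMMSE(mmse_raw)   # normalized MMSE scores on the 0-100 scale
cbind(mmse_raw, mmse_norm)
```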
Acknowledgments
This work was carried out within the MOBIDYQ project (grant 2010 PRSP 006 01), funded by the Agence Nationale de la Recherche. We thank the REAL.FR/DSA group for providing data from the REAL.FR cohort. These data were not directly used in this paper but contributed to the development of this work. The PAQUID study was funded by SCOR insurance, Agrica, Conseil régional of Aquitaine, Conseils Généraux of Gironde and Dordogne, Caisse Nationale de Solidarité pour l'Autonomie, IPSEN, Mutualité Sociale Agricole, and Novartis Pharma (France). The Three-City Study was funded by Sanofi-Synthélabo, Fondation pour la Recherche Médicale, Caisse Nationale d'Assurance Maladie des Travailleurs Salariés, Direction Générale de la Santé, Conseils Régionaux of Aquitaine and Bourgogne, and Fondation Plan Alzheimer.
Disclosure Statement
There are no conflicts of interest.