Abstract
Background: Mild cognitive impairment (MCI) is a condition that entails a slight yet noticeable decline in cognition that exceeds normal age-related changes. Older adults living with MCI have a higher chance of progressing to dementia, which warrants regular cognitive follow-up at memory clinics. However, due to time and resource constraints, this follow-up is conducted at separate moments in time with large intervals in between. Casual games, embedded into the daily life of older adults, may prove to be a less resource-intensive medium that yields continuous and rich data on a patient’s cognition. Objective: To explore whether digital biomarkers of cognitive performance, found in the casual card game Klondike Solitaire, can be used to train machine-learning models to discern games played by older adults living with MCI from those played by their healthy counterparts. Methods: Digital biomarkers of cognitive performance were captured from 23 healthy older adults and 23 older adults living with MCI, each playing 3 games of Solitaire with 3 different deck shuffles. These 3 deck shuffles were identical for each participant. Using a supervised stratified, 5-fold, cross-validated, machine-learning procedure, 19 different models were trained and optimized for F1 score. Results: The 3 best performing models, an Extra Trees model, a Gradient Boosting model, and a Nu-Support Vector model, had a cross-validated F1 training score on the validation set of ≥0.792. The F1 score and AUC on the test set were, respectively, >0.811 and >0.877 for each of these models. These results indicate psychometric properties comparable to common cognitive screening tests. Conclusion: The results suggest that commercial card games, not developed to address specific mental processes, may be used for measuring cognition. The digital biomarkers derived from Klondike Solitaire show promise and may prove useful to fill the current blind spot between consultations.
Introduction
Mild cognitive impairment (MCI) is a condition where ≥1 cognitive domains are slightly impaired, but the instrumental activities of daily living are still intact [1, 2]. People with MCI have a higher chance of progressing to a form of dementia, and MCI can also signal other neurologic or psychiatric diseases such as vascular disease or depression [1-3]. Therefore, the timely detection of patients with MCI is necessary to provide support and devise a (non)pharmaceutical management approach [1, 4]. While clinically valid, modern cognitive assessment is limited by the mode of administration, which is often a pen, paper, and stopwatch [5]. Such modes of administration require the continuous attention of a trained administrator, limiting the type and amount of data points captured, and making measurements vulnerable to administrator bias and the white-coat effect [6, 7]. The consequent lack of accurate high-resolution data can make it difficult to make informed inferences about neuropsychological processes [5]. Cognitive assessment via digital biomarkers of cognitive performance could be an addition to the current cognitive toolset, by contributing to a more complete cognitive profile [5]. Digital biomarkers [8, 9] are user-generated physiological and behavioral measures, captured through connected digital devices, which can provide high-resolution, objective, and quantifiable cognitive data [10].
For MCI, the systems measuring digital biomarkers of cognitive performance can be categorized into 4 groups [10]: systems using dedicated or passive sensors, systems with wearable sensors, nondedicated technological solutions (e.g., software that captures text input), and dedicated or purposive technologies such as games. Games are in a unique position to yield digital biomarkers, as they are autotelic in nature, i.e., played for the enjoyment they offer, without prompting or a request from a third person. Hence, they are intrinsically motivating and do not necessitate an administrator, thereby avoiding the white-coat effect and related biases. Moreover, they can provide different challenges with every playthrough while leaving the fundamental game rules intact [5]. This possibility of supplying novel challenges contrasts with the static property of classical cognitive tests, i.e., administering them over a short period of time makes them more prone to learning effects [5].
Whereas previous research into games and cognition focused primarily on games specifically designed for the purpose of measuring cognition (i.e., serious games), current research is investigating commercial off-the-shelf (COTS) video games as a medium for digital biomarkers of cognitive performance [11]. While both serious and COTS games may provide more interactive, immersive, and engaging experiences than traditional cognitive screening [12-14], COTS games have the important advantage of already being woven into the daily life of older adults. Previous research indicates that serious games for training and measuring cognition still lack engagement and suffer from attrition in longitudinal studies [14-16]. As such, this study explores whether Klondike Solitaire, an existing popular Solitaire card variant [17], can be used to detect differences in cognitive performance in healthy older adults and those with MCI.
To this end, Klondike Solitaire data from 23 healthy older adults and 23 older adults with MCI were captured. Derived digital biomarkers of cognitive performance were used to train machine-learning models to classify individuals belonging to either group. Successful classification of MCI via machine learning supports the efficacy of COTS games to detect differences in cognitive performance on an individual level.
Materials and Methods
Participants
Participants with MCI were recruited from 2 leading memory clinics in Belgium and all had a clinical diagnosis of multiple-domain amnestic MCI according to Petersen’s diagnostic criteria [18]. Healthy participants were recruited using a snowball sample starting from multiple senior citizen organizations. They were screened using 2 commonly used cognitive screening tests, the Montreal Cognitive Assessment (MoCA) and the Mini-Mental State Examination (MMSE), and a structured interview, the Clinical Dementia Rating (CDR) scale [19-21]. The inclusion and exclusion criteria for both groups can be found in Table 1. Of the 64 enrolled participants, 23 healthy older adults and 23 older adults with MCI fulfilled all inclusion criteria. These 46 participants all played the same 3 games, resulting in a total of 138 games captured.
Study Overview
This study is part of an overarching study that assesses cognitive performance through meaningful play (ClinicalTrials.gov ID NCT02971124). Every observation was conducted in the home of the participant between 9 a.m. and 5 p.m. to ensure a familiar and distraction-free environment. All sessions were completed on a Lenovo Tab 3 10 Business tablet running Android 6.0. All Klondike Solitaire games were played on a custom-built Solitaire application which captured several game metrics, originally created by Bielefeld [22] under the LGPL 3 license. In this application, cards requested from the pile came in 3s, with unlimited passes through the pile. Points could be earned or lost by making the following moves: moving a card from a build stack to a suit stack added 60 points, moving a card from the pile to a suit stack added 45 points, revealing a card on a build stack added 25 points, retrieving a card from a suit stack back to a build stack subtracted 75 points, and going through the whole pile subtracted 200 points. Before playing Solitaire, a standardized 5-min introduction of the tablet and game was given. In addition, a practice game was played during which questions to the researcher were allowed. Afterwards, 3 rounds of Klondike Solitaire, each with a different shuffle, were played in succession. To prevent unfair shuffle (dis)advantages, deck shuffles were identical for all participants for each round. These 3 shuffles were chosen beforehand by the researchers so that they were solvable and varied in difficulty. While playing these 3 rounds, no questions were allowed, and game play continued until the rounds were finished or until the participant indicated that they deemed no further moves possible.
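The scoring scheme above can be sketched as a simple lookup table. This is a minimal illustration only; the move names below are hypothetical labels, not identifiers from the study's application.

```python
# Point values for each scoring move, as described in the study setup.
# Move names are illustrative assumptions, not the app's own identifiers.
SCORE_RULES = {
    "build_to_suit": 60,      # card moved from a build stack to a suit stack
    "pile_to_suit": 45,       # card moved from the pile to a suit stack
    "reveal_build_card": 25,  # face-down card revealed on a build stack
    "suit_to_build": -75,     # card retrieved from a suit stack back to a build stack
    "pile_recycled": -200,    # full pass through the whole pile completed
}

def score_game(moves):
    """Sum the point changes for a sequence of scoring moves."""
    return sum(SCORE_RULES[m] for m in moves)

print(score_game(["pile_to_suit", "reveal_build_card", "suit_to_build"]))  # -5
```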
Data Analysis
While playing Klondike Solitaire, general game data such as the total time, score, and outcome were captured. In addition, for every single move, the time stamp, touch coordinates, origin card information, destination card information, and the possibility of other moves on the board were logged. These game data were used to calculate the digital biomarkers of cognitive performance (Table 2). These digital biomarkers can be considered basic game metrics enriched with game information. This contextualization is important to aid the interpretation of the cognitive information from the game. For example, a larger number of pile moves can be interpreted as progression in the game, but can equally be interpreted as the player not realizing that they are stuck. By dividing the number of pile moves by the number of total moves, a more informative candidate digital biomarker can be obtained. This contextualization resulted in 61 candidate digital biomarkers of Klondike Solitaire (Table 2), each classified into 1 of 5 categories: result-based, i.e., biomarkers related to performance at the end of a game; performance-based, i.e., biomarkers related to performance during the game; time-based, i.e., biomarkers related to time; execution-based, i.e., biomarkers related to the physical execution of moves; and auxiliary-based, i.e., biomarkers related to help features.
Table 2. Potential digital biomarkers of cognitive performance in Klondike Solitaire, divided into 5 categories.
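The pile-move example above can be sketched as a derived ratio feature. The field names are hypothetical; the study's actual biomarker definitions are those listed in Table 2.

```python
def pile_move_ratio(moves):
    """Contextualize a raw pile-move count as a fraction of all moves,
    yielding a more informative candidate digital biomarker than the
    raw count alone. Each move is a dict with an assumed 'origin' key."""
    total = len(moves)
    if total == 0:
        return 0.0
    pile_moves = sum(1 for m in moves if m["origin"] == "pile")
    return pile_moves / total

moves = [
    {"origin": "pile"},
    {"origin": "build"},
    {"origin": "pile"},
    {"origin": "build"},
]
print(pile_move_ratio(moves))  # 0.5
```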

For model training, a machine-learning procedure was adapted from Raschka [23] (Fig. 1), using scikit-learn [24] as the main machine-learning library. All data were split using a randomized stratified sampling method (102 games from 34 participants in the training set and 36 games from 12 participants in the test set). To prevent data leakage due to identity confounding [25], rounds were split subject-wise instead of record-wise (i.e., all rounds of a participant were either all in the test set or all in the training set). Heavily correlated features (r > 0.9) were removed to prevent multicollinearity [26]. In total, 26 features remained after selection (in bold type in Table 2). Afterwards, features were scaled using a Standard Scaler. As each algorithm has its inherent biases with none being superior to the rest, 19 classification models were trained, ranging from linear models like logistic regression up to nonlinear models like Gaussian Naïve Bayes [23]. The selection of our models was based on their maturity, popularity, and support available in the scikit-learn machine-learning library. To evaluate them during the training phase, the 5-fold, cross-validated F1 scores were compared. The hyperparameters of the 3 best performing models were further optimized. Ultimately, these 3 best performing models were evaluated on the test dataset.
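The split-filter-scale-train sequence above can be sketched with scikit-learn as follows. Synthetic data stands in for the 61 candidate biomarkers, the correlation filter and model choice are illustrative assumptions, and stratification is left to the library defaults rather than mirroring the authors' exact procedure.

```python
# Sketch of the described training procedure (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(138, 61))        # 138 rounds x 61 candidate biomarkers
groups = np.repeat(np.arange(46), 3)  # 3 rounds per participant
y = (groups < 23).astype(int)         # 23 MCI vs. 23 healthy participants

# Subject-wise split: all rounds of a participant stay on one side,
# preventing identity confounding between training and test sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))  # 34 vs. 12 subjects

# Greedily drop one feature of every heavily correlated pair (|r| > 0.9).
corr = np.corrcoef(X[train_idx], rowvar=False)
upper = np.triu(np.abs(corr), k=1)
keep = [j for j in range(X.shape[1]) if not (upper[:, j] > 0.9).any()]

# Scale, then train one of the 19 candidate model families and compare
# 5-fold cross-validated F1 scores on the training set.
pipe = make_pipeline(StandardScaler(), ExtraTreesClassifier(random_state=0))
scores = cross_val_score(pipe, X[train_idx][:, keep], y[train_idx],
                         cv=5, scoring="f1")
print(scores.mean())
```

Wrapping the scaler and classifier in one pipeline ensures the scaler is refit inside every cross-validation fold, avoiding leakage from the held-out fold into the scaling parameters.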
Results
Study Population
In total, 46 participants (23 MCI and 23 healthy) were enrolled, resulting in 138 rounds of Klondike Solitaire being captured. Demographic and basic neuropsychological data of both groups can be found in Table 3.
Model Performance
The average results of all selected digital biomarkers of cognitive performance across the 3 rounds for both groups can be found in Table 4. The 5-fold, cross-validated F1 validation score of the 19 initial base models was, on average, 0.738. The validation performance metrics of the 3 best fine-tuned models obtained an F1 score of 0.812 (SD 0.058) for the Gradient Boosting classifier, 0.797 (SD 0.074) for the Nu-Support Vector classifier, and 0.792 (SD 0.102) for the Extra Trees classifier. Test performance metrics of these models for the 36 rounds in the test set achieved an F1 score of 0.821 with an AUC of 0.892 for the Gradient Boosting classifier, an F1 score of 0.824 with an AUC of 0.901 for the Nu-Support Vector classifier, and an F1 score of 0.811 with an AUC of 0.877 for the Extra Trees classifier. Confusion matrices and ROC curves for these 3 models can be found in Figure 2.
Discussion
Digital biomarkers of cognitive performance, embedded into casual game play, can be used for cognitive monitoring. By evaluating the efficacy of these candidate digital biomarkers of cognitive performance to distinguish healthy older adults from older adults with MCI, new research opportunities may emerge for monitoring the cognitive trajectories of older adults.
In total, 138 rounds were collected from 46 participants (23 healthy and 23 diagnosed with MCI). Derived digital biomarkers were used to train 19 diverse machine-learning models optimized for the F1 score (the harmonic mean of precision and recall). The choice of optimizing for the F1 score was 2-fold. First, the possible damage of false negatives, as well as false positives, is significant. False negatives, i.e., older adults with MCI being classified as healthy, could postpone diagnosis, thereby leading to longer undetected disease progression. False positives, i.e., healthy older adults being classified as having MCI, could have an equally detrimental impact; a misdiagnosis of cognitive impairment could send a healthy older adult into a depressive spiral. Second, the F1 score is a robust parameter for unbalanced datasets. Should these studies be expanded to real-life settings where the MCI and healthy populations are not equal in size, this scoring parameter will likely still be of relevance to other researchers.
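As a quick illustration of the metric being optimized, the F1 score can be computed from precision and recall and checked against scikit-learn. The labels below are toy values, not study data.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]  # 1 = MCI round, 0 = healthy round (toy labels)
y_pred = [1, 1, 0, 0, 0, 1]  # one false negative and one false positive

precision = precision_score(y_true, y_pred)  # 2/3
recall = recall_score(y_true, y_pred)        # 2/3

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 3), round(f1_score(y_true, y_pred), 3))  # 0.667 0.667
```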
After hyperparameter fine-tuning, the 5-fold, cross-validated F1 training score on the validation set was >0.792 for each of the 3 selected models. When evaluated on the test set, each of these models had an F1 score >0.811 and an AUC >0.877. The ROC curves of each model also revealed promising decision thresholds to maximize sensitivity (true positive rate) and specificity (1 − false positive rate). It can also be noted that the 3 selected models come from different machine-learning model techniques: a bagged decision tree ensemble (Extra Trees), a boosted decision tree ensemble (Gradient Boosting), and a Support Vector model (Nu-Support Vector) [24]. This strong performance on both the validation and test sets, combined with the variety of techniques used, indicates that the digital biomarkers contain cognitive information, and that successful classification does not hinge on the intricacies of a certain model. Rather, these robust results indicate that digital biomarkers of cognitive performance, measured while playing Klondike Solitaire, are impacted by MCI. When combined, these digital biomarkers may even be used to train machine-learning models to discern older adults with MCI from their healthy counterparts, lending support for their use in detecting cognitive decline.
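One common way to pick such a decision threshold from an ROC curve — not necessarily the authors' procedure — is to maximize Youden's J statistic (sensitivity + specificity − 1), sketched here on toy scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy predicted probabilities for 8 rounds (1 = MCI), not study data.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
youden_j = tpr - fpr  # sensitivity + specificity - 1
best = thresholds[np.argmax(youden_j)]
print(best)  # 0.4
```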
The performance metrics of our models appear to be in line with those of current neuropsychological screening tests. Two of the most common screening tests for discriminating MCI from healthy individuals are the MoCA [19] and the MMSE [20]. In a systematic review by Pinto et al. [28], a mean AUC of 0.883 was found for the MoCA and 0.780 for the MMSE. While this study is not meant as a validation study of Klondike Solitaire, our results indicate possibly comparable psychometric properties. However, the performance metrics appear to be below the findings of previous studies using serious games. Valladares-Rodríguez et al. [29] investigated the use of machine-learning models to distinguish between healthy older adults, older adults with MCI, and older adults with Alzheimer’s disease. Their serious game set Panoramix consists of 7 games based on 7 pre-existing neuropsychological tests such as the California Verbal Learning Test. Their Random Forest classifier obtained a global training accuracy of 1.00, a global F1 score of 0.99, a sensitivity of 1.00 for MCI, and a specificity of 0.7 for MCI. Direct comparison with this study is, however, problematic, due to the different inclusion criteria, the absence of a hold-out test set, and the ternary classification.
Although this study focused on discerning healthy older adults from older adults with MCI using measurements from a single point in time, our findings may well have a bearing on their use in frequent cognitive monitoring. As pointed out by Piau et al. [10], perhaps the biggest shortcoming of today’s neuropsychological examination is that it is taken at separate points in time at large intervals. This makes the results vulnerable to temporary alterations of motivation or cognition (e.g., stress or tiredness). As argued by Pavel et al. [5], the general principles of measurement may be extended to psychological processes. By increasing the number of measurements, uncertainty due to imperfections in the tool can be reduced, and natural variations in cognition caused by the characteristics of the phenomenon can be detected. The spatial and temporal richness of data derived from longitudinal game play may allow for a more detailed cognitive profile, and could signal events where cognition has been altered (e.g., the impact of a changed medication regimen or trauma) [5]. In addition, personal cognitive baselines can be created which allow the individual to be compared with themselves as opposed to normative data [9]. These cognitive baselines could be used to detect subtle cognitive fluctuations, an early indicator of cognitive change [5, 10, 30].
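The measurement principle invoked above can be illustrated numerically: under an assumed noise model, the uncertainty of an averaged cognitive measure shrinks with the square root of the number of measurements. The values below are simulated, not study data.

```python
import numpy as np

rng = np.random.default_rng(42)
true_ability = 50.0  # assumed stable underlying cognitive level
noise_sd = 10.0      # assumed measurement noise of an imperfect tool

for n in (1, 10, 100, 1000):
    scores = true_ability + rng.normal(0, noise_sd, size=n)
    # The standard error of the mean shrinks as noise_sd / sqrt(n),
    # so frequent game-derived measurements tighten the estimate.
    sem = noise_sd / np.sqrt(n)
    print(n, round(scores.mean(), 2), round(sem, 2))
```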
There are limitations to this study that should be addressed in future work. In particular, the small sample size prevents us from drawing definitive conclusions. It might also have introduced bias into the test set, explaining the performance discrepancy between the test and validation sets. In addition, discrepancies in age, tablet experience, and Klondike Solitaire experience between the 2 groups may have confounded the results. Confirmatory studies with larger and more balanced sample sizes are needed to further investigate the psychometric properties of using casual card games for screening.
Conclusion
This study set out to investigate the suitability of the card game Klondike Solitaire to detect MCI via machine learning. The major finding is that casual card games, not built for the purpose of measuring cognition, can be used to capture digital biomarkers of cognitive performance which are sensitive to the altered cognition caused by MCI. Hence, the popularity of casual games amongst today’s older generations may prove useful for supplying cognitive information between consultations. Notwithstanding the relatively small sample size, this work offers valuable insights into the use of casual games to detect cognitive impairments.
Acknowledgement
The authors would like to thank all participants who volunteered for this study. The authors also thank the staff of the memory clinics of University Hospital Leuven and Jessa Hospital for making recruitment possible.
Statement of Ethics
This study was conducted in compliance with the Declaration of Helsinki and all applicable national laws and rulings concerning privacy. Approval was granted by the Ethics Committee Research UZ/KU Leuven, Belgium, CTC S59650. All tests were conducted after obtaining written informed consent from the participants. Collected data related to cognitive status during the observations were anonymized and stored in a secure database. All participants were informed that no information would be used for diagnostic or clinical purposes.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This study received funding from the KU Leuven Impulse Fund IMP/16/025 and the Flemish Government (AI Research Program).
Author Contributions
K.G. cocreated the digital biomarkers, recruited healthy participants, coded the Android app, processed the study data, and codrafted the first manuscript version. J.T. is PI of the clinical study and designed the protocol used. M.-E.V.A. and J.T. recruited participants with MCI and supervised the clinical validity of the study. K.V. and M.D. supervised the technical validity and contributed to the machine-learning pipeline. V.V.A. cocreated the digital biomarkers, codrafted the first version of the manuscript, and supervised the whole study. All authors critically revised and added comments to the manuscript.