Introduction: This study aimed to develop novel machine learning models for predicting Alzheimer's disease (AD) and to identify key factors for targeted prevention.

Methods: From the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, we included three populations of participants aged 60 years or older: 1,219 with sociodemographic information only, 863 with both sociodemographic and self-reported health information, and 482 with sociodemographic, self-reported health, and blood biomarker information. Machine learning models were constructed to predict the risk of AD in each of the three populations. Model performance was evaluated in terms of discrimination, calibration, and clinical usefulness. SHapley Additive exPlanations (SHAP) were applied to identify the key predictors of the optimal models.

Results: The mean age was 73.49, 74.52, and 74.29 years in the three populations, respectively. Models using sociodemographic information alone, and models using both sociodemographic and self-reported health information, showed modest performance. When blood biomarker information was added to the sociodemographic and self-reported health information, overall performance improved substantially; logistic regression performed best, with an AUC of 0.818. Blood phosphorylated tau (p-tau), plasma neurofilament light, age, blood tau protein, and education level were the five most important predictors. Taurine, inosine, xanthine, marital status, and L-glutamine also contributed to AD prediction.

Conclusion: Interpretable machine learning showed promise for screening individuals at high risk of AD and for identifying key predictors for targeted prevention.
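The three evaluation criteria named in Methods (discrimination, calibration, and clinical usefulness) can be illustrated with a minimal, dependency-free sketch. It assumes binary outcomes (1 = AD, 0 = no AD) and predicted probabilities from any fitted model. The specific metrics are assumptions, not necessarily the ones the study reported: discrimination as the ROC AUC in its Mann-Whitney form, calibration as calibration-in-the-large plus the Brier score, and clinical usefulness as net benefit at a threshold probability, the quantity plotted in decision curve analysis. The toy data are invented for illustration.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: ROC AUC in its Mann-Whitney form, i.e. the probability that a
    randomly chosen case receives a higher predicted risk than a randomly chosen
    non-case (ties count as 0.5)."""
    cases = [p for p, y in zip(y_prob, y_true) if y == 1]
    controls = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            score += 1.0 if pc > pn else 0.5 if pc == pn else 0.0
    return score / (len(cases) * len(controls))


def calibration_in_the_large(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Observed event rate minus mean predicted risk (0 = no systematic bias)."""
    n = len(y_true)
    return sum(y_true) / n - sum(y_prob) / n


def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error of the predicted probabilities; reflects both
    calibration and discrimination (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Clinical usefulness at one threshold probability pt (decision curve analysis):
    NB = TP/n - FP/n * pt / (1 - pt), treating p >= pt as 'screen positive'."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for p, y in zip(y_prob, y_true) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(y_prob, y_true) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy outcomes (1 = AD) and predicted risks, invented for illustration only.
    y = [0, 0, 1, 1, 0, 1]
    p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
    print(f"AUC: {auc(y, p):.3f}")
    print(f"Calibration-in-the-large: {calibration_in_the_large(y, p):+.3f}")
    print(f"Brier score: {brier(y, p):.3f}")
    for pt in (0.1, 0.3, 0.5):
        print(f"Net benefit at pt={pt}: {net_benefit(y, p, pt):.3f}")
```

A decision curve is `net_benefit` evaluated over a grid of thresholds and compared with the "treat all" and "treat none" strategies. In practice, established implementations such as scikit-learn's `roc_auc_score` and `brier_score_loss` would normally replace these explicit loops.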

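SHAP can produce a predictor ranking like the one in Results. The simplest case is a logistic regression explained on the log-odds scale. If features are assumed to be independent, the SHAP value of feature j for one participant has a closed form, phi_j = w_j (x_j - E[x_j]). Here w_j is the fitted coefficient and the expectation is taken over a background sample. Averaging |phi_j| over participants then gives a global importance order. The sketch below is a minimal standard-library illustration under that independence assumption. The coefficients, feature names, and data are hypothetical and do not represent the study's fitted model.

```python
from typing import Dict, List, Sequence


def feature_means(background: List[Sequence[float]]) -> List[float]:
    """Average of each feature over the background sample (the SHAP baseline)."""
    n_features = len(background[0])
    return [sum(row[j] for row in background) / len(background) for j in range(n_features)]


def linear_shap(weights: Sequence[float], means: Sequence[float], x: Sequence[float]) -> List[float]:
    """Exact SHAP values of a linear log-odds model under feature independence:
    phi_j = w_j * (x_j - E[x_j]). The intercept cancels, so it is not needed."""
    return [w * (xj - m) for w, xj, m in zip(weights, x, means)]


def global_importance(
    weights: Sequence[float], background: List[Sequence[float]], names: Sequence[str]
) -> Dict[str, float]:
    """Mean absolute SHAP value per feature, ordered from most to least important."""
    means = feature_means(background)
    totals = [0.0] * len(weights)
    for row in background:
        for j, phi in enumerate(linear_shap(weights, means, row)):
            totals[j] += abs(phi)
    scores = {name: total / len(background) for name, total in zip(names, totals)}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical coefficients, feature names, and values (not ADNI data).
    names = ["age", "education", "plasma_marker"]
    weights = [0.8, -0.5, 1.2]
    background = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [2.0, 1.0, 2.0]]
    phi = linear_shap(weights, feature_means(background), background[1])
    print("SHAP values for participant 2:", dict(zip(names, phi)))
    print("Global ranking:", global_importance(weights, background, names))
```

The model is additive in log-odds, so one participant's SHAP values sum to that participant's log-odds minus the average log-odds over the background sample (SHAP's efficiency property). Nonlinear models such as tree ensembles or neural networks need model-specific or sampling-based SHAP estimators instead of this closed form.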