Introduction: The aim of this study was to compare machine learning algorithms for constructing a diabetic retinopathy (DR) prediction model among type 2 diabetes mellitus (DM) patients and to develop a nomogram based on the best model. Methods: This cross-sectional study included DM patients receiving routine DR screening. Patients were randomly divided into training (244 patients) and validation (105 patients) sets. Least absolute shrinkage and selection operator regression was used to select clinical characteristics. Six machine learning algorithms were compared: decision tree (DT), k-nearest neighbours (KNN), logistic regression model (LM), random forest (RF), support vector machine (SVM), and XGBoost (XGB). Model performance was assessed via receiver-operating characteristic (ROC), calibration, and decision curve analyses (DCAs). A nomogram was then developed on the basis of the best model. Results: Compared with the five other machine learning algorithms (DT, KNN, RF, SVM, and XGB), the LM demonstrated the highest area under the ROC curve (AUC, 0.894) and recall (0.92) in the validation set. Additionally, the calibration curves and DCA results were relatively favourable. Disease duration, diabetic peripheral neuropathy (DPN), insulin dosage, urinary protein, and albumin (ALB) were included in the LM. The nomogram exhibited robust discrimination (AUC: 0.856 in the training set and 0.868 in the validation set), calibration, and clinical applicability across the two datasets after 1,000 bootstraps. Conclusion: Among the six machine learning algorithms, the LM demonstrated the best performance. A logistic regression-based nomogram for predicting DR in type 2 DM patients was established. This nomogram may serve as a valuable tool for DR detection, facilitating timely treatment.

The global prevalence of type 2 diabetes mellitus (DM) has increased dramatically in recent decades, with projections indicating that it will surpass 693 million by 2045 [1]. Diabetic retinopathy (DR) is a prevalent and serious complication of DM that significantly contributes to vision loss among the working-age population. By 2030, an estimated 190 million people are expected to develop DR [2].

DM patients often initially seek care from internal medicine departments. However, some internists may lack vigilance regarding DR, potentially delaying early detection and treatment. This issue is more pronounced in primary health care settings that lack ophthalmologists, retinal specialists, or specialized ophthalmic examination equipment. Consequently, internists need a method to estimate DR risk on the basis of systemic examination data, facilitating early detection and treatment in collaboration with ophthalmologists [3].

Despite numerous studies identifying potential systemic risk factors for DR, convenient and mature models that quantify these factors and assess DR risk in DM patients are lacking [4, 5]. Recent studies have begun to address this gap, but they have notable limitations [6, 7]. First, the clinical characteristics available for screening were limited, restricting comprehensive evaluation of additional potential systemic risk factors. Second, these studies primarily employed logistic regression models (LMs) without explaining the rationale for this choice or comparing LM with other machine learning algorithms. Comparing the performance of commonly used machine learning algorithms could help identify a more effective model, thereby enhancing reliability. Additionally, because these models were derived from different populations and regions, their results may vary [6, 7], highlighting the need for further research to identify commonalities across studies. Therefore, this study aims to compare the performance of different machine learning algorithms and to establish a quantitative model for predicting DR from systemic risk factors, providing internists with a more effective tool for DR screening.

Participants

This cross-sectional study received approval from the Institutional Review Board of Sun Yat-sen Memorial Hospital (SYSEC-KY-KS-2021-263) and was conducted in compliance with the Declaration of Helsinki. DM patients from the endocrinology inpatient department were included between December 2017 and September 2021. Routine DR screening, which included best-corrected visual acuity, subjective refractive testing, intraocular pressure measurement, axial length assessment, colour fundus photography, and dilated-pupil fundus examination, was performed for these patients. The diagnosis of DR was guided primarily by the 2020 American DR guidelines [3], based on dilated-pupil fundus examination and colour fundus photography, supplemented by optical coherence tomography angiography and/or fluorescein fundus angiography as needed. The major inclusion criterion was type 2 DM inpatients with complete medical records. The exclusion criteria included the following: (1) incomplete medical records, defined as any missing data in 92 demographic and systemic characteristics; (2) severe high blood pressure (HBP) (≥180/110 mm Hg) and other severe systemic diseases; (3) renal diseases of nondiabetic origin; (4) retinal conditions that could not be assessed owing to ocular media opacities; (5) high myopia (axial length >26 mm or refractive error >−6 diopters); (6) other retinal diseases, glaucoma, optic neuropathy, or ocular diseases caused by systemic diseases; (7) a history of surgery for retinal diseases or glaucoma; and (8) a history of retinal laser treatment.

All patients were randomly divided at a 7:3 ratio into a training and a validation set. The training set was used to develop models with various machine learning algorithms, whereas the validation set was used to assess the performance of these models.
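The 7:3 random split described above can be sketched as follows (an illustrative Python fragment; the study itself used R, and the seed and helper name are our own):

```python
import random

def split_patients(patient_ids, train_frac=0.7, seed=42):
    """Randomly split patient IDs into a training and a validation set."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    ids = list(patient_ids)
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]

# 349 included patients -> 244 training, 105 validation, as in the study
train_ids, val_ids = split_patients(range(349))
```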

Clinical Characteristics Data Collection

Ninety-two demographic and systemic characteristics were collected from medical records. These characteristics were closely or potentially related to DM or DR. They included sex, age, disease duration, random blood glucose, glycosylated haemoglobin, fasting blood glucose, high blood pressure, systolic and diastolic pressure, coronary heart disease, lactate dehydrogenase, phosphocreatine kinase, creatine kinase-MB, stroke history, hyperlipidaemia, triglyceride, total cholesterol, low- and high-density lipoprotein cholesterol, apolipoprotein (Apo) A1, Apo B, Apo E, plaque in the carotid artery, body mass index, body fat rate, basal metabolic rate, waist-to-hip ratio, tobacco and alcohol consumption, treatment type, insulin dosage, diabetic peripheral neuropathy (DPN), estimated glomerular filtration rate (eGFR), creatinine, urea, blood urea nitrogen to creatinine ratio, uric acid, urinary immunoglobulin G, urinary transferrin, urinary albumin, urinary α1-microglobulin, urinary albumin excretion rate, urinary protein, urinary glucose grade, urinary protein grade, diabetic foot, diabetic ketoacidosis, free tri-iodothyronine and thyroxine, thyroid-stimulating hormone, anti-thyroglobulin, anti-thyroid peroxidase, N-terminal pro-brain natriuretic peptide, alanine aminotransferase, aspartate transaminase, γ-glutamyl transpeptidase, alkaline phosphatase, direct bilirubin, total bilirubin, indirect bilirubin, potassium, sodium, calcium, magnesium, phosphorus, chloride, cystatin C, carbon dioxide combining power, β-hydroxybutyrate, prealbumin, total protein, albumin (ALB), ALB/globulin ratio, globulin, total bile acid, glycocholic acid, high-sensitivity C-reactive protein, retinol binding protein, cholinesterase, leucyl aminopeptidase, α-L-fucosidase, lipase, serum iron, serum amylase, unsaturated iron binding capacity, total iron binding capacity, serum iron saturation, serum ferritin, free fatty acids, transferrin, superoxide dismutase, and adenosine deaminase.

The Xiangya equation, which is specifically designed for Chinese patients, was utilized to calculate the eGFR [8]. Furthermore, the diagnosis of DPN was made by endocrinologists and neurologists on the basis of the diagnostic criteria established by the Toronto Diabetic Neuropathy Expert Group [9].

Statistical Analysis and Model Construction

The statistical analyses and model construction and validation were performed via R software (version 4.3.1; R Foundation, Vienna, Austria). The demographic and systemic characteristics of the training and validation sets were compared. Mann-Whitney U tests were applied to nonnormally distributed continuous data, whereas independent t tests were used for normally distributed continuous data. χ2 tests were employed for categorical variables. Statistical significance was defined as p < 0.05.
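The test-selection logic can be sketched in Python with scipy (the study used R; the Shapiro-Wilk normality check and the function name here are our own illustrative choices):

```python
from scipy import stats

def compare_continuous(x, y, alpha=0.05):
    """Independent t test for normally distributed data,
    Mann-Whitney U test otherwise, as described in the text."""
    normal = (stats.shapiro(x).pvalue > alpha) and (stats.shapiro(y).pvalue > alpha)
    if normal:
        return "t test", stats.ttest_ind(x, y).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(x, y).pvalue

# toy example: two small groups of laboratory values
test_name, p = compare_continuous([5.1, 5.4, 5.2, 5.9, 5.6, 5.3],
                                  [6.0, 6.2, 5.8, 6.5, 6.1, 6.3])
```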

First, the least absolute shrinkage and selection operator (LASSO) regression model was applied to select the clinical characteristics in the training set. LASSO regression is a statistical method that performs both variable selection and regularization by adding an L1 penalty to the regression coefficients [10]. It is particularly effective for high-dimensional data, especially when dealing with small sample sizes and high collinearity among predictors. This approach helps enhance the model’s prediction accuracy and interpretability.
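As a rough illustration of how the L1 penalty zeroes out uninformative coefficients (a Python/scikit-learn sketch on synthetic data; the study used R, and all names and data here are our own):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(244, 20))            # 244 "patients", 20 candidate predictors
logit = 1.2 * X[:, 0] - 0.9 * X[:, 1]     # only the first two truly matter
y = (rng.random(244) < 1 / (1 + np.exp(-logit))).astype(int)

# L1-penalised logistic regression with 10-fold CV over the penalty strength;
# coefficients shrunk exactly to zero drop out of the model.
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=10).fit(X, y)
selected = np.flatnonzero(lasso.coef_)    # indices of the retained predictors
```

The two informative predictors survive the penalty, while most pure-noise predictors are shrunk to zero.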

Second, the selected clinical characteristics were input into six models via different machine learning algorithms. These algorithms include decision tree (DT), k-nearest neighbours (KNN), LM, random forest (RF), support vector machine (SVM), and XGBoost (XGB) models. The receiver-operating characteristic (ROC) curves, calibration curves, decision curve analysis (DCA) results, and performance metrics from the confusion matrix for these models were exported.
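The comparison loop can be sketched in Python with scikit-learn (illustrative only; the study used R, the data here are synthetic, and GradientBoostingClassifier stands in for XGBoost, which lives in the separate xgboost package):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(349, 5))                        # 5 selected characteristics
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=349) > 0).astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "DT": DecisionTreeClassifier(random_state=1),
    "KNN": KNeighborsClassifier(),
    "LM": LogisticRegression(),
    "RF": RandomForestClassifier(random_state=1),
    "SVM": SVC(probability=True, random_state=1),
    "XGB-like": GradientBoostingClassifier(random_state=1),  # stand-in for XGBoost
}
# validation-set AUC for each fitted model
auc = {name: roc_auc_score(y_va, model.fit(X_tr, y_tr).predict_proba(X_va)[:, 1])
       for name, model in models.items()}
```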

Finally, on the basis of the performance of these models, the LM was chosen as the ideal model. A corresponding nomogram was plotted. Additionally, the performance (ROC curves, calibration curves, and DCA results) of this model with 1,000 bootstraps was demonstrated. Bootstrapping is a statistical method used to estimate the accuracy of a model by repeatedly sampling from the original dataset with replacement. The model is trained on these new datasets, and its performance is evaluated on the data not included in each sample (out-of-bag data). This process is repeated many times to get a distribution of performance metrics, providing a robust estimate of the model's reliability and variance. The complete study flowchart is provided in Figure 1.
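The out-of-bag bootstrap described above can be sketched as follows (a Python illustration on synthetic data; the study used R, and the function name and 200-resample count here are our own):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def bootstrap_auc(X, y, n_boot=200, seed=0):
    """Refit on each bootstrap resample, score on the out-of-bag patients,
    and return the mean AUC with a 95% percentile interval."""
    rng = np.random.default_rng(seed)
    n, aucs = len(y), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # out-of-bag patients
        if len(np.unique(y[idx])) < 2 or len(np.unique(y[oob])) < 2:
            continue                              # skip degenerate resamples
        model = LogisticRegression().fit(X[idx], y[idx])
        aucs.append(roc_auc_score(y[oob], model.predict_proba(X[oob])[:, 1]))
    return np.mean(aucs), np.percentile(aucs, [2.5, 97.5])

rng = np.random.default_rng(2)
X = rng.normal(size=(244, 5))
y = (X[:, 0] - X[:, 1] + rng.normal(size=244) > 0).astype(int)
mean_auc, (lo, hi) = bootstrap_auc(X, y)
```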

Fig. 1.

Study flowchart. DM, diabetes mellitus; DR, diabetic retinopathy; LASSO, least absolute shrinkage and selection operator; DT, decision tree; KNN, k-nearest neighbours; LM, logistic regression model; RF, random forest; SVM, support vector machine; ROC, receiver-operating characteristic; DCA, decision curve analysis.


Participant Characteristics

Initially, 395 patients were included between December 2017 and September 2021. However, 46 patients were excluded due to incomplete records (21), impaired glucose tolerance rather than DM (3), nondiabetic renal diseases (2), lymphoma affecting the macular area (1), pituitary diseases affecting the visual field (1), a history of panretinal photocoagulation (4), ocular media opacity (3), glaucoma or suspected glaucoma (3), anterior ischaemic optic neuropathy (1), age-related macular degeneration (2), and macular epiretinal membrane (5). Consequently, 349 patients were included in the current study. These patients were then divided into training (244 patients) and validation (105 patients) sets.

The demographic and systemic characteristics were generally similar between the two sets, with no significant differences observed in most characteristics. However, the levels of total cholesterol, low- and high-density lipoprotein cholesterol, Apo A1, and eGFR were greater in the training set than in the validation set, whereas the levels of creatinine and alkaline phosphatase were lower (p < 0.05). Additionally, the proportion of DM patients with coronary heart disease was lower in the training set than in the validation set (p < 0.05). Details are provided in online supplementary Table 1 (for all online suppl. material, see https://doi.org/10.1159/000541294).

Selection of Clinical Characteristics

Ninety-two demographic and systemic characteristics were included in the LASSO regression model, with their coefficients and corresponding log(λ) values shown in Figure 2a. In the cross-validated error versus log(λ) plot (Fig. 2b), 9 clinical characteristics with non-zero coefficients were identified at the minimum cross-validated error, whereas only 6 (disease duration, DPN, insulin dosage, urinary protein, ALB, and urinary protein grade) remained at the largest λ within one standard error of the minimum (λ = 0.085). This more parsimonious set was adopted to avoid overfitting from including too many characteristics. Since the urinary protein level and grade are similar factors, the urinary protein level was chosen because of its greater quantitative precision. Thus, the final selected clinical characteristics were disease duration, DPN, insulin dosage, urinary protein, and ALB.

Fig. 2.

Clinical characteristic selection was performed via the LASSO regression model. a The LASSO coefficient values for 92 variables were plotted against log(λ). b The optimal lambda in the LASSO regression model was determined via tenfold cross-validation. The cross-validated error curve was plotted against log(λ), with the left vertical dotted line indicating the minimum cross-validated error and the right vertical dotted line representing the minimum error within one standard error of the minimum.


Performances of Different Models

All five clinical characteristics were included in the DT, LM, RF, and SVM models, whereas only four (disease duration, DPN, insulin dosage, and urinary protein) were included in the XGB model. Their importance ranks are shown in Table 1 and online supplementary Figure 1. However, the importance ranking of these clinical characteristics is not applicable to the KNN model because of its mechanism.

Table 1.

Logistic model prediction characteristics for DR

Characteristics	β	SE	OR (95% CI)	Z	p value
Intercept	0.669	1.428	1.953 (0.119–33.18)	0.469	0.639
Disease duration, years	0.050	0.024	1.050 (1.002–1.103)	2.037	0.042*
DPN	1.534	0.335	4.638 (2.432–9.101)	4.576	<0.001*
Insulin dosage, IU	0.026	0.010	1.026 (1.006–1.047)	2.623	0.009*
Urinary protein, g/L	1.102	0.371	3.009 (1.604–6.982)	2.973	0.003*
ALB, g/L	−0.078	0.036	0.925 (0.859–0.992)	−2.136	0.033*

β, regression coefficient; SE, standard error; OR, odds ratio; CI, confidence interval; DPN, diabetic peripheral neuropathy; ALB, albumin; *p < 0.05.
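Since each odds ratio in Table 1 is simply exp(β), the reported ORs can be recomputed from the published coefficients (the results agree with the table up to rounding):

```python
import math

# regression coefficients (β) from Table 1
beta = {"Disease duration": 0.050, "DPN": 1.534, "Insulin dosage": 0.026,
        "Urinary protein": 1.102, "ALB": -0.078}
odds_ratio = {name: math.exp(b) for name, b in beta.items()}
# e.g. exp(1.534) ≈ 4.64, matching the reported OR for DPN
```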

The KNN and RF models had the highest area under the ROC curve (AUC, 1.000) in the training set (Fig. 3a), whereas the LM had the highest AUC (0.894, 95% CI: 0.833–0.955) in the validation set (Fig. 3b). The AUCs of the other models in the validation set were as follows: (1) DT: 0.858, 95% CI: 0.785–0.931; (2) KNN: 0.865, 95% CI: 0.793–0.937; (3) RF: 0.870, 95% CI: 0.799–0.941; (4) SVM: 0.893, 95% CI: 0.833–0.953; and (5) XGB: 0.880, 95% CI: 0.814–0.946 (Fig. 3b).

Fig. 3.

Performance comparison of different machine learning models in the training set and validation set. a ROC curves in the training set. b ROC curves in the validation set. c Calibration curves in the training set. d Calibration curves in the validation set. e DCA results in the training set. f DCA results in the validation set. g Performance metrics derived from the confusion matrix for different machine learning models. a, b The dotted lines in the ROC curves represent the reference line. c, d The dotted lines in the calibration curves represent a perfect prediction by an ideal model. e, f The “treat all” lines in the DCA results assume that all patients have DR, whereas the “treat none” lines assume that all patients have non-DR. DT, decision tree; AUC, area under curve; KNN, k-nearest neighbours; LM, logistic regression model; RF, random forest; SVM, support vector machine; XGB, XGBoost.


The DT model showed the best agreement between the observed and predicted outcomes in the training set (Fig. 3c), whereas the LM, DT, and KNN models demonstrated similarly good agreement between the observed and predicted outcomes in the validation set (Fig. 3d). The consistency between the observed and predicted outcomes in the RF, SVM, and XGB models was not stable in the validation set (Fig. 3d).

The KNN and RF models had the best DCA results in the training set (Fig. 3e), whereas in the validation set, almost all the models except SVM showed similar DCA results (Fig. 3f). In the validation set, using the SVM model to predict DR may not provide any net benefit over the treat-none or treat-all strategy, whereas the other models may provide greater net benefit than either strategy when the threshold probability is <80%.
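For reference, the net benefit plotted in a DCA at threshold probability pt is TP/N − (FP/N) × pt/(1 − pt); a minimal Python sketch on toy data (the function name and example values are our own):

```python
def net_benefit(y_true, risk, threshold):
    """Decision-curve net benefit: NB = TP/N - FP/N * pt / (1 - pt).
    The 'treat all' curve is the same formula with every patient positive."""
    n = len(y_true)
    pred = [r >= threshold for r in risk]
    tp = sum(1 for p, t in zip(pred, y_true) if p and t)
    fp = sum(1 for p, t in zip(pred, y_true) if p and not t)
    return tp / n - fp / n * threshold / (1 - threshold)

# toy example: predicted risks for 6 patients, 3 of whom have DR
y = [1, 1, 1, 0, 0, 0]
risk = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
nb_model = net_benefit(y, risk, 0.5)            # 2/6 - 1/6 = 1/6
nb_treat_all = net_benefit(y, [1.0] * 6, 0.5)   # 3/6 - 3/6 = 0
```

A model is clinically useful at a given threshold only when its curve lies above both the treat-all and treat-none (net benefit 0) lines, as in the comparison above.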

The KNN and RF models demonstrated the highest accuracy, precision, recall, and F1 score in the training set. In the validation set, the XGB model achieved the highest accuracy, precision, and F1 score, whereas the LM exhibited the highest recall. Details are provided in Figure 3g.
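The four confusion-matrix metrics reported above are computed as follows (an illustrative Python sketch with toy labels; the function name is our own):

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from a 2x2 confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # sensitivity: the key metric for screening
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / (precision + recall)}

# toy example: 8 patients, 4 with DR; 3 true positives, 1 false positive
m = confusion_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
```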

The performance of the models in the validation set was the major factor in determining the best model. The calibration curves and DCA results of the LM were similar to those of the other models. The LM was chosen as the ideal model because of its highest AUC and recall (sensitivity) in the validation set, as high recall is essential for screening.

Nomogram Construction and Application

A nomogram was constructed according to the LM (Fig. 4). After 1,000 bootstraps, the AUC of the LM was 0.856 (95% CI: 0.806–0.902) in the training set and 0.868 (95% CI: 0.800–0.936) in the validation set (Fig. 5a, b). The calibration curves also presented good agreement between the observed and predicted outcomes in both datasets (Fig. 5c, d). According to the DCA, applying the LM nomogram provides greater net benefit than either the treat-none or treat-all strategy when the threshold probability is greater than 10% in the training set and greater than 5% in the validation set (Fig. 5e, f).

Fig. 4.

Nomogram for predicting the possibility of DR in patients with type 2 DM. DPN, diabetic peripheral neuropathy; ALB, albumin; DR, diabetic retinopathy.

Fig. 5.

Performance of the logistic regression model with 1,000 bootstraps. a ROC curves in the training set. b ROC curves in the validation set. c Calibration curves in the training set. d Calibration curves in the validation set. e DCA results in the training set. f DCA results in the validation set. a, b The diagonal lines in the ROC curves represent the reference line; the “Apparent ROC” lines represent the apparent performance of the model, whereas the “Bootstrap ROC” lines represent the model’s performance after 1,000 bootstraps. c, d The dotted lines in the calibration curves represent a perfect prediction by an ideal model; the “Apparent” lines denote the apparent performance of the model, whereas the “Bias-corrected” lines reflect the model's performance after 1,000 bootstraps; a closer fit to the dotted lines indicates a better predictive effect. e, f The “treat all” lines in the DCA results assume that all patients have DR, whereas the “treat none” lines assume that all patients have non-DR; the “Nomo model” lines represent the performance of the model. For example, if a patient’s DR threshold probability is 50%, the net benefit is approximately 0.3 in the validation set, meaning that 30 out of 100 patients may benefit from using this model.


Internists can refer to this nomogram to obtain DR risk scores based on systemic examination data. To use this nomogram, draw a straight line upwards from the value of each characteristic to the “Points” axis to obtain the corresponding points. After summing these points, locate the total on the “Total Points” axis. Finally, draw a straight line downwards from the total points to the “DR possibility” axis to determine the corresponding risk of DR. For example, a patient with a 20-year history of DM, an insulin dosage of 10 IU/day, a urinary protein level of 3 g/L, an ALB level of 50 g/L, and no DPN receives 15 points for “Disease duration,” 0 points for “DPN,” approximately 5 points for “Insulin dosage,” 50 points for “Urinary protein,” and 0 points for “ALB.” The total score is 70, corresponding to a DR possibility of 0.8 (80%). Internists can therefore consider this patient at high risk for DR and refer them to the ophthalmology department for further evaluation.
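The same example can be reproduced arithmetically from the Table 1 coefficients rather than from the graphical nomogram axes (an illustrative Python sketch; small rounding differences from the graphical reading are expected, and the function name is our own):

```python
import math

def dr_probability(duration_yr, dpn, insulin_iu, urinary_protein_g_l, alb_g_l):
    """DR probability from the logistic model in Table 1:
    logit = intercept + sum(beta_i * x_i), probability = 1 / (1 + exp(-logit))."""
    logit = (0.669 + 0.050 * duration_yr + 1.534 * dpn
             + 0.026 * insulin_iu + 1.102 * urinary_protein_g_l
             - 0.078 * alb_g_l)
    return 1 / (1 + math.exp(-logit))

# the article's example patient: 20 years of DM, no DPN, 10 IU/day insulin,
# urinary protein 3 g/L, ALB 50 g/L -> roughly 0.8, matching the nomogram
p = dr_probability(duration_yr=20, dpn=0, insulin_iu=10,
                   urinary_protein_g_l=3, alb_g_l=50)
```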

In establishing a DR prediction model from systemic risk factors, this study found that the LM demonstrated the highest AUC and recall (sensitivity) in the validation set compared with five other machine learning algorithms (DT, KNN, RF, SVM, and XGB). Additionally, the calibration curves and DCA results were relatively favourable. Consequently, logistic regression was selected as the ideal model for establishing the prediction model and constructing the nomogram.

Logistic regression is not only a common statistical method but also a traditional classification algorithm in machine learning. Both SVM and logistic regression are discriminative models: logistic regression is a linear model, whereas SVM can handle nonlinear problems through kernel functions [11, 12]. KNN and SVM both classify on the basis of distances between samples; however, SVM finds the optimal hyperplane by optimizing an objective function, whereas KNN classifies by nearest-neighbour voting [13]. DT, RF, and XGB are all tree-based algorithms, but RF and XGB improve performance by integrating multiple DTs: XGB employs gradient boosting, whereas RF uses a simple averaging or voting strategy [14]. Different machine learning algorithms have their own characteristics and suitable applications; they also have limitations and biases. DT tends to overfit and is sensitive to noise; KNN is computationally intensive and struggles with high-dimensional data; LM is limited to linear relationships and is sensitive to outliers; RF requires significant computational resources and is harder to interpret; SVM is computationally expensive and requires careful parameter tuning; and XGB is complex, with long training times and extensive hyperparameter tuning needed [11‒14]. Therefore, no single algorithm is the best choice in all situations; their relative performance largely depends on the nature of the dataset and the research objectives. For example, when the primary goal is to analyse the relationship between outcomes and risk factors, especially with smaller datasets, traditional methods such as logistic regression may suffice. However, with larger or more complex datasets (e.g., significant correlations, nonlinearity, outliers, high noise), advanced machine learning algorithms might be more useful.

Before model construction, we used LASSO regression for variable selection. In the occurrence and progression of chronic diseases such as DR, the associated risk variables often exhibit correlations, and multicollinearity can lead to various issues in the model [15]. In particular, our study involved 92 clinical characteristics, a large number of predictors relative to the number of patients, which can pose challenges for traditional statistical methods and models because of overfitting, multicollinearity, and computational complexity. LASSO regression helps create more parsimonious models by retaining only the most informative predictors, thus improving model interpretability and generalizability.

The nomogram constructed in this study, following 1,000 bootstraps, exhibited robust discrimination, calibration, and consistency across datasets. Moreover, DCA indicated a broad range of applicability, showing greater net benefit than the treat-all or treat-none strategies when the threshold probability exceeds 10% in the training set and 5% in the validation set. Previous studies have also developed nomograms for DR prediction using systemic risk factors [7, 16, 17]. Zhang et al. [7] reported a nomogram with an AUC of 0.79 but did not assess its application range with DCA. Wang et al. [16] presented a nomogram with an AUC of 0.816 in the validation set, which was more beneficial than traditional methods when the threshold probability was less than 85%. Mo et al. [17] described a nomogram with an AUC of 0.715 and a narrower application range (threshold probability of 21–51%). The AUCs of these nomograms were all lower than that of ours, and their DCA ranges were more limited. Additionally, these studies exclusively used logistic regression for modelling without comparing other machine learning algorithms. Thus, our study offers greater innovation and comprehensiveness. Using this nomogram, internists can effectively screen for DR in diabetic patients, allowing timely treatment and improved patient outcomes.

The final model in this study included the following variables: disease duration, insulin dosage, DPN, urinary protein level, and ALB level. Disease duration is a well-established risk factor for DR. Numerous epidemiological studies have shown that the risk of DR increases with the duration of DM [18‒20]. A UK DR screening project [20] noted that among type 2 DM patients without DR at baseline, the cumulative incidence of DR was 36% after 5 years and increased to 66% after 10 years of follow-up. In the Wisconsin Epidemiologic Study of the DR cohort, a 25-year follow-up of DM patients revealed that 97% developed DR, with one-third to one-half of these patients progressing to proliferative DR (42%) or diabetic macular oedema (29%) [19, 21]. A higher insulin dosage is associated with an increased risk of DR. This is partly because patients using insulin generally have poorer pancreatic islet function and less controlled blood glucose levels, increasing their susceptibility to DR. Moreover, intensive insulin therapy can cause rapid reductions in blood glucose levels, which may also contribute to DR development. A meta-analysis revealed that, compared with conventional insulin therapy, intensive insulin therapy increased the risk of DR progression by 2.11 times within 6–12 months [22]. Rapid blood glucose reduction may exacerbate retinal ischaemia, and high levels of exogenous insulin may promote the secretion of vascular endothelial growth factor, further contributing to DR progression [23]. DPN, diabetic nephropathy, and DR share the same primary mechanism: chronic hyperglycaemia-induced damage to the microvascular endothelium. These diabetic complications often occur in parallel because of this shared underlying microvascular damage. Therefore, it is logical to include DPN and markers of renal function (urinary protein and ALB levels) in the model [24, 25]. Although only urinary protein and ALB levels were included in the model, other renal function indicators, such as the eGFR and creatinine level, should not be neglected.

The key advantage of our research is that it provides clinicians, especially internists, with a method to quantify DR risk, making it easier to identify patients with DR early and refer them to ophthalmologists, thereby facilitating timely treatment. Our study also has several limitations. First, our data were retrospective, and the number of patients included was limited. Second, all patients were from a single hospital in China, which introduces regional and demographic limitations to the model’s applicability. To enhance its generalizability, future research should include large-scale prospective validation across diverse regions and populations, and the model’s algorithms should be continually updated.

In conclusion, among the six different machine learning algorithms evaluated, the LM demonstrated the best performance in this study. Consequently, a logistic regression-based nomogram for predicting DR in type 2 DM patients was established. This nomogram may serve as a valuable tool for DR detection, facilitating timely treatment.

This study received approval from the Institutional Review Board of Sun Yat-sen Memorial Hospital (SYSEC-KY-KS-2021-263). As this was a retrospective study, the Ethics Committee of Sun Yat-sen Memorial Hospital granted an exemption from written informed consent. This study complied with the guidelines for human studies and was conducted ethically in accordance with the World Medical Association Declaration of Helsinki.

The authors have no conflicts of interest to declare.

This research is supported by the Project of Administration of Traditional Chinese Medicine of Guangdong Province, China (20221077) and Medical Scientific Research Foundation of Guangdong Province, China (A2024187).

Concept and design; statistical analysis; obtained funding; administrative, technical, or material support; and supervision: Zijing Li. Acquisition, analysis, or interpretation of data; drafting of the manuscript; and critical review of the manuscript for important intellectual content: Weiliang Jiang and Zijing Li.

The data that support the findings of this study are not publicly available due to their containing information that could compromise the privacy of research participants but are available from Zijing Li upon reasonable request.

1. Cho NH, Shaw JE, Karuranga S, Huang Y, da Rocha Fernandes JD, Ohlrogge AW, et al. IDF Diabetes Atlas: global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res Clin Pract. 2018;138:271–81.
2. Zheng Y, He M, Congdon N. The worldwide epidemic of diabetic retinopathy. Indian J Ophthalmol. 2012;60(5):428–31.
3. Flaxel CJ, Adelman RA, Bailey ST, Fawzi A, Lim JI, Vemulakonda GA, et al. Diabetic retinopathy preferred practice Pattern®. Ophthalmology. 2020;127(1):P66–145.
4. Yau JW, Rogers SL, Kawasaki R, Lamoureux EL, Kowalski JW, Bek T, et al. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care. 2012;35(3):556–64.
5. Lin KY, Hsih WH, Lin YB, Wen CY, Chang TJ. Update in the epidemiology, risk factors, screening, and treatment of diabetic retinopathy. J Diabetes Investig. 2021;12(8):1322–5.
6. Yang H, Xia M, Liu Z, Xing Y, Zhao W, Li Y, et al. Nomogram for prediction of diabetic retinopathy in patients with type 2 diabetes mellitus: a retrospective study. J Diabetes Complications. 2022;36(11):108313.
7. Zhang C, Zhou L, Ma M, Yang Y, Zhang Y, Zha X. Dynamic nomogram prediction model for diabetic retinopathy in patients with type 2 diabetes mellitus. BMC Ophthalmol. 2023;23(1):186.
8. Li DY, Yin WJ, Yi YH, Zhang BK, Zhao J, Zhu CN, et al. Development and validation of a more accurate estimating equation for glomerular filtration rate in a Chinese population. Kidney Int. 2019;95(3):636–46.
9. Tesfaye S, Boulton AJ, Dyck PJ, Freeman R, Horowitz M, Kempler P, et al. Diabetic neuropathies: update on definitions, diagnostic criteria, estimation of severity, and treatments. Diabetes Care. 2010;33(10):2285–93.
10. Dai P, Chang W, Xin Z, Cheng H, Ouyang W, Luo A. Retrospective study on the influencing factors and prediction of hospitalization expenses for chronic renal failure in China based on random forest and LASSO regression. Front Public Health. 2021;9:678276.
11. Sharma T, Shah M. A comprehensive review of machine learning techniques on diabetes detection. Vis Comput Ind Biomed Art. 2021;4(1):30.
12. Rodríguez-Pérez R, Bajorath J. Evolution of support vector machine and regression modeling in chemoinformatics and drug discovery. J Comput Aided Mol Des. 2022;36(5):355–62.
13. Murugan A, Nair SAH, Kumar KPS. Detection of skin cancer using SVM, random forest and kNN classifiers. J Med Syst. 2019;43(8):269.
14. Zhang H, Zhang H, Yang H, Shuid AN, Sandai D, Chen X. Machine learning-based integrated identification of predictive combined diagnostic biomarkers for endometriosis. Front Genet. 2023;14:1290036.
15. Mao Y, Zhu Z, Pan S, Lin W, Liang J, Huang H, et al. Value of machine learning algorithms for predicting diabetes risk: a subset analysis from a real-world retrospective cohort study. J Diabetes Investig. 2023;14(2):309–20.
16. Wang Q, Zeng N, Tang H, Yang X, Yao Q, Zhang L, et al. Diabetic retinopathy risk prediction in patients with type 2 diabetes mellitus using a nomogram model. Front Endocrinol. 2022;13:993423.
17. Mo R, Shi R, Hu Y, Hu F. Nomogram-based prediction of the risk of diabetic retinopathy: a retrospective study. J Diabetes Res. 2020;2020:7261047.
18. Graue-Hernandez EO, Rivera-De-La-Parra D, Hernandez-Jimenez S, Aguilar-Salinas CA, Kershenobich-Stalnikowitz D, Jimenez-Corona A. Prevalence and associated risk factors of diabetic retinopathy and macular oedema in patients recently diagnosed with type 2 diabetes. BMJ Open Ophthalmol. 2020;5(1):e000304.
19. Klein R, Knudtson MD, Lee KE, Gangnon R, Klein BEK. The Wisconsin Epidemiologic Study of Diabetic Retinopathy: XXII the twenty-five-year progression of retinopathy in persons with type 1 diabetes. Ophthalmology. 2008;115(11):1859–68.
20. Jones CD, Greenwood RH, Misra A, Bachmann MO. Incidence and progression of diabetic retinopathy during 17 years of a population-based screening program in England. Diabetes Care. 2012;35(3):592–6.
21. Klein R, Knudtson MD, Lee KE, Gangnon R, Klein BEK. The Wisconsin Epidemiologic Study of Diabetic Retinopathy XXIII: the twenty-five-year incidence of macular edema in persons with type 1 diabetes. Ophthalmology. 2009;116(3):497–503.
22. Wang PH, Lau J, Chalmers TC. Metaanalysis of the effects of intensive glycemic control on late complications of type I diabetes mellitus. Online J Curr Clin Trials. 1993;Doc No 60.
23. Jingi AM, Tankeu AT, Ateba NA, Noubiap JJ. Mechanism of worsening diabetic retinopathy with rapid lowering of blood glucose: the synergistic hypothesis. BMC Endocr Disord. 2017;17(1):63.
24. Rasheed R, Pillai GS, Kumar H, Shajan AT, Radhakrishnan N, Ravindran GC. Relationship between diabetic retinopathy and diabetic peripheral neuropathy: neurodegenerative and microvascular changes. Indian J Ophthalmol. 2021;69(11):3370–5.
25. Yang J, Liu Z. Mechanistic pathogenesis of endothelial dysfunction in diabetic nephropathy and retinopathy. Front Endocrinol. 2022;13:816400.