Abstract
Introduction: This study aimed to assess the long-term renal prognosis of patients with hypertensive nephropathy (HN) diagnosed by renal biopsy using the random survival forest (RSF) algorithm. Methods: Patients with HN diagnosed by renal biopsy at Xijing Hospital from December 2010 to December 2022 were enrolled and randomly divided into a training set and a testing set at a ratio of 7:3. The composite endpoint was a ≥50% decline in estimated glomerular filtration rate (eGFR), end-stage renal disease, or death. RSF and Cox regression models for renal prognosis were built using the factors selected by the RSF algorithm. The concordance index (C-index) was used to evaluate discrimination, the integrated Brier score to evaluate calibration, and the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) to evaluate risk classification. Results: A total of 225 patients were included, of whom 72 (32.0%) reached the composite endpoint during a median follow-up of 29.9 (16.6, 52.1) months. Six variables (overall chronicity grade of renal pathology, eGFR, high-density lipoprotein cholesterol, hematocrit, monocyte, and stroke volume) were selected from the clinical and pathological data and entered into the RSF model. Compared with the Cox model, the RSF model had a higher C-index in both the training set (0.904 [95% CI: 0.842–0.938] vs. 0.831 [95% CI: 0.768–0.894], p < 0.001) and the testing set (0.893 [95% CI: 0.770–0.944] vs. 0.841 [95% CI: 0.751–0.931], p = 0.021). NRI and IDI indicated that the RSF model outperformed the Cox model in risk classification. Conclusion: In this study, the RSF algorithm was used to identify risk factors affecting the prognosis of HN patients, and a clinical prognostic RSF model based on renal pathology was constructed to predict adverse outcomes in HN patients.
Compared with the traditional Cox regression model, the RSF model performed better and may provide valuable new insights for clinical diagnosis and treatment strategies.
Introduction
Hypertension is one of the leading global risk factors for attributable deaths according to the Global Burden of Disease study [1]. Among the target organs affected by hypertension, the kidneys are particularly susceptible [2]. In recent years, the prevalence of hypertensive nephropathy (HN) has gradually increased worldwide. In the USA, the prevalence of chronic kidney disease (CKD) has increased significantly among adults with hypertension, and HN is now the second leading cause of end-stage renal disease (ESRD) [3]. HN is also the second most common cause of renal replacement therapy in Europe [4]. Data from the China Kidney Disease Network (CK-NET2016) showed that HN accounted for 20.78% of hospitalized patients with CKD in China [5]. Given population aging and improved survival from cardiovascular diseases, the incidence of HN and HN-associated ESRD is expected to continue rising in the coming decades [2, 6].
Because HN is an interdisciplinary problem, its diagnosis relies mainly on clinical experience. In the absence of a clear medical history, it is difficult to establish a causal relationship between hypertension and CKD [7]. Clinical cohort studies based on renal biopsy are therefore essential for the prognostic evaluation of HN, and they remain a significant gap in current research [8]. Current evidence suggests that several key factors may affect the prognosis of HN, including urinary protein excretion, baseline renal function, and blood pressure control [9‒12]. However, no consistent conclusion has been reached. Moreover, large clinical trials (MDRD [13], AASK [14], REIN-2 [15]) have not consistently shown that strict control of blood pressure or urinary protein has a definite renoprotective effect. Additionally, current modeling studies focus on the diagnosis of HN, and comprehensive clinical guidance on its prognosis is lacking [16].
The present study was designed to investigate the risk factors that affect the prognosis of patients with HN through a comprehensive analysis of renal outcomes in a cohort of patients diagnosed by renal biopsy. The random survival forest (RSF) algorithm was used to develop and validate a prognostic model for predicting renal outcomes in HN patients. The prediction accuracy of this model was compared with that of a benchmark Cox regression model, with the aim of informing early disease intervention and clinical decision-making.
Materials and Methods
Study Population
We retrospectively analyzed the pathological and clinical data of patients with HN diagnosed by renal biopsy at Xijing Hospital of Air Force Medical University from December 2010 to December 2022. The inclusion criteria were as follows: (1) age ≥18 years; (2) systolic blood pressure ≥140 mm Hg and/or diastolic blood pressure ≥90 mm Hg on three consecutive days without antihypertensive drugs, with a medical history and clinical manifestations meeting the diagnostic criteria for essential hypertension; (3) 24-h urinary microalbumin ≥300 mg/24 h; (4) renal biopsy findings of HN or hypertensive nephrosclerosis. The exclusion criteria were as follows: (1) secondary hypertension caused by Cushing's syndrome, pheochromocytoma, primary aldosteronism, or renovascular disease; (2) coexisting renal lesions of other pathological types; (3) fewer than eight glomeruli in the biopsy specimen or absence of the original pathology report; (4) other major conditions, such as malignant tumors, severe trauma, or severe infections, with a projected survival of less than 1 year. This study adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines (online suppl. Table S1; for all online suppl. material, see https://doi.org/10.1159/000545524) [17]. The Ethics Committee of Xijing Hospital approved this study (approval number: KY20213027-1). Because the study was retrospective, the Ethics Committee waived the requirement for informed consent.
Outcomes
The primary endpoint was a composite of a ≥50% decline in estimated glomerular filtration rate (eGFR), ESRD, or death, whichever occurred first. eGFR was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. ESRD was defined as an eGFR <15 mL/min/1.73 m² requiring the initiation of chronic dialysis (hemodialysis or peritoneal dialysis) or kidney transplantation [18].
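The endpoint definition above can be made concrete with a short Python sketch. This section does not state which CKD-EPI version was used; the sketch assumes the 2009 creatinine equation (serum creatinine in mg/dL, race coefficient omitted). The visit-record format and the function names `ckd_epi_2009` and `first_composite_event` are hypothetical, and the sketch flags ESRD at either eGFR <15 or the start of dialysis/transplantation, which is one reading of the definition rather than the authors' code.

```python
from typing import List, Optional, Tuple


def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """2009 CKD-EPI creatinine equation, race coefficient omitted (assumed version)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    return egfr * 1.018 if female else egfr


def first_composite_event(
    baseline_egfr: float,
    visits: List[Tuple[float, float, bool]],
    death_time: Optional[float],
    last_contact: float,
) -> Tuple[float, Optional[str]]:
    """Earliest of eGFR decline >=50%, ESRD, or death; (last_contact, None) if censored.

    visits: hypothetical (months_from_biopsy, egfr, started_dialysis_or_transplant) records.
    """
    candidates: List[Tuple[float, str]] = []
    for months, egfr, rrt_started in sorted(visits):
        if egfr <= 0.5 * baseline_egfr:
            candidates.append((months, "eGFR decline >=50%"))
        # Assumed reading: ESRD flagged at eGFR < 15 or when dialysis/transplantation starts
        if egfr < 15 or rrt_started:
            candidates.append((months, "ESRD"))
    if death_time is not None:
        candidates.append((death_time, "death"))
    if not candidates:
        return last_contact, None
    # min() returns the first of several tied candidates, so the order above breaks ties
    return min(candidates, key=lambda c: c[0])


print(first_composite_event(60, [(6, 45, False), (12, 28, False), (18, 12, True)], None, 24))
```

Taking the minimum over all candidate times mirrors the "whichever occurred first" rule; everything after the first event is ignored.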
Clinical Variables
Two researchers (Qin and Zhao) collected baseline demographic, clinical, and pathological data from the electronic medical record system. Mean arterial pressure (MAP) was calculated from blood pressure measured at admission, and the highest previous MAP and the duration of hypertension were recorded from the patient's medical history. Comorbidities, including hypertensive heart disease, fatty liver, and diabetes, were recorded according to the patients' diagnoses. Detailed information on antihypertensive treatment was also collected. Follow-up was considered successful if it lasted more than 6 months, unless an endpoint was reached earlier. Follow-up included assessment of survival status, progression to ESRD or dialysis, and laboratory data. The privacy of participants' information was strictly protected during and after data collection.
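This section does not give the MAP formula. A common bedside estimate is diastolic pressure plus one third of the pulse pressure; the sketch below assumes that formula, and the function names are illustrative only.

```python
from typing import Iterable, Tuple


def mean_arterial_pressure(sbp: float, dbp: float) -> float:
    """Conventional estimate: diastolic pressure plus one third of pulse pressure (assumed formula)."""
    return dbp + (sbp - dbp) / 3.0


def highest_map(readings: Iterable[Tuple[float, float]]) -> float:
    """Highest MAP among (systolic, diastolic) readings taken from the medical history."""
    return max(mean_arterial_pressure(sbp, dbp) for sbp, dbp in readings)


print(round(mean_arterial_pressure(140, 90), 1))  # 106.7
print(round(highest_map([(140, 90), (180, 110), (160, 100)]), 1))  # 133.3
```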
Renal Pathological Assessment
The pathological characteristics were reviewed by experienced pathologists and classified using the overall chronicity grade of renal pathology [19]. The histological chronicity score comprised four components: global and segmental glomerulosclerosis, tubular atrophy (TA), interstitial fibrosis (IF), and arteriosclerosis (CV). Glomerulosclerosis, TA, and IF were each scored 0–3 according to the percentage of glomeruli or cortical tubulointerstitial area involved. CV was scored 0 if the intimal thickness of arteries was less than that of the media and 1 otherwise. The four component scores were summed to obtain the overall chronicity score (0–10), which was graded as minimal (0–1), mild (2–4), moderate (5–7), or severe (8–10).
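The scoring arithmetic can be sketched as follows. The percentage cut-points for the 0–3 component scores (<10%, 10–25%, 26–50%, >50%) are an assumption, since this section does not list them; the component maxima (3 + 3 + 3 + 1 = 10) and the grade boundaries follow the text.

```python
def percent_to_score(percent: float) -> int:
    """Score 0-3 from % of glomeruli or cortical area involved (cut-points are an assumption)."""
    if percent < 10:
        return 0
    if percent <= 25:
        return 1
    if percent <= 50:
        return 2
    return 3


def chronicity_score(gs_pct: float, ta_pct: float, if_pct: float, intima_ge_media: bool) -> int:
    """Glomerulosclerosis, TA, and IF scores (0-3 each) plus arteriosclerosis CV (0-1)."""
    cv = 1 if intima_ge_media else 0
    return percent_to_score(gs_pct) + percent_to_score(ta_pct) + percent_to_score(if_pct) + cv


def chronicity_grade(score: int) -> str:
    """Map the 0-10 overall score to the grades used in this study."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 1:
        return "minimal"
    if score <= 4:
        return "mild"
    if score <= 7:
        return "moderate"
    return "severe"


print(chronicity_grade(chronicity_score(30, 15, 15, True)))  # moderate
```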
Model Development and Validation
We used the “randomForestSRC” R package to construct an RSF model to predict the clinical outcome of HN patients. Bootstrap resampling with 1,000 replications was performed, and factors with a mean variable importance exceeding 0.01 were retained. Hyperparameters were optimized by grid search. Using the same factors, we also constructed benchmark Cox regression models with the “rms” R package for comparison with the RSF model. The proportional hazards assumption was tested before the multivariable Cox regression analysis. Model performance was evaluated in terms of discrimination and calibration.
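The selection rule and the tuning step can be illustrated with a library-free Python sketch. It does not reproduce randomForestSRC: the per-replicate importance dictionaries, the example numbers, and the `toy_score` function (standing in for "fit an RSF with these settings and return a validation C-index") are hypothetical placeholders.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def select_by_mean_vimp(replicates: List[Dict[str, float]], threshold: float = 0.01) -> List[str]:
    """Keep variables whose importance, averaged over bootstrap replicates, exceeds threshold."""
    names = list(replicates[0])
    averages = {name: mean(rep[name] for rep in replicates) for name in names}
    kept = [name for name in names if averages[name] > threshold]
    return sorted(kept, key=lambda name: averages[name], reverse=True)


def grid_search(
    fit_and_score: Callable[[int, int], float],
    mtry_values: Sequence[int],
    nodesize_values: Sequence[int],
) -> Tuple[int, int]:
    """Return the (mtry, nodesize) pair with the highest score."""
    return max(product(mtry_values, nodesize_values), key=lambda pair: fit_and_score(*pair))


def toy_score(mtry: int, nodesize: int) -> float:
    """Hypothetical stand-in for fitting a forest and returning its validation C-index."""
    return -((mtry - 1) ** 2) - (nodesize - 8) ** 2


replicates = [  # illustrative importance values, not study data
    {"grade": 0.05, "eGFR": 0.03, "ALT": 0.002},
    {"grade": 0.07, "eGFR": 0.02, "ALT": 0.004},
]
print(select_by_mean_vimp(replicates))  # ['grade', 'eGFR']
print(grid_search(toy_score, [1, 2, 3], [3, 5, 8, 10, 15]))  # (1, 8)
```

In practice each grid point would be scored on held-out or out-of-bag data so that the tuning does not simply reward overfitting.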
Discrimination was quantified using the area under the receiver operating characteristic curve (AUC) and the concordance index (C-index). To assess discrimination over the entire study period, we calculated the time-dependent AUC (tAUC) [20]. Calibration plots were constructed to compare predicted and observed probabilities. In addition, we calculated the integrated Brier score (IBS) and performed decision curve analysis (DCA) to evaluate model performance. The integrated discrimination improvement (IDI) and net reclassification improvement (NRI) were calculated to evaluate risk reclassification. A positive NRI indicates that the RSF model reclassified individuals more correctly than the Cox model, and a positive IDI indicates that the RSF model better separated the predicted risks of patients with and without events [21].
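As an illustration of what the C-index measures, the sketch below computes Harrell's C in plain Python: among usable pairs (the earlier time is an observed event), it counts how often the patient who failed first had the higher predicted risk, scoring risk ties as 0.5. This is a simplified textbook version; the R packages used in the study may handle tied times and censoring differently.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Share of comparable pairs in which the earlier event had the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored time cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


print(harrell_c_index([5, 10, 15, 20], [1, 0, 1, 1], [0.4, 0.9, 0.5, 0.1]))  # 0.5
```

A value of 1.0 means perfect ranking, 0.5 is no better than chance.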
Statistical Analysis
Variables with more than 10% missing data, or with a correlation coefficient greater than 0.75 with another variable, were excluded. Missing data were imputed using random forest imputation. Categorical variables were described as frequencies and percentages and compared using the χ² test or Fisher's exact test. Normally distributed continuous variables were presented as the mean ± standard deviation, and nonnormally distributed continuous variables as the median (interquartile range). Groups were compared using Student's t test or the Mann-Whitney U test. Kaplan-Meier curves were used to describe cumulative survival, and groups were compared using the log-rank test. All p values were two-tailed, and p < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS (version 26.0; IBM), R software (version 4.2.3), and Jamovi (version 2.3, Australia).
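The two exclusion rules can be sketched in Python as below. The paper does not say which member of a highly correlated pair was dropped; this sketch assumes the variable that appears first is kept, and correlations are computed on pairwise-complete observations. The `None` entries stand for missing values before imputation.

```python
import math
from typing import Dict, List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_variables(
    data: Dict[str, List[Optional[float]]], max_missing: float = 0.10, max_abs_r: float = 0.75
) -> List[str]:
    """Drop variables with >max_missing missing values, then drop any variable whose |r|
    with an already-kept variable exceeds max_abs_r (assumed tie-breaking rule)."""
    low_missing = [
        name for name, col in data.items() if sum(v is None for v in col) / len(col) <= max_missing
    ]
    kept: List[str] = []
    for name in low_missing:
        redundant = False
        for other in kept:
            pairs = [(a, b) for a, b in zip(data[name], data[other]) if a is not None and b is not None]
            if len(pairs) > 2:
                xs, ys = zip(*pairs)
                if abs(pearson(xs, ys)) > max_abs_r:
                    redundant = True
                    break
        if not redundant:
            kept.append(name)
    return kept


example = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3], "d": [1, None, None, 4, 5]}
print(filter_variables(example))  # ['a', 'c']
```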
Results
Patient Characteristics
A total of 225 eligible HN patients were included from 341 identified patients among 9,682 patients who underwent renal biopsy (Fig. 1). During a median follow-up of 29.9 (16.6, 52.1) months, 72 (32.0%) patients reached the study endpoint: 43 (59.7%) had an eGFR decline ≥50%, 22 (30.6%) developed ESRD, and 7 (9.7%) died. Three of these patients reached the endpoint within the first 6 months of follow-up. At baseline, the median MAP was 106.3 (94.3, 119.3) mm Hg, urinary total protein was 992.0 (548.5, 1,620.0) mg/24 h, and eGFR was 49.2 (31.0, 68.3) mL/min/1.73 m² (Table 1). Patients were randomly split into a training set (n = 157) and a testing set (n = 68). Clinical and pathological variables and follow-up were well balanced between the two sets (all p > 0.05; Table 1).
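A 7:3 random split of 225 patients yields the 157/68 sizes reported above when the training size is rounded down. The sketch below shows a simple unstratified split with a fixed seed; the seed value and the use of an unstratified split are assumptions, since the paper does not describe how the randomization was implemented.

```python
import random
from typing import Iterable, List, Tuple


def split_70_30(ids: Iterable[int], seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Shuffle patient IDs with a fixed (hypothetical) seed; floor(70%) go to the training set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding: 225 -> 157
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_70_30(range(1, 226))
print(len(train_ids), len(test_ids))  # 157 68
```

Fixing the seed makes the split reproducible, which matters when training and testing results are reported separately.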
Inclusion flowchart. HN, hypertensive nephropathy. RSF, random survival forest.
Baseline and follow-up characteristics of the study subjects
Characteristics | All patients (N = 225) | Training set (N = 157) | Testing set (N = 68) | p value |
---|---|---|---|---|
Baseline (at renal biopsy) | | | | |
Age, years | 44.8±12.7 | 44.2±12.4 | 46.3±13.3 | 0.416 |
Male, n (%) | 178 (79.1) | 127 (80.9) | 51 (75.0) | 0.318 |
Family history, n (%) | 92 (40.9) | 63 (40.1) | 29 (42.6) | 0.724 |
Smoking history, n (%) | 72 (32.0) | 47 (29.9) | 25 (36.8) | 0.313 |
Body mass index, kg/m² | 26.82±4.06 | 26.85±3.93 | 26.75±4.35 | 0.868 |
MAP, mm Hg | 106.3 (94.3, 119.3) | 106.7 (94.7, 118.2) | 104.7 (92.8, 122.3) | 0.944 |
MaxMAP, mm Hg | 136.7 (126.7, 153.3) | 133.3 (126.7, 153.3) | 140.0 (128.8, 155.8) | 0.277 |
Course of disease, years | 5.0 (2.0, 10.0) | 5.0 (2.0, 9.0) | 5.0 (1.3, 10.0) | 0.315 |
UTP, mg/24 h | 992.0 (548.5, 1,620.0) | 938.0 (523.5, 1,593.5) | 1,246.0 (552.8, 1,737.3) | 0.335 |
WBC, ×10⁹/L | 7.52 (5.92, 8.71) | 7.46 (5.92, 8.79) | 7.55 (5.94, 8.71) | 0.997 |
HCT | 0.426 (0.376, 0.469) | 0.432 (0.372, 0.472) | 0.412 (0.373, 0.463) | 0.287 |
PLT, ×10⁹/L | 219 (184, 271) | 219 (179, 269) | 220 (187, 274) | 0.761 |
MONO | 0.061 (0.054, 0.077) | 0.061 (0.054, 0.078) | 0.063 (0.053, 0.076) | 0.599 |
UNAG, mg/dL | 11.42 (5.75, 16.90) | 11.6 (5.7, 17.5) | 10.0 (5.9, 18.9) | 0.835 |
UAlb/Cr | 210.6 (55.4, 569.6) | 197.1 (52.3, 510.6) | 302.5 (78.2, 887.0) | 0.073 |
ALT, IU/L | 22 (16, 32.5) | 23 (16, 33) | 22 (14, 32) | 0.518 |
AST, IU/L | 19 (16, 26) | 20 (16, 26) | 19 (16, 25) | 0.347 |
UA, μmol/L | 399 (332, 487) | 390 (325, 475) | 431 (340, 508) | 0.249 |
eGFR, mL/min/1.73 m² | 49.2 (31.0, 68.3) | 52.6 (32.5, 72.2) | 43.7 (25.2, 59.3) | 0.051 |
Alb, g/L | 42.3±5.5 | 42.7±5.6 | 41.6±5.1 | 0.197 |
K, mmol/L | 4.08 (3.72, 4.45) | 4.09 (3.73, 4.45) | 4.05 (3.68, 4.50) | 0.932 |
Na, mmol/L | 141.9±2.8 | 142.1±2.6 | 141.4±3.1 | 0.120 |
TG, mmol/L | 2.09 (1.40, 2.87) | 2.09 (1.40, 2.83) | 2.09 (1.39, 3.21) | 0.652 |
HDL-C, mmol/L | 1.01 (0.83, 1.19) | 1.00 (0.83, 1.18) | 1.01 (0.84, 1.20) | 0.661 |
LDL-C, mmol/L | 2.61 (2.08, 3.23) | 2.63 (2.06, 3.21) | 2.57 (2.11, 3.44) | 0.454 |
IVST, mm | 11.6±2.1 | 11.6±2.0 | 11.7±2.0 | 0.673 |
LVEF | 0.56 (0.54, 0.58) | 0.56 (0.54, 0.58) | 0.56 (0.54, 0.49) | 0.900 |
SV, mL | 57 (49, 65) | 57 (48, 65) | 56 (50, 65) | 0.885 |
Histological lesion scoring, n (%) | | | | 0.146 |
Minimal chronic changes (0–1) | 38 (16.9) | 31 (19.7) | 8 (11.8) | |
Mild chronic changes (2–4) | 59 (26.2) | 44 (28.0) | 14 (20.6) | |
Moderate chronic changes (5–7) | 72 (32.0) | 52 (33.1) | 26 (38.2) | |
Severe chronic changes (≥8) | 56 (24.9) | 30 (19.1) | 20 (29.4) | |
Onion skin changes, n (%) | 73 (32.4) | 45 (28.7) | 28 (41.2) | 0.066 |
Comorbidity, n (%) | | | | |
Hypertensive heart disease | 60 (26.7) | 40 (25.5) | 20 (29.4) | 0.540 |
Diabetes | 43 (19.1) | 28 (17.8) | 15 (22.1) | 0.459 |
Fatty liver | 45 (20.0) | 32 (20.4) | 13 (19.1) | 0.828 |
First-line treatment, n (%) | | | | |
RAASi | 183 (81.3) | 132 (84.1) | 51 (75.0) | 0.109 |
CCB | 168 (74.7) | 118 (75.2) | 50 (73.5) | 0.796 |
Diuretics | 31 (13.8) | 24 (15.3) | 7 (10.3) | 0.318 |
α-blockers | 103 (45.8) | 68 (43.3) | 35 (51.5) | 0.259 |
β-blockers | 77 (34.2) | 51 (32.5) | 26 (38.2) | 0.404 |
Lipid-lowering drugs | 102 (45.3) | 67 (42.7) | 35 (51.5) | 0.224 |
Follow-up parameters | | | | |
Follow-up time, months | 29.9 (16.6, 52.1) | 31.3 (17.1, 53.1) | 27.3 (14.9, 51.4) | 0.334 |
Primary endpoint, n (%) | 72 (32.0) | 47 (29.9) | 25 (36.8) | 0.313 |
Values are presented as the mean ± standard deviation, median (interquartile range), or n (%).
MAP, mean arterial pressure; maxMAP, highest previous MAP; UTP, urine total protein; WBC, white blood cell; HCT, hematocrit; PLT, platelet; MONO, monocyte; UNAG, N-acetyl-β-d-glucosaminidase; UAlb/Cr, urinary albumin/creatinine; ALT, alanine aminotransferase; AST, aspartate aminotransferase; UA, uric acid; eGFR, estimated glomerular filtration rate; Alb, albumin; TG, triglyceride; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; IVST, interventricular septum thickness; LVEF, left ventricular ejection fraction; SV, stroke volume; RAASi, renin-angiotensin-aldosterone system inhibitors; CCB, calcium channel blockers.
Screening of Variables
Missing data were imputed using the “missForest” R package. After excluding collinear variables, 98 of 112 covariates were retained for analysis (online suppl. Table S2). After bootstrap resampling with 1,000 replications in the RSF (online suppl. Fig. S1), six variables (overall chronicity grade of renal pathology [grade], eGFR, high-density lipoprotein cholesterol [HDL-C], hematocrit [HCT], monocyte, and stroke volume [SV]) were selected (online suppl. Table S3). Notably, the grade contributed the most among all candidate factors (online suppl. Fig. S2). We further examined the prognostic importance of pathological indicators by comparing two Cox models with and without them (online suppl. Fig. S3).
Model Development and Validation
RSF Model Development and Internal Validation
The selected variables were used to build the RSF and Cox models in the training set, and both models were evaluated in the training and testing sets. The error rate stabilized beyond 800 trees. Grid search yielded optimal mtry and nodesize values of 1 and 8, respectively. In the training set, the RSF model had AUCs of 0.972 (95% CI: 0.943–1.00), 0.930 (95% CI: 0.880–0.980), and 0.952 (95% CI: 0.903–1.00) at 12, 36, and 60 months, respectively, higher than the Cox model's AUCs of 0.924 (95% CI: 0.838–1.00), 0.858 (95% CI: 0.779–0.938), and 0.881 (95% CI: 0.797–0.965) at the corresponding time points (online suppl. Fig. S4). In the testing set, the RSF model also showed good discrimination, with AUCs of 0.969 (95% CI: 0.928–1.00), 0.876 (95% CI: 0.759–0.993), and 0.939 (95% CI: 0.859–1.00) at 12, 36, and 60 months, respectively, compared with 0.970 (95% CI: 0.929–1.00), 0.814 (95% CI: 0.667–0.962), and 0.825 (95% CI: 0.675–0.975) for the Cox model (Fig. 2a–c). After 12 months, the tAUC was consistently higher for the RSF model than for the Cox model (Fig. 2d). The RSF model had a higher C-index than the Cox model in both the training set (0.904 [95% CI: 0.842–0.938] vs. 0.831 [95% CI: 0.768–0.894], p < 0.001) and the testing set (0.893 [95% CI: 0.770–0.944] vs. 0.841 [95% CI: 0.751–0.931], p = 0.021) (Table 2).
Comparison of AUC between the RSF model and Cox model in the testing cohort. a AUC at 12 months. b AUC at 36 months. c AUC at 60 months. d Comparison of tAUC between the two models. RSF, random survival forest; Cox, Cox regression model; AUC, area under the curve; tAUC, time-dependent area under the curve.
The RSF and Cox models for predicting the occurrence of composite endpoint
Models | Training set C-index | Training set IBS | Testing set C-index | Testing set IBS |
---|---|---|---|---|
RSF model | 0.904 (0.842, 0.938) | 0.081 (0.080, 0.083) | 0.893 (0.770, 0.944) | 0.078 (0.072, 0.084) |
Cox model | 0.831 (0.768, 0.894) | 0.113 (0.081, 0.158) | 0.841 (0.751, 0.931) | 0.105 (0.091, 0.156) |
p value | <0.001 | - | 0.021 | - |
IBS, integrated Brier score; RSF, random survival forest; Cox, Cox regression; C-index, Concordance index.
Calibration of the Models
The IBS was used to assess the calibration of the RSF and Cox models. Both models showed good calibration, with low IBS values in the training and testing sets (Table 2). In the testing set, the IBS was 0.078 (95% CI: 0.072–0.084) for the RSF model and 0.105 (95% CI: 0.091–0.156) for the Cox model. In addition, smoothed calibration plots showed that observed and predicted risks were generally consistent across the range of predicted risks, with slight overprediction (Fig. 3).
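To show what the IBS summarizes, the sketch below computes an inverse-probability-of-censoring-weighted (IPCW) Brier score at each time on a grid and averages it with the trapezoidal rule. It is a plain-Python illustration of the standard IPCW construction, not the implementation used by the study's R packages; `surv_matrix[k][i]` is assumed to hold the predicted probability that patient i is still event-free at `grid[k]`.

```python
from typing import Callable, Sequence


def censoring_km(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censored observations as the 'events'."""
    steps = []  # (time, value of G just after that time)
    g = 1.0
    for u in sorted({t for t, e in zip(times, events) if not e}):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and not e)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(t: float, left: bool = False) -> float:
        """G(t); with left=True, the left limit G(t-)."""
        value = 1.0
        for u, s in steps:
            if u < t or (u == t and not left):
                value = s
            else:
                break
        return value

    return G


def brier_score(t, times, events, surv_at_t, G) -> float:
    """IPCW Brier score at time t; surv_at_t[i] is the predicted P(T_i > t)."""
    total = 0.0
    for T, d, s in zip(times, events, surv_at_t):
        if T <= t and d:  # event observed by t: ideal prediction is 0
            total += s ** 2 / G(T, left=True)
        elif T > t:  # still event-free at t: ideal prediction is 1
            total += (1.0 - s) ** 2 / G(t)
        # censored before t: contributes 0; the 1/G weights compensate
    return total / len(times)


def integrated_brier_score(grid, times, events, surv_matrix) -> float:
    """Trapezoidal average of the Brier score over the time grid."""
    G = censoring_km(times, events)
    scores = [brier_score(t, times, events, row, G) for t, row in zip(grid, surv_matrix)]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2 for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])
```

Lower values are better: a model that always predicts 0.5 scores 0.25, and a perfect model scores 0.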
The calibration curve of prediction models in the testing cohort. a Calibration curve at 12 months. b Calibration curve at 36 months. c Calibration curve at 60 months. RSF, random survival forest; Cox, Cox regression model.
Comparison of Model Prediction Efficiency
Using all data, we compared the RSF and Cox models by calculating the NRI and IDI. Compared with the Cox model, the categorical NRI of the RSF model for predicting the composite endpoint at 12, 36, and 60 months was 0.867 (95% CI: 0.547–0.939, p < 0.01), 0.478 (95% CI: 0.307–0.656, p < 0.01), and 0.489 (95% CI: 0.230–0.678, p = 0.004), respectively. The corresponding IDI was 0.213 (95% CI: 0.088–0.331, p < 0.01), 0.177 (95% CI: 0.112–0.245, p < 0.01), and 0.127 (95% CI: 0.064–0.198, p = 0.01) (Table 3). These results indicate that the RSF model had better predictive performance than the Cox model.
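The arithmetic behind the two reclassification metrics can be sketched for a binary outcome at a fixed horizon. This simplified version ignores censoring (the survival-adapted versions used in practice additionally account for it), and the risk cut-offs passed as `cutoffs` are hypothetical, because the categories used for the categorical NRI are not stated here.

```python
from typing import Sequence


def risk_category(p: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk category: the number of cut-offs at or below p."""
    return sum(1 for c in cutoffs if p >= c)


def categorical_nri(p_old, p_new, outcomes, cutoffs) -> float:
    """Event NRI plus non-event NRI when moving from the old to the new model."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = sum(1 for y in outcomes if y)
    n_nonevent = len(outcomes) - n_event
    for old, new, y in zip(p_old, p_new, outcomes):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if y:
            up_event += int(move > 0)
            down_event += int(move < 0)
        else:
            up_nonevent += int(move > 0)
            down_nonevent += int(move < 0)
    return (up_event - down_event) / n_event + (down_nonevent - up_nonevent) / n_nonevent


def idi(p_old, p_new, outcomes) -> float:
    """Gain in mean predicted risk among events minus the gain among non-events."""
    events = [i for i, y in enumerate(outcomes) if y]
    nonevents = [i for i, y in enumerate(outcomes) if not y]

    def avg(p, idx):
        return sum(p[i] for i in idx) / len(idx)

    return (avg(p_new, events) - avg(p_old, events)) - (avg(p_new, nonevents) - avg(p_old, nonevents))


p_old, p_new, y = [0.2, 0.2, 0.2, 0.2], [0.8, 0.6, 0.1, 0.05], [1, 1, 0, 0]
print(categorical_nri(p_old, p_new, y, [0.5]), round(idi(p_old, p_new, y), 3))  # 1.0 0.625
```

Because the NRI sums an event component and a non-event component, it can range from −2 to 2, which is why values above 0.5 are possible.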
NRIs and IDIs of RSF model compared with Cox model in HN patients
Time point | IDI (95% CI) | p value | NRI (95% CI) | p value |
---|---|---|---|---|
12 months | 0.213 (0.088–0.331) | <0.01 | 0.867 (0.547–0.939) | <0.01 |
36 months | 0.177 (0.112–0.245) | <0.01 | 0.478 (0.307–0.656) | <0.01 |
60 months | 0.127 (0.064–0.198) | 0.01 | 0.489 (0.230–0.678) | 0.004 |
NRI, net reclassification improvement; IDI, integrated discrimination improvement; RSF, random survival forest.
Decision Curve Analysis
DCA provides a graphical assessment of the clinical usefulness of each model. As shown in Fig. 4, the RSF model had a higher net benefit than the Cox model across all threshold probabilities at 12, 36, and 60 months. This suggests that the clinical benefit of the RSF model may exceed that of the Cox model.
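Each point on a decision curve is a net benefit: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below computes it for a binary outcome at a fixed horizon, ignoring censoring for simplicity, and includes the "treat all" reference curve; the risks and outcomes are toy values.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and not y)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


risks = [0.9, 0.8, 0.2, 0.1]  # toy predictions
outcomes = [1, 0, 1, 0]
for pt in (0.1, 0.25, 0.5):
    print(pt, round(net_benefit(risks, outcomes, pt), 3), round(net_benefit_treat_all(outcomes, pt), 3))
```

A model is useful at a given threshold only if its curve lies above both the "treat all" curve and the "treat none" line (net benefit 0).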
Comparison of DCA plots between the RSF model and Cox model. a DCA plots at 12 months. b DCA plots at 36 months. c DCA plots at 60 months. DCA, decision curve analysis; RSF, random survival forest; Cox, Cox regression model.
Renal Outcomes of HN Patients under Different Pathological Scores
Based on variable screening and model performance, the pathological chronicity grade was the most important risk factor for renal prognosis in HN patients. Patients were divided into four groups according to grade. Patients with higher grades had significantly worse survival than those with lower grades (log-rank p < 0.001), indicating poorer renal outcomes (Fig. 5).
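The Kaplan-Meier estimator and log-rank test behind this comparison can be sketched in plain Python. For brevity the log-rank function handles two groups (chi-square with 1 degree of freedom); the four-grade comparison reported here uses the k-group generalization, which needs a covariance matrix and is not shown.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """(event time, survival just after that time) pairs."""
    curve = []
    surv = 1.0
    for u in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, e in zip(times, events) if t == u and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((u, surv))
    return curve


def logrank_two_groups(t1, e1, t2, e2) -> Tuple[float, float]:
    """Two-group log-rank chi-square statistic and its p value (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(t1, e1) if e} | {t for t, e in zip(t2, e2) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for u in event_times:
        n1 = sum(1 for t in t1 if t >= u)
        n2 = sum(1 for t in t2 if t >= u)
        d1 = sum(1 for t, e in zip(t1, e1) if t == u and e)
        d2 = sum(1 for t, e in zip(t2, e2) if t == u and e)
        n, d = n1 + n2, d1 + d2
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value
```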
Survival curves of HN patients by pathological chronicity grade. Grade, overall chronicity grade of renal pathology.
Discussion
In this study, we assessed adverse outcomes in patients with HN who underwent renal biopsy between December 2010 and December 2022 and developed an RSF-based prognostic model using clinical and pathological data, providing new evidence on the prognosis of HN. From 112 variables, we identified six key risk factors that may influence the prognosis of HN patients and used them to construct the model. To our knowledge, this is the first study to evaluate the long-term prognosis of HN using the RSF algorithm. To improve prediction accuracy, we combined histological chronicity scores with clinical data, and our findings indicate that this combination enhances the prediction of HN prognosis. Compared with the Cox model, the RSF model showed better discrimination and calibration, which may make prognostic information more accessible to clinicians and patients.
During follow-up, most primary endpoint events were eGFR declines ≥50%. Because all patients underwent renal biopsy, the cohort may have had relatively more severe disease, which could explain the higher event rate than is typically observed in epidemiological studies. Given the increasing incidence of HN and the particular difficulty of its diagnosis, risk prediction and stratification remain major challenges for treatment decisions and clinical study design [22]. However, prediction models for the overall prognosis of HN are still lacking. The RSF algorithm, a variant of random forest, reduces overfitting through two random sampling processes (bootstrap sampling of patients for each tree and random selection of candidate variables at each node) and does not rely on assumptions such as proportional hazards or log-linear covariate effects [23]. In contrast, the standard Cox model assumes proportional hazards and log-linear covariate effects, so it captures only linear relationships unless nonlinear terms are specified [24]. Our findings indicate that, compared with the Cox model, the RSF model was robust in predicting disease progression in HN patients and provided more precise probability estimates over time. The RSF model had higher C-index and AUC values, a lower IBS, and positive NRI and IDI values, indicating better discrimination, calibration, and risk reclassification. The superior performance of RSF over Cox may be attributable to nonlinear relationships between the variables and outcomes.
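The two sources of randomness mentioned above can be illustrated with a short sketch: each tree is grown on a bootstrap sample of patients (those never drawn form the out-of-bag set used for internal error estimates), and at each node only `mtry` randomly chosen variables are tried as split candidates. The function names are hypothetical, and tree growing and log-rank splitting themselves are omitted.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_with_oob(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n patient indices with replacement; never-drawn indices form the out-of-bag set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def candidate_features(features: Sequence[str], mtry: int, rng: random.Random) -> List[str]:
    """Randomly choose mtry candidate variables to try at a single node split."""
    return rng.sample(list(features), mtry)


rng = random.Random(7)
in_bag, oob = bootstrap_with_oob(225, rng)
# Sizes vary with the seed; on average about 63% of patients appear in each bootstrap sample
print(len(set(in_bag)), len(oob))
print(candidate_features(["grade", "eGFR", "HDL-C", "HCT", "MONO", "SV"], 1, rng))
```

With mtry = 1, as selected by the grid search, each split considers a single randomly chosen variable, which strongly decorrelates the trees.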
The risk factors for poor prognosis in HN remain controversial [25]. Renal biopsy with histological analysis plays a pivotal role in elucidating the etiology, guiding treatment decisions, and predicting the prognosis of renal diseases, including HN [26, 27]. However, research evaluating the predictive value of histopathological lesions for risk assessment remains sparse and has focused predominantly on patients with primary glomerular disease [19, 27]. The absence of a unified pathological grading standard for HN makes study results difficult to compare. Morphometry is one way to quantify chronic changes in renal biopsies; it provides accurate and reproducible indicators, such as the percentages of IF and TA, which are more predictive of progressive CKD than scores based on visual examination alone [28]. Liang et al. [10] found by multivariable Cox analysis that renal prognosis in a cohort of 194 HN patients was independently associated with renal pathological changes, which is consistent with our results. The overall chronicity grade is a semiquantitative assessment that includes glomerulosclerosis, TA, IF, and arteriosclerosis [19]; it therefore evaluates renal changes more comprehensively and may be particularly suitable for patients with HN. In our models, the overall chronicity grade was the factor with the highest contribution, and the Kaplan-Meier curves showed that patients with higher grades had worse prognoses. Wider use of renal biopsy may therefore not only reduce the misdiagnosis rate but also help evaluate the prognosis of patients with HN.
As the basis for early recognition of renal dysfunction and for CKD staging, eGFR is one of the most critical risk factors for renal prognosis [29]. In addition to eGFR, SV was a key clinical predictor of prognosis in HN patients in our study. Reduced SV is often associated with poor cardiovascular outcomes; it is negatively correlated with the ratio of left ventricular (LV) mass to end-diastolic volume, and its reduction serves as an early indicator of myocardial dysfunction in hypertensive patients [30]. Furthermore, in hypertensive patients with LV hypertrophy, a higher pulse pressure/SV index is associated with increased cardiovascular morbidity and mortality [31]. We therefore propose that SV, beyond its association with cardiovascular events and mortality in hypertension, may also be closely associated with renal prognosis in HN.
It has long been established that HDL-C is inversely associated with cardiovascular risk, based primarily on influential epidemiological studies showing that every 1 mg/dL increase in HDL-C is associated with a 2–3% reduction in cardiovascular mortality risk [32]. Our findings are consistent with this conclusion and suggest that higher HDL-C levels may be renoprotective in patients with hypertensive renal damage. However, it remains to be determined whether this relationship is linear or U-shaped [33], so further work is needed to clarify the relationship between HDL-C and the prognosis of HN. In addition, the association between HCT and the prognosis of hypertension has been investigated extensively, and the literature contains conflicting reports regarding its role as a risk marker for cardiovascular morbidity and mortality [34, 35]. In our study, however, we believe that lower HCT more directly reflects renal anemia and thus worsening renal function.
Our RSF model predicts outcomes mainly from combined clinical and pathological factors, drawing on broader baseline data and yielding more targeted results than previous studies. However, this study has some limitations. First, the relatively small sample size and the absence of external validation are the main constraints. To address these limitations, we plan to conduct a larger prospective study enrolling a broader cohort of eligible patients to further validate our model. Second, because all patients in this study were from China, the generalizability of our prediction model to other ethnic groups requires further verification. Finally, we collected only basic data on treatment strategies; during modeling, we observed no significant effect of therapeutic factors on prognosis, and we could not assess the influence of treatment during follow-up. Although treatment was not a prognostic factor in our model, further investigation of its impact may be warranted.
Conclusions
This study is the first to apply an RSF model to predict the renal prognosis of patients with HN, and it demonstrates that six variables (overall chronicity grade, eGFR, HDL-C, HCT, monocyte, and SV) can accurately predict disease progression. Compared with the Cox model, the RSF model exhibits superior survival prediction and provides more precise prognostic stratification. Furthermore, greater use of renal biopsy may reduce the misdiagnosis rate and aid the evaluation of patient prognosis. These findings may support clinical decision-making, and further well-designed studies are warranted to verify them.
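The comparison of RSF and Cox models rests on discrimination, measured by the concordance index (C-index) reported in the Abstract. As an illustration only, the following is a minimal sketch of Harrell's C-index in plain Python. It assumes that a higher risk score means an earlier expected event; the software and tie-handling rules the authors actually used are not stated in this section, so this is not their implementation, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A higher risk score is taken to mean an earlier expected event.
    """
    if not len(times) == len(events) == len(risk_scores):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for a, b in combinations(range(len(times)), 2):
        # Let i be the subject with the shorter follow-up time.
        i, j = (a, b) if times[a] <= times[b] else (b, a)
        # A pair is comparable only if the earlier time is an observed event.
        # Pairs with tied times are skipped here for simplicity; software
        # packages handle such ties in different ways.
        if not events[i] or times[i] == times[j]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions earn half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])` returns 1.0 because every comparable pair is ranked correctly. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (0.904 vs. 0.831 in the training set) should be read.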
Acknowledgments
The authors thank all the statisticians who participated in this study.
Statement of Ethics
This study was approved by the Ethics Committee of Xijing Hospital (approval number: KY20213027-1) and was conducted in accordance with the Declaration of Helsinki. Owing to the retrospective design, the Ethics Committee waived the requirement for informed consent from eligible patients.
Conflict of Interest Statement
The authors report that there are no competing interests to declare.
Funding Sources
This study was supported by grants from the National Natural Science Foundation of China (reference numbers: 82170722, 82270715), the Key Research and Development Plan of Shaanxi Province (reference number: 2023-ZDLSF-15), and the Xijing Hospital medical staff training facilitation program (reference number: XJZT24LY12).
Author Contributions
Conceptualization: Y.Q. and S.S.; methodology: J.Z. and Y.X.; software: Y.Q. and Y.X.; validation: Y.Q. and X.N.; data collection: J.Z., Y.W., A.W., and P.L.; writing – original draft: Y.Q., J.Z., P.L., and W.Z.; writing – review and editing: Y.X., X.N., Z.Y., Y.H., M.H., W.Z., M.L., and S.S. All authors made important contributions to drafting the manuscript, and the completeness of the overall work was thoroughly checked and appropriately addressed.
Additional Information
Yunlong Qin, Jin Zhao, and Yan Xing contributed equally to this work.
Data Availability Statement
All data generated or analyzed during this study are included in this article and its online supplementary material files. Further inquiries can be directed to the corresponding author.