An Independent Validation of the Kidney Failure Risk Equation (KFRE) on a Seoul University Hospital Population

Background

KFRE Overview:

  • Developed by Tangri et al. to predict the risk of progression to kidney failure in CKD patients (stages 3 to 5).

  • Key predictors identified: age, sex, eGFR, and urine albumin-to-creatinine ratio (ACR).

Model Details:

  • KFRE estimates the 2-year and 5-year risk of kidney failure using a Cox proportional hazards model.

  • The equation has been validated across various populations and is widely used for clinical decision-making and patient counseling.

Methods

Python Library Implementation

  • The kfre Python library was created to replicate the original KFRE equations.

  • The library supports calculations for the 2-year and 5-year risk using:
    • 4-variable equation: age, sex, eGFR, uACR.

    • 6-variable equation: Adds diabetes mellitus and hypertension.

    • 8-variable equation: Adds serum albumin, serum phosphorus, serum bicarbonate, and serum calcium to the 4-variable model.
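As a rough illustration of what the 4-variable equation computes, the sketch below implements it with the standard library only. It is not the kfre library's code: the function name kfre_4var is hypothetical, the coefficients are transcribed from the published 4-variable equation and should be checked against the original publication, and the baseline survival values assume the non-North American calibration. ACR is taken in mg/g and must be positive because it enters through a natural logarithm.

```python
import math

# Baseline survival for the non-North American calibration (assumption;
# verify against the published equation before any clinical use).
BASELINE_SURVIVAL = {2: 0.9832, 5: 0.9365}


def kfre_4var(age, is_male, egfr, acr, years=2):
    """Hypothetical helper: 4-variable KFRE risk at 2 or 5 years.

    age in years, egfr in mL/min/1.73 m^2, acr in mg/g (must be > 0).
    """
    if years not in BASELINE_SURVIVAL:
        raise ValueError("years must be 2 or 5")
    if acr <= 0:
        raise ValueError("acr must be positive (it is log-transformed)")
    # Linear predictor: each predictor is centred on a fixed reference value.
    lp = (
        -0.2201 * (age / 10 - 7.036)
        + 0.2467 * ((1 if is_male else 0) - 0.5642)
        - 0.5567 * (egfr / 5 - 7.222)
        + 0.4510 * (math.log(acr) - 5.137)
    )
    # Risk = 1 - S0(t) ** exp(lp), the usual Cox-model survival form.
    return 1 - BASELINE_SURVIVAL[years] ** math.exp(lp)


# 61-year-old male, eGFR 9.15, ACR 10 (values from the first row of the risk table).
risk_2y = kfre_4var(61, True, 9.15, 10, years=2)
risk_5y = kfre_4var(61, True, 9.15, 10, years=5)
print(f"2-year risk: {risk_2y:.4f}, 5-year risk: {risk_5y:.4f}")
```

Under these assumptions the two values agree to about three decimal places with the kfre_4var_2year and kfre_4var_5year entries reported for that patient in the risk table, which suggests (but does not prove) that the non-North American calibration is the one used here.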

Application

  • The kfre library enables healthcare professionals and researchers to integrate KFRE calculations into their analyses and decision-making processes.

Further Information

  • A detailed usage guide is available in the official kfre documentation.

Preprocessing

Creating randomized patient IDs for indexing is crucial for several reasons:

  1. Privacy and Anonymity: Random patient IDs help protect patient privacy and maintain anonymity. This is particularly important in healthcare research to ensure compliance with data protection regulations such as HIPAA.

  2. Data Integrity: Randomized IDs prevent potential biases that could arise from using identifiable information. This ensures that the analysis is based solely on clinical data without any influence from patient identity.

  3. Simplified Data Management: Random IDs facilitate easier data management and tracking.

  4. Scalability: Randomized IDs allow for easier scaling of datasets, as new patients can be added without concern for ID conflicts.

By using random patient IDs, we can enhance the robustness, security, and scalability of our data management processes.
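A minimal sketch of how such IDs could be generated is shown here. The helper name make_patient_ids, the 9-digit length (chosen to match the IDs shown in the tables), and the seed value are all assumptions for illustration, not the procedure actually used for this dataset.

```python
import random


def make_patient_ids(n, seed=None, n_digits=9):
    """Hypothetical helper: return n unique random IDs with n_digits digits."""
    low, high = 10 ** (n_digits - 1), 10 ** n_digits
    if n > high - low:
        raise ValueError("not enough distinct IDs for the requested length")
    rng = random.Random(seed)
    # sample() draws without replacement, so two patients never share an ID.
    return rng.sample(range(low, high), k=n)


ids = make_patient_ids(5, seed=42)
print(ids)
```

With pandas, the resulting list could then be assigned as the DataFrame index. A fixed seed makes the mapping reproducible, which is convenient for reruns but means the seed itself should be kept private. Where unpredictability matters more than reproducibility, the standard-library secrets module is a safer source of randomness.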

Patient Table
Patient_ID Age SEX HTN DM GFR uACR ACR Ca P
867721094611109.148234101064
53351260230011153.974929299.44.5
9883508655510066.81041 25.774269.21.6
4287075353510089.34706 28.451310.52.9
813646552661107.316171007.76.5

Tangri et al. defined outcomes at 2 and 5 years, whereas ESRD_dur is recorded in days, so we convert days to years.

The class_esrd_outcome() function creates a new column called years by converting ESRD_dur from days to years.

We then use the years column to derive two new columns in df, 2_year_outcome and 5_year_outcome, which flag whether ESRD occurred within 2 and 5 years, respectively.
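The conversion and classification logic can be sketched in plain Python as follows. This is not the library's class_esrd_outcome() implementation. The helper names are hypothetical, the 365.25-day divisor is an assumption (the source does not state one), and the rule "ESRD recorded and duration within the horizon" is one reading of the outcome definition.

```python
DAYS_PER_YEAR = 365.25  # assumption: the source does not state the divisor


def days_to_years(days):
    """Convert a follow-up duration from days to years."""
    return days / DAYS_PER_YEAR


def esrd_within(esrd, esrd_dur_days, horizon_years):
    """Return 1 if ESRD was recorded within the horizon, else 0."""
    return int(esrd == 1 and days_to_years(esrd_dur_days) <= horizon_years)


# (ESRD, ESRD_dur) pairs taken from rows of the risk table.
rows = [(0, 2086), (1, 3), (1, 1312)]
for esrd, dur in rows:
    flags = (esrd_within(esrd, dur, 2), esrd_within(esrd, dur, 5))
    print(esrd, dur, round(days_to_years(dur), 2), flags)
```

Under this rule, a patient who reached ESRD after 1312 days (about 3.6 years) is negative for the 2-year outcome and positive for the 5-year outcome. Patients without ESRD are labeled 0 regardless of follow-up length, a simplification that ignores censoring.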

ESRD_in_2_year_outcome ESRD_in_5_year_outcome
0 0
0 0
0 0
0 0
1 1

Patient_ID Age HTN DM GFR ACR Ca P Alb TCO2 ESRD ESRD_dur Sex kfre_4var_2year kfre_4var_5year kfre_6var_2year kfre_6var_5year kfre_8var_2year kfre_8var_5year
829432911 61 1 0 9.15 10 6 4 2.6 16 0 2086 Male 0.1218 0.3953 0.1319 0.4155 0.5811 0.9800
451074312 66 1 0 7.32 0 7.7 6.5 3.5 14 1 3 Male 0.0001 0.0004 0.0001 0.0004 0.0046 0.0207
472425367 70 1 0 10.12 0 7.5 3.8 3.2 17 1 93 Male 0.0001 0.0003 0.0001 0.0003 0.0015 0.0067
300680837 49 0 0 7.63 0 8.5 5.4 4.3 23 1 138 Female 0.0001 0.0004 0.0001 0.0004 0.0013 0.0059
105959696 54 1 1 11.34 0 8 5.1 2.9 20 1 311 Male 0.0001 0.0003 0.0001 0.0003 0.0020 0.0091
205521453 56 0 1 34.9 0 8.6 3 3.2 13 1 461 Male 0.0000 0.0000 0.0000 0.0000 0.0001 0.0007
964175840 62 1 1 43.62 0 9.2 3.8 4.3 26 1 1312 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
366215045 73 1 0 19.63 0 7.7 2.9 2.4 14 1 1566 Male 0.0000 0.0001 0.0000 0.0001 0.0007 0.0031
703995795 28 1 0 47.68 0 9.8 3.7 4.8 28 1 1587 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
193572795 25 1 0 7.2 0 8.4 5.3 4.2 19 1 1705 Male 0.0002 0.0010 0.0002 0.0009 0.0036 0.0162
140269431 64 1 1 53.15 0 9.3 4.3 3.7 25 1 1958 Female 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
167087450 69 1 1 58.17 0 9.7 3.4 4.7 25 1 2856 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
548454181 23 1 0 28.81 0 7 6.8 1.7 20 1 3224 Male 0.0000 0.0001 0.0000 0.0001 0.0020 0.0089
780327933 67 0 1 29.83 0 7.9 2.2 2.8 14 0 28 Female 0.0000 0.0000 0.0000 0.0000 0.0002 0.0008
586810087 67 1 0 47.63 0 9.4 3.7 4 30 0 114 Female 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
128191268 71 1 0 47.07 0 9.3 4.1 4.8 22 0 210 Female 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
354037890 77 1 0 44.77 0 8.4 1.6 2.4 18 0 322 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
325811054 67 0 0 58.52 0 10.1 2.9 4.2 26 0 418 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
290397253 71 1 0 42.44 0 9.1 3.1 4.2 31 0 439 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
340488662 71 1 1 49.03 0 9.7 2.9 4.5 29 0 756 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
852670690 71 1 1 57.84 0 8.9 1.6 4.2 28 0 1215 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
532777291 83 0 1 41.12 0 8.4 2.7 3.7 22 0 1267 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
770017136 73 1 0 48.76 0 9.5 4.9 3.8 21 0 1323 Male 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
845855347 28 1 0 31.17 0 10.3 4.7 2.5 27 0 1370 Female 0.0000 0.0000 0.0000 0.0000 0.0002 0.0007
259011659 63 0 0 59.52 0 9.6 4.2 4.4 26 0 1620 Female 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
451920450 71 1 1 27.19 0 8.9 3.5 3.9 23 0 1680 Male 0.0000 0.0000 0.0000 0.0000 0.0001 0.0004
971884645 65 1 0 22.52 0 9.1 4.1 4 25 0 1911 Male 0.0000 0.0001 0.0000 0.0001 0.0002 0.0007
340172975 66 0 1 58.96 0 9.2 3.1 3.7 14 0 2013 Female 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Results

  • Define the ground truth for the 2-year and 5-year outcomes.

  • Filter the dataset down to CKD stages 3-5, following Tangri et al.

  • Extract the true labels for the 2-year and 5-year outcomes from the DataFrame df.

  • Assign the true labels for the 2-year outcome to y_true_2_yr and for the 5-year outcome to y_true_5_yr.

  • Combine these true labels into a list y_true.

Define Predictions for 4-Variable and 6-Variable KFREs for 2-Years and 5-Years

  • Extract the predicted probabilities for the 4-variable KFRE model for both the 2-year and 5-year outcomes.

  • Assign these predictions to y_pred_4var_2_yr and y_pred_4var_5_yr, respectively.

  • Similarly, extract and assign the predicted probabilities for the 6-variable KFRE model for both the 2-year and 5-year outcomes to y_pred_6var_2_yr and y_pred_6var_5_yr.

  • Combine the 4-variable model predictions into a list preds_4var.

These steps set up the necessary true labels and predictions for subsequent performance evaluation and analysis of the 4-variable and 6-variable KFRE models for both 2-year and 5-year outcomes.
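The steps above can be sketched as follows. Plain dictionaries stand in for DataFrame rows so the example has no dependencies. The first two records use values from the risk table, with outcome flags inferred from ESRD and ESRD_dur. The third record is made up purely to show the filter. The column helper and the name preds_6var are hypothetical.

```python
records = [
    # Values from the risk table; outcome flags inferred from ESRD and ESRD_dur.
    {"GFR": 9.15, "2_year_outcome": 0, "5_year_outcome": 0,
     "kfre_4var_2year": 0.1218, "kfre_4var_5year": 0.3953,
     "kfre_6var_2year": 0.1319, "kfre_6var_5year": 0.4155},
    {"GFR": 7.32, "2_year_outcome": 1, "5_year_outcome": 1,
     "kfre_4var_2year": 0.0001, "kfre_4var_5year": 0.0004,
     "kfre_6var_2year": 0.0001, "kfre_6var_5year": 0.0004},
    # Made-up row with eGFR >= 60, included only to demonstrate the filter.
    {"GFR": 75.0, "2_year_outcome": 0, "5_year_outcome": 0,
     "kfre_4var_2year": 0.0, "kfre_4var_5year": 0.0,
     "kfre_6var_2year": 0.0, "kfre_6var_5year": 0.0},
]

# CKD stages 3-5 correspond to an eGFR below 60 mL/min/1.73 m^2.
ckd_3_5 = [r for r in records if r["GFR"] < 60]


def column(rows, name):
    """Hypothetical helper: pull one column out of a list of records."""
    return [r[name] for r in rows]


# True labels for each horizon.
y_true_2_yr = column(ckd_3_5, "2_year_outcome")
y_true_5_yr = column(ckd_3_5, "5_year_outcome")
y_true = [y_true_2_yr, y_true_5_yr]

# Predicted probabilities per model and horizon.
y_pred_4var_2_yr = column(ckd_3_5, "kfre_4var_2year")
y_pred_4var_5_yr = column(ckd_3_5, "kfre_4var_5year")
y_pred_6var_2_yr = column(ckd_3_5, "kfre_6var_2year")
y_pred_6var_5_yr = column(ckd_3_5, "kfre_6var_5year")
preds_4var = [y_pred_4var_2_yr, y_pred_4var_5_yr]
preds_6var = [y_pred_6var_2_yr, y_pred_6var_5_yr]  # hypothetical name
```

Keeping labels and predictions in lists ordered by horizon (2-year first, then 5-year) lets later evaluation code iterate over both horizons with a single loop.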

Performance Grid

The table below reports performance metrics for each KFRE model and time horizon.

Metrics            2_year_4_var_kfre  5_year_4_var_kfre  2_year_6_var_kfre  5_year_6_var_kfre  2_year_8_var_kfre  5_year_8_var_kfre
Precision/PPV      0.614641           0.590909           0.616874           0.593812           0.589202           0.558601
Average Precision  0.559144           0.602071           0.559212           0.603624           0.548023           0.587199
Sensitivity        0.445892           0.641297           0.446894           0.635659           0.503006           0.675123
Specificity        0.949919           0.877670           0.950278           0.880194           0.937175           0.853010
AUC ROC            0.875196           0.844687           0.875311           0.845210           0.877136           0.843695
Brier Score        0.091249           0.137478           0.091271           0.136473           0.096593           0.148403
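For readers who want to see how several of these metrics are defined, here is a minimal dependency-free sketch. It is not the code that produced the table. The function names are hypothetical, and the 0.5 classification threshold is an assumption, since the threshold used for precision, sensitivity, and specificity is not stated. Average precision is omitted.

```python
def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, and precision (PPV) at a given threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random positive outranks a random negative.

    Ties count as half. This is O(n_pos * n_neg), fine for a sketch only.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and predicted risks, not study data.
y_true_demo = [0, 0, 1, 1]
y_prob_demo = [0.1, 0.4, 0.35, 0.8]
metrics = confusion_metrics(y_true_demo, y_prob_demo)
print(metrics, brier_score(y_true_demo, y_prob_demo), auc_roc(y_true_demo, y_prob_demo))
```

A lower Brier score indicates better-calibrated probabilities, while AUC ROC measures ranking ability independently of any threshold. That is why a model can show a high AUC alongside modest sensitivity at a fixed cutoff, as in the table above.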

Appendix

Descriptive Statistics

Age GFR ACR Ca P Alb TCO2 ESRD_dur
count 16619 16619 16619 16619 16619 16619 16619 16619
mean 54.43 66.7 727.43 9.14 3.62 4.04 26.13 1385.88
std 17.24 31.13 1760.78 0.67 0.84 0.61 3.99 1444.12
min 18 2.237831 0 3.5 0 0.3 3 1
25% 43 46.94 25.28 8.8 3.1 3.8 24 241
50% 57 67.84 120.86 9.2 3.5 4.2 27 868
75% 68 85.13 642 9.6 4 4.4 29 2091.5
max 97 415.04 60323.44 14.8 15.5 5.7 59 5892

KDE Density Distributions

References

  1. Kang, M. W. (2024). [KFRE validation dataset, Asian cohort]. Unpublished dataset provided by personal communication, June 26, 2024. Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea.

  2. Kang, M. W., Tangri, N., Kim, Y. C., An, J. N., Lee, J., Li, L., Oh, Y. K., Kim, D. K., Joo, K. W., Kim, Y. S., Lim, C. S., & Lee, J. P. (2020). An independent validation of the kidney failure risk equation in an Asian population. Scientific Reports, 10, 12920. https://doi.org/10.1038/s41598-020-69715-3