An Independent Validation of the Kidney Failure Risk Equation (KFRE) on a Seoul University Hospital Population
Background
KFRE Overview:
Developed by Tangri et al. to predict the risk of progression to kidney failure in CKD patients (stages 3 to 5).
Key predictors identified: age, sex, eGFR, and urine albumin-to-creatinine ratio (ACR).
Model Details:
KFRE estimates the risk of kidney failure at 2 and 5 years using Cox proportional hazards models.
The equation has been validated across various populations and is widely used for clinical decision-making and patient counseling.
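As a worked form, the KFRE follows the standard Cox proportional hazards risk equation. With baseline survival S_0(t) at horizon t, published coefficients β_i, and predictors x_i (some transformed, for example age/10 or ln ACR) centered on reference means x̄_i, the predicted risk is:

```latex
\hat{R}(t) = 1 - S_0(t)^{\exp\left(\sum_i \beta_i \, (x_i - \bar{x}_i)\right)}, \qquad t \in \{2, 5\} \text{ years}
```

Because S_0(t) < 1, a larger linear predictor raises the exponent, lowers predicted survival, and therefore increases predicted risk.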
Methods
Python Library Implementation
The kfre Python library was created to replicate the original KFRE equations.
The library supports 2-year and 5-year risk calculations using:
4-variable equation: age, sex, eGFR, and uACR.
6-variable equation: the 4-variable predictors plus diabetes mellitus and hypertension.
8-variable equation: the 4-variable predictors plus serum albumin, serum phosphorus, serum bicarbonate, and serum calcium.
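To make the 4-variable equation concrete, here is a minimal sketch. It assumes the widely published 4-variable KFRE coefficients and centering constants together with the non-North American baseline survival (0.9832 at 2 years, 0.9365 at 5 years); `kfre_4var` is a hypothetical name, not the kfre library's API, and the constants should be checked against the kfre documentation. Under these assumptions, the first patient in the patient-level results table (61-year-old male, eGFR 9.15, ACR 10 mg/g) gives 0.1218 and 0.3953, matching the reported kfre_4var_2year and kfre_4var_5year values. Because ln(ACR) is undefined at 0, the sketch rejects non-positive ACR; the table's ACR = 0 rows show small non-zero risks, so kfre evidently handles zero differently, and that handling is not reproduced here.

```python
import math

# Assumed constants: widely published 4-variable KFRE coefficients with the
# non-North American baseline survival. Verify before any real use.
BASELINE_SURVIVAL = {2: 0.9832, 5: 0.9365}


def kfre_4var(age: float, male: bool, egfr: float, acr_mg_g: float, years: int = 2) -> float:
    """Hypothetical 4-variable KFRE risk (not the kfre library API)."""
    if years not in BASELINE_SURVIVAL:
        raise ValueError("years must be 2 or 5")
    if acr_mg_g <= 0:
        raise ValueError("ACR must be positive: ln(ACR) is undefined at 0")
    # Linear predictor: each transformed predictor is centered on a reference mean
    lp = (
        -0.2201 * (age / 10 - 7.036)
        + 0.2467 * (int(male) - 0.5642)
        - 0.5567 * (egfr / 5 - 7.222)
        + 0.4510 * (math.log(acr_mg_g) - 5.137)
    )
    # Cox form: risk = 1 - S0(t) ** exp(linear predictor)
    return 1 - BASELINE_SURVIVAL[years] ** math.exp(lp)


print(round(kfre_4var(61, True, 9.15, 10, years=2), 4))  # 0.1218
print(round(kfre_4var(61, True, 9.15, 10, years=5), 4))  # 0.3953
```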
Application
The kfre library enables healthcare professionals and researchers to integrate KFRE calculations into their analyses and decision-making processes.
Further Information
A detailed usage guide is available in the official documentation: kfre Documentation.
Preprocessing
Create Randomized Patient IDs for Indexing. This step is crucial for several reasons:
Privacy and Anonymity: Random patient IDs help protect patient privacy and maintain anonymity. This is particularly important in healthcare research to ensure compliance with data protection regulations such as HIPAA.
Data Integrity: Randomized IDs prevent potential biases that could arise from using identifiable information. This ensures that the analysis is based solely on clinical data without any influence from patient identity.
Simplified Data Management: Random IDs simplify data management and tracking, especially when dealing …
Scalability: Randomized IDs make datasets easier to scale, as new patients can be added without ID conflicts, provided each new ID is checked against those already assigned.
By using random patient IDs, we can enhance the robustness, security, and scalability of our data management processes.
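A minimal sketch of this step, assuming pandas and 9-digit IDs like those in the table below. `assign_random_patient_ids` and the fixed seed are hypothetical choices; drawing without replacement guarantees unique IDs within one batch, and any original identifying columns would still need to be dropped separately.

```python
import random

import pandas as pd


def assign_random_patient_ids(df: pd.DataFrame, seed: int = 222) -> pd.DataFrame:
    """Return a copy of df indexed by unique, random 9-digit patient IDs.

    Hypothetical helper: the 9-digit range mirrors the IDs shown in this
    document, and the seed is an arbitrary choice for reproducibility.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so no two patients share an ID
    ids = rng.sample(range(100_000_000, 1_000_000_000), k=len(df))
    out = df.copy()
    out.index = pd.Index(ids, name="Patient_ID")
    return out


patients = pd.DataFrame({"Age": [61, 30, 55, 35, 66]})
print(assign_random_patient_ids(patients))
```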
Patient_ID | Age | SEX | HTN | DM | GFR | uACR | ACR | Ca | P |
---|---|---|---|---|---|---|---|---|---|
867721094 | 61 | 1 | 1 | 0 | 9.148234 | 10 | 10 | 6 | 4 |
533512602 | 30 | 0 | 1 | 1 | 153.9749 | 29 | 29 | 9.4 | 4.5 |
988350865 | 55 | 1 | 0 | 0 | 66.81041 | NaN | 25.77426 | 9.2 | 1.6 |
428707535 | 35 | 1 | 0 | 0 | 89.34706 | NaN | 28.4513 | 10.5 | 2.9 |
813646552 | 66 | 1 | 1 | 0 | 7.316171 | 0 | 0 | 7.7 | 6.5 |
Tangri et al. defined outcomes at 2 and 5 years, so the follow-up duration ESRD_dur, recorded in days, must be converted to years.
The class_esrd_outcome() function creates a new column called years by converting ESRD_dur from days to years.
Next, the years column is used to classify the ESRD column into two new columns in df, ESRD_in_2_year_outcome and ESRD_in_5_year_outcome, which indicate whether ESRD occurred within 2 and 5 years, respectively.
ESRD_in_2_year_outcome | ESRD_in_5_year_outcome |
---|---|
0 | 0 |
0 | 0 |
0 | 0 |
0 | 0 |
1 | 1 |
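The conversion and classification described above can be sketched as follows. It assumes ESRD_dur is in days and divides by 365.25 (the document does not state the divisor), and `classify_esrd_outcomes` is a hypothetical stand-in, not the signature of class_esrd_outcome(). Patients without ESRD are labeled 0 at both horizons regardless of follow-up length, which matches the sample output above but ignores censoring.

```python
import pandas as pd

# Assumption: ESRD_dur is in days; 365.25 averages in leap years.
DAYS_PER_YEAR = 365.25


def classify_esrd_outcomes(df: pd.DataFrame, horizons=(2, 5)) -> pd.DataFrame:
    """Hypothetical stand-in for class_esrd_outcome(); not the kfre API."""
    out = df.copy()
    # Convert follow-up time from days to years
    out["years"] = out["ESRD_dur"] / DAYS_PER_YEAR
    for h in horizons:
        # 1 only if ESRD occurred and did so within h years; everyone else is 0
        event_within_h = (out["ESRD"] == 1) & (out["years"] <= h)
        out[f"ESRD_in_{h}_year_outcome"] = event_within_h.astype(int)
    return out


# ESRD / ESRD_dur pairs from four patients in the patient-level table
sample = pd.DataFrame({"ESRD": [0, 1, 1, 1], "ESRD_dur": [2086, 3, 1566, 2856]})
print(classify_esrd_outcomes(sample)[["ESRD_in_2_year_outcome", "ESRD_in_5_year_outcome"]])
```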
Patient_ID | Age | HTN | DM | GFR | ACR | Ca | P | Alb | TCO2 | ESRD | ESRD_dur | Sex | kfre_4var_2year | kfre_4var_5year | kfre_6var_2year | kfre_6var_5year | kfre_8var_2year | kfre_8var_5year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
829432911 | 61 | 1 | 0 | 9.15 | 10 | 6 | 4 | 2.6 | 16 | 0 | 2086 | Male | 0.1218 | 0.3953 | 0.1319 | 0.4155 | 0.5811 | 0.9800 |
451074312 | 66 | 1 | 0 | 7.32 | 0 | 7.7 | 6.5 | 3.5 | 14 | 1 | 3 | Male | 0.0001 | 0.0004 | 0.0001 | 0.0004 | 0.0046 | 0.0207 |
472425367 | 70 | 1 | 0 | 10.12 | 0 | 7.5 | 3.8 | 3.2 | 17 | 1 | 93 | Male | 0.0001 | 0.0003 | 0.0001 | 0.0003 | 0.0015 | 0.0067 |
300680837 | 49 | 0 | 0 | 7.63 | 0 | 8.5 | 5.4 | 4.3 | 23 | 1 | 138 | Female | 0.0001 | 0.0004 | 0.0001 | 0.0004 | 0.0013 | 0.0059 |
105959696 | 54 | 1 | 1 | 11.34 | 0 | 8 | 5.1 | 2.9 | 20 | 1 | 311 | Male | 0.0001 | 0.0003 | 0.0001 | 0.0003 | 0.0020 | 0.0091 |
205521453 | 56 | 0 | 1 | 34.9 | 0 | 8.6 | 3 | 3.2 | 13 | 1 | 461 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0007 |
964175840 | 62 | 1 | 1 | 43.62 | 0 | 9.2 | 3.8 | 4.3 | 26 | 1 | 1312 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
366215045 | 73 | 1 | 0 | 19.63 | 0 | 7.7 | 2.9 | 2.4 | 14 | 1 | 1566 | Male | 0.0000 | 0.0001 | 0.0000 | 0.0001 | 0.0007 | 0.0031 |
703995795 | 28 | 1 | 0 | 47.68 | 0 | 9.8 | 3.7 | 4.8 | 28 | 1 | 1587 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
193572795 | 25 | 1 | 0 | 7.2 | 0 | 8.4 | 5.3 | 4.2 | 19 | 1 | 1705 | Male | 0.0002 | 0.0010 | 0.0002 | 0.0009 | 0.0036 | 0.0162 |
140269431 | 64 | 1 | 1 | 53.15 | 0 | 9.3 | 4.3 | 3.7 | 25 | 1 | 1958 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
167087450 | 69 | 1 | 1 | 58.17 | 0 | 9.7 | 3.4 | 4.7 | 25 | 1 | 2856 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
548454181 | 23 | 1 | 0 | 28.81 | 0 | 7 | 6.8 | 1.7 | 20 | 1 | 3224 | Male | 0.0000 | 0.0001 | 0.0000 | 0.0001 | 0.0020 | 0.0089 |
780327933 | 67 | 0 | 1 | 29.83 | 0 | 7.9 | 2.2 | 2.8 | 14 | 0 | 28 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0002 | 0.0008 |
586810087 | 67 | 1 | 0 | 47.63 | 0 | 9.4 | 3.7 | 4 | 30 | 0 | 114 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
128191268 | 71 | 1 | 0 | 47.07 | 0 | 9.3 | 4.1 | 4.8 | 22 | 0 | 210 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
354037890 | 77 | 1 | 0 | 44.77 | 0 | 8.4 | 1.6 | 2.4 | 18 | 0 | 322 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
325811054 | 67 | 0 | 0 | 58.52 | 0 | 10.1 | 2.9 | 4.2 | 26 | 0 | 418 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
290397253 | 71 | 1 | 0 | 42.44 | 0 | 9.1 | 3.1 | 4.2 | 31 | 0 | 439 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
340488662 | 71 | 1 | 1 | 49.03 | 0 | 9.7 | 2.9 | 4.5 | 29 | 0 | 756 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
852670690 | 71 | 1 | 1 | 57.84 | 0 | 8.9 | 1.6 | 4.2 | 28 | 0 | 1215 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
532777291 | 83 | 0 | 1 | 41.12 | 0 | 8.4 | 2.7 | 3.7 | 22 | 0 | 1267 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
770017136 | 73 | 1 | 0 | 48.76 | 0 | 9.5 | 4.9 | 3.8 | 21 | 0 | 1323 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
845855347 | 28 | 1 | 0 | 31.17 | 0 | 10.3 | 4.7 | 2.5 | 27 | 0 | 1370 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0002 | 0.0007 |
259011659 | 63 | 0 | 0 | 59.52 | 0 | 9.6 | 4.2 | 4.4 | 26 | 0 | 1620 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
451920450 | 71 | 1 | 1 | 27.19 | 0 | 8.9 | 3.5 | 3.9 | 23 | 0 | 1680 | Male | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 | 0.0004 |
971884645 | 65 | 1 | 0 | 22.52 | 0 | 9.1 | 4.1 | 4 | 25 | 0 | 1911 | Male | 0.0000 | 0.0001 | 0.0000 | 0.0001 | 0.0002 | 0.0007 |
340172975 | 66 | 0 | 1 | 58.96 | 0 | 9.2 | 3.1 | 3.7 | 14 | 0 | 2013 | Female | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
Results
We define the ground truth for the 2-year and 5-year outcomes as follows.
Filter the dataset down to CKD stages 3 to 5 (eGFR < 60 mL/min/1.73 m²), the population studied by Tangri et al.
Extract the true labels for the 2-year and 5-year outcomes from the DataFrame df.
Assign the true labels for the 2-year outcome to y_true_2_yr and those for the 5-year outcome to y_true_5_yr.
Combine these true labels into a list, y_true.
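The filtering and label-extraction steps above can be sketched in pandas. The eGFR threshold of 60 corresponds to CKD stages 3 to 5; the column names GFR, ESRD_in_2_year_outcome and ESRD_in_5_year_outcome follow the tables in this document, while the helper name `ckd_stage_3_to_5` and the four rows are hypothetical toy values, not study data.

```python
import pandas as pd


def ckd_stage_3_to_5(df: pd.DataFrame, egfr_col: str = "GFR") -> pd.DataFrame:
    """Keep CKD stages 3-5, i.e. eGFR below 60 mL/min/1.73 m^2."""
    return df[df[egfr_col] < 60]


# Toy rows for illustration only
df = pd.DataFrame(
    {
        "GFR": [9.15, 66.81, 47.63, 89.35],
        "ESRD_in_2_year_outcome": [0, 0, 1, 0],
        "ESRD_in_5_year_outcome": [0, 0, 1, 1],
    }
)
df_ckd = ckd_stage_3_to_5(df)

y_true_2_yr = df_ckd["ESRD_in_2_year_outcome"]
y_true_5_yr = df_ckd["ESRD_in_5_year_outcome"]
y_true = [y_true_2_yr, y_true_5_yr]
```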
Define Predictions for the 4-Variable and 6-Variable KFREs at 2 Years and 5 Years
Extract the predicted probabilities of the 4-variable KFRE model for both the 2-year and 5-year outcomes.
Assign these predictions to y_pred_4var_2_yr and y_pred_4var_5_yr, respectively.
Similarly, extract the predicted probabilities of the 6-variable KFRE model for both outcomes and assign them to y_pred_6var_2_yr and y_pred_6var_5_yr.
Combine the 4-variable model predictions into a list, preds_4var.
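A compact way to pull these prediction columns, assuming the kfre_{n}var_{h}year naming seen in the patient-level table, is a small hypothetical helper, `collect_predictions`. The two rows of risks are copied from the first two patients in that table.

```python
import pandas as pd


def collect_predictions(df: pd.DataFrame, n_vars: int) -> list:
    """Return [2-year, 5-year] KFRE risks for the n_vars-variable model.

    Hypothetical helper; assumes columns named like kfre_4var_2year.
    """
    return [df[f"kfre_{n_vars}var_{h}year"] for h in (2, 5)]


df_ckd = pd.DataFrame(
    {
        "kfre_4var_2year": [0.1218, 0.0001],
        "kfre_4var_5year": [0.3953, 0.0004],
        "kfre_6var_2year": [0.1319, 0.0001],
        "kfre_6var_5year": [0.4155, 0.0004],
    }
)
y_pred_4var_2_yr, y_pred_4var_5_yr = collect_predictions(df_ckd, 4)
y_pred_6var_2_yr, y_pred_6var_5_yr = collect_predictions(df_ckd, 6)
preds_4var = [y_pred_4var_2_yr, y_pred_4var_5_yr]
```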
Metrics | 2_year_4_var_kfre | 5_year_4_var_kfre | 2_year_6_var_kfre | 5_year_6_var_kfre | 2_year_8_var_kfre | 5_year_8_var_kfre |
---|---|---|---|---|---|---|
Precision/PPV | 0.614641 | 0.590909 | 0.616874 | 0.593812 | 0.589202 | 0.558601 |
Average Precision | 0.559144 | 0.602071 | 0.559212 | 0.603624 | 0.548023 | 0.587199 |
Sensitivity | 0.445892 | 0.641297 | 0.446894 | 0.635659 | 0.503006 | 0.675123 |
Specificity | 0.949919 | 0.877670 | 0.950278 | 0.880194 | 0.937175 | 0.853010 |
AUC ROC | 0.875196 | 0.844687 | 0.875311 | 0.845210 | 0.877136 | 0.843695 |
Brier Score | 0.091249 | 0.137478 | 0.091271 | 0.136473 | 0.096593 | 0.148403 |
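The metrics in the table above can be computed with scikit-learn. Precision, sensitivity and specificity require dichotomizing the predicted risks, and this document does not state the cut-off used; the sketch below assumes 0.5 purely for illustration, and `kfre_metrics` is a hypothetical helper rather than a kfre function. Average precision, AUC ROC and the Brier score are computed from the raw risks.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    precision_score,
    recall_score,
    roc_auc_score,
)


def kfre_metrics(y_true, y_prob, threshold: float = 0.5) -> dict:
    """Hypothetical helper; the 0.5 cut-off is an assumption, not from the study."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Dichotomize risks for the threshold-dependent metrics
    y_hat = (y_prob >= threshold).astype(int)
    return {
        "Precision/PPV": precision_score(y_true, y_hat, zero_division=0),
        "Average Precision": average_precision_score(y_true, y_prob),
        "Sensitivity": recall_score(y_true, y_hat, zero_division=0),
        # Specificity is the recall of the negative class: TN / (TN + FP)
        "Specificity": recall_score(y_true, y_hat, pos_label=0, zero_division=0),
        # Threshold-free metrics use the raw predicted risks
        "AUC ROC": roc_auc_score(y_true, y_prob),
        "Brier Score": brier_score_loss(y_true, y_prob),
    }


metrics = kfre_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics)
```

Calling kfre_metrics(y_true_2_yr, y_pred_4var_2_yr), and likewise for the other model and horizon pairs, produces the values for one column of the table, given the correct threshold.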
Appendix
Descriptive Statistics
Statistic | Age | GFR | ACR | Ca | P | Alb | TCO2 | ESRD_dur |
---|---|---|---|---|---|---|---|---|
count | 16619 | 16619 | 16619 | 16619 | 16619 | 16619 | 16619 | 16619 |
mean | 54.43 | 66.7 | 727.43 | 9.14 | 3.62 | 4.04 | 26.13 | 1385.88 |
std | 17.24 | 31.13 | 1760.78 | 0.67 | 0.84 | 0.61 | 3.99 | 1444.12 |
min | 18 | 2.237831 | 0 | 3.5 | 0 | 0.3 | 3 | 1 |
25% | 43 | 46.94 | 25.28 | 8.8 | 3.1 | 3.8 | 24 | 241 |
50% | 57 | 67.84 | 120.86 | 9.2 | 3.5 | 4.2 | 27 | 868 |
75% | 68 | 85.13 | 642 | 9.6 | 4 | 4.4 | 29 | 2091.5 |
max | 97 | 415.04 | 60323.44 | 14.8 | 15.5 | 5.7 | 59 | 5892 |
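The rows of this table (count, mean, std, min, quartiles, max) match the layout of pandas' DataFrame.describe(). A minimal sketch that would produce such a summary, assuming the column names used in the results table and a hypothetical helper `describe_cohort`, is shown here with three patients from the patient-level table.

```python
import pandas as pd

# Columns summarized in the appendix table
COHORT_COLUMNS = ["Age", "GFR", "ACR", "Ca", "P", "Alb", "TCO2", "ESRD_dur"]


def describe_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles and max per column, rounded to 2 decimals."""
    return df[COHORT_COLUMNS].describe().round(2)


cohort = pd.DataFrame(
    {
        "Age": [61, 66, 70],
        "GFR": [9.15, 7.32, 10.12],
        "ACR": [10, 0, 0],
        "Ca": [6, 7.7, 7.5],
        "P": [4, 6.5, 3.8],
        "Alb": [2.6, 3.5, 3.2],
        "TCO2": [16, 14, 17],
        "ESRD_dur": [2086, 3, 93],
    }
)
print(describe_cohort(cohort))
```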
References
Kang, M. W. (2024). [KFRE validation dataset, Asian cohort]. Unpublished dataset provided by personal communication, June 26, 2024. Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea.
Kang, M. W., Tangri, N., Kim, Y. C., An, J. N., Lee, J., Li, L., Oh, Y. K., Kim, D. K., Joo, K. W., Kim, Y. S., Lim, C. S., & Lee, J. P. (2020). An independent validation of the kidney failure risk equation in an Asian population. Scientific Reports, 10, 12920. https://doi.org/10.1038/s41598-020-69715-3