E-CatBoost: An efficient machine learning framework for predicting ICU mortality using the eICU Collaborative Research Database

Safaei, Nima
Safaei, Babak
Seyedekrami, Seyedhouman
Talafidaryani, Mojtaba
Masoud, Arezoo
Wang, Shaodong
Li, Qing
Moqri, Mahdi
Industrial and Manufacturing Systems Engineering
Improving Intensive Care Unit (ICU) management and building cost-effective, well-managed healthcare systems are high priorities for healthcare units. Accurate and explainable mortality prediction models help identify the most critical risk factors for patient survival and enable early detection of the patients most in need of care. This study proposes a highly accurate and efficient machine learning model for predicting ICU mortality status upon discharge using the information available during the first 24 hours of admission. The most important features in mortality prediction are identified, and the effect of changing each feature on the prediction is studied. We used supervised machine learning models and illness severity scoring systems to benchmark the mortality prediction. We also used a combination of SHAP, LIME, partial dependence, and individual conditional expectation plots to explain the predictions made by the best-performing model (CatBoost). We propose E-CatBoost, an optimized and efficient patient mortality prediction model that can accurately predict patients’ discharge status using only ten input features. We used eICU-CRD v2.0 to train and validate the models; the dataset contains information on over 200,000 ICU admissions. The patients were divided into twelve disease groups, and models were fitted and tuned for each group. The models’ predictive performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Across the defined disease groups, the AUROC scores ranged from 0.86 [std: 0.02] to 0.92 [std: 0.02] for CatBoost and from 0.83 [std: 0.02] to 0.91 [std: 0.03] for E-CatBoost; measured over the entire patient population, their AUROC scores were 7 to 18 and 2 to 12 percent higher than those of the baseline models, respectively.
Based on SHAP explanations, we found age, heart rate, respiratory rate, blood urea nitrogen, and creatinine level to be the most critical cross-disease features in mortality prediction.
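To make the evaluation metric concrete, the following is a minimal sketch of how an AUROC score and its spread could be computed per disease group. It is not the authors' code: the group name, the toy labels and scores, and the helper names `auroc` and `summarize_by_group` are all hypothetical, and it assumes the reported standard deviation is taken over repeated validation splits, which the abstract does not state.

```python
from statistics import mean, stdev


def auroc(y_true, y_score):
    """AUROC as the probability that a randomly chosen positive case
    (death) receives a higher score than a randomly chosen negative
    case (survival); tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_by_group(folds_by_group):
    """Return {group: (mean AUROC, std of AUROC)} over validation splits.

    folds_by_group maps a group name to a list of (y_true, y_score) pairs,
    one pair per split (an assumed layout, chosen for illustration)."""
    summary = {}
    for group, folds in folds_by_group.items():
        scores = [auroc(y, s) for y, s in folds]
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[group] = (mean(scores), spread)
    return summary


# Hypothetical toy data: one made-up group with two validation splits.
toy = {
    "cardiac": [
        ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
    ],
}
result = summarize_by_group(toy)
```

The pairwise comparison is quadratic in the number of patients, so it is suitable only for illustration; on a dataset of this size a rank-based implementation from an established library would be the practical choice.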
This article is published as Safaei, Nima, Babak Safaei, Seyedhouman Seyedekrami, Mojtaba Talafidaryani, Arezoo Masoud, Shaodong Wang, Qing Li, and Mahdi Moqri. "E-CatBoost: An efficient machine learning framework for predicting ICU mortality using the eICU Collaborative Research Database." PLoS ONE 17, no. 5 (2022): e0262895. DOI: 10.1371/journal.pone.0262895. Copyright 2022 Safaei et al. Attribution 4.0 International (CC BY 4.0). Posted with permission.