Predicting Heart Disease Through Supervised Machine Learning Algorithms




Garduno Rapp, Estefanie


Content Notes


INTRODUCTION: Heart disease may present in a variety of forms, including rhythm disturbances, pump failure, silent ischemia, angina, and sudden death, among others. Early diagnosis is a crucial step in reducing serious cardiac events. Machine learning (ML) is a promising tool for improving healthcare diagnostics and risk prediction in common, highly relevant illnesses such as cardiovascular disease.

OBJECTIVE: To develop and evaluate three supervised machine learning models to diagnose heart disease based on individual features.

METHODS: We developed three machine learning models (elastic net, logistic regression, and random forest) to identify individuals with heart disease. The discovery dataset used for model development included 303 subjects (138 with heart disease and 165 controls) and 14 predictor variables (including traditional cardiovascular risk factors). The outcome variable was the diagnosis of heart disease. The discovery dataset was split into training (70%), validation (10%), and testing (20%) subsets. Model development for elastic net and random forest used the training and validation splits, whereas logistic regression was fit using only the training split. We selected hyperparameters for the elastic net model through cross-validation and selected the predictors for logistic regression by backward stepwise selection. We calculated predictions on the testing split and evaluated classifier performance using the area under the receiver operating characteristic curve (AUC). Lastly, we used an external validation dataset (n=295; 107 cases and 188 controls) to make predictions.

RESULTS: In the testing dataset, the elastic net model achieved an AUC of 90% and an accuracy of 86%; the logistic regression model achieved an AUC of 95% and an accuracy of 90%. For the random forest model, the out-of-bag (OOB) error was 25.21%, the number of variables tried at each split was 3, and the accuracy in the testing set was 83%. When the model was confronted with an external validation dataset, the accuracy was 77%.

CONCLUSION: We developed three models to evaluate ML performance with a discrete dataset. The logistic regression model outperformed the other models, with an accuracy of 90% and an AUC of 95%. The final model included 6 variables: sex, heart rate, exercise-induced ST depression, typical anginal pain, atypical anginal pain, and non-anginal pain. Future work on boosting techniques is needed to improve the accuracy of the predictive model. Additionally, a comparative analysis of these ML models against conventional clinical approaches may help elucidate the net benefit.
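
The workflow described in the abstract (a 70/10/20 split, an elastic net classifier, a random forest with an out-of-bag error estimate, and AUC and accuracy on the testing split) can be illustrated with a minimal Python sketch. This is not the author's code: the abstract does not name the software used, so scikit-learn is an assumption, the data are synthetic stand-ins generated with `make_classification`, and the candidate `l1_ratio` values, `C`, `n_estimators`, and random seeds are hypothetical. For brevity the elastic net mixing parameter is chosen on the validation split, whereas the abstract reports cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 303 rows; these are NOT the study data.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)

# 70/10/20 split: hold out 20% for testing, then take 1/8 of the remaining
# 80% (= 10% of the total) as the validation split.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.125, stratify=y_rest, random_state=0)

# Elastic net logistic regression. The l1_ratio grid is hypothetical, and the
# mixing parameter is picked by validation AUC (a simplification).
best_auc, best_model = -1.0, None
for l1_ratio in (0.1, 0.5, 0.9):
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=l1_ratio, C=1.0, max_iter=5000))
    model.fit(X_train, y_train)
    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if val_auc > best_auc:
        best_auc, best_model = val_auc, model

# Performance of the selected elastic net model on the untouched testing split.
test_prob = best_model.predict_proba(X_test)[:, 1]
enet_auc = roc_auc_score(y_test, test_prob)
enet_acc = accuracy_score(y_test, best_model.predict(X_test))

# Random forest trying 3 variables at each split; the out-of-bag (OOB) error
# is one minus the OOB accuracy computed on the training data.
rf = RandomForestClassifier(n_estimators=500, max_features=3,
                            oob_score=True, random_state=0)
rf.fit(X_train, y_train)
oob_error = 1.0 - rf.oob_score_
rf_acc = accuracy_score(y_test, rf.predict(X_test))

print(f"Elastic net: test AUC={enet_auc:.2f}, accuracy={enet_acc:.2f}")
print(f"Random forest: OOB error={oob_error:.2%}, test accuracy={rf_acc:.2f}")
```

Because the data here are synthetic, the printed values have no relation to the results reported above; the sketch only shows how each reported quantity could be computed.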

General Notes

Lightning talk presentation at Texas Health Informatics Alliance (THIA) 2022.
The 2022 Texas Health Informatics Alliance Conference was held at the University of Texas at Arlington on September 9, 2022.

Garduno Rapp, E. (2022, September 9). Predicting heart disease through supervised machine learning algorithms [Conference presentation]. Texas Health Informatics Alliance (THIA) 2022, Arlington, Texas.
