Classification Models Analysis for Stroke Prediction

Dheiver Francisco Santos

doi:10.1590/SciELOPreprints.7182

Authors:

Dheiver Francisco Santos, CATI - Advanced Center for Intelligent Technologies, https://orcid.org/0000-0002-8599-9436

Keywords:

machine learning, stroke prediction, classification models, data preprocessing, Random Forest, Support Vector Machine, healthcare, feature importance analysis, classification metrics, confusion matrices, public health

Abstract

This study applies machine learning to stroke prediction, a critical healthcare task with the potential to save lives and reduce the impact of a life-altering medical event. Using the "Healthcare Stroke Data" dataset, we trained two classification models, Random Forest and Support Vector Machine (SVM), to predict stroke likelihood. Our analysis covers data preprocessing, model training, and evaluation using classification metrics and confusion matrices. The results reveal trade-offs among accuracy, recall, precision, and F1 score for both models. The Random Forest achieves higher accuracy, while the SVM achieves higher recall, a crucial property in healthcare, where a missed stroke case is costly. Both models struggle with precision, which highlights the need for further refinement. We also conducted a feature importance analysis, which emphasizes the pivotal role of age, BMI, and glucose levels in stroke prediction. This work illustrates the potential of machine learning in healthcare and contributes to ongoing efforts to improve stroke prediction and prevention.
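The trade-off among accuracy, precision, recall, and F1 described above can be made concrete with a small worked example. The sketch below derives the four metrics from a binary confusion matrix. The counts are hypothetical, chosen only to resemble an imbalanced screening setting; they are not results from this study, and the names `ConfusionMatrix`, `conservative`, and `sensitive` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier where 'positive' means stroke."""
    tp: int  # stroke cases correctly flagged
    fp: int  # non-stroke cases wrongly flagged
    fn: int  # stroke cases missed
    tn: int  # non-stroke cases correctly cleared

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for 1000 patients, 50 of whom had a stroke.
# These numbers are invented for illustration only.
conservative = ConfusionMatrix(tp=5, fp=5, fn=45, tn=945)    # rarely flags anyone
sensitive = ConfusionMatrix(tp=40, fp=260, fn=10, tn=690)    # flags many patients

for name, cm in [("conservative", conservative), ("sensitive", sensitive)]:
    print(
        f"{name}: accuracy={cm.accuracy:.3f} precision={cm.precision:.3f} "
        f"recall={cm.recall:.3f} f1={cm.f1:.3f}"
    )
```

In this invented scenario the conservative classifier scores higher on accuracy because most patients are stroke-free, yet it misses most true stroke cases. The sensitive classifier catches most of them at the cost of many false alarms, which is why accuracy alone is a weak guide on imbalanced medical data.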