Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Personal Key Indicators of Heart Disease dataset is a binary classification problem where we attempt to predict one of two possible outcomes.
INTRODUCTION: This dataset comes from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) study, which conducts annual telephone surveys to gather data on the health status of U.S. residents. The original dataset consists of 401,958 rows and 279 columns. However, the Kaggle project owner selected some of the most relevant attributes from the dataset and cleaned it up for machine learning projects.
ANALYSIS: The machine learning algorithms achieved an average ROC-AUC score of 86.24% on the training dataset. Furthermore, we selected Random Forest as the final model because it achieved a final ROC-AUC score of 91.28% on the training dataset. When we processed the test dataset with the final model, it achieved a ROC-AUC score of 70.94%.
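The ROC-AUC percentages above are area-under-the-ROC-curve scores multiplied by 100. As a minimal illustration of what that number measures, the sketch below computes ROC-AUC as the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one, with ties counting as half. The function name `roc_auc` and the example labels and scores are hypothetical; in practice a library function such as scikit-learn's `sklearn.metrics.roc_auc_score` would normally be used instead of this pairwise version.

```python
from itertools import product


def roc_auc(y_true, y_score):
    """Hypothetical helper: ROC-AUC via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    example gets the higher score; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75, i.e. 75%
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration but slow on a dataset of this size; library implementations compute the same quantity from the sorted scores.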
CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.
Dataset Used: Personal Key Indicators of Heart Disease Dataset
Dataset ML Model: Binary classification with numerical and categorical features
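Because the dataset mixes numerical and categorical features, the categorical columns must be converted to numbers before most algorithms can use them. The sketch below shows one common approach, one-hot encoding, using only the standard library. The function `one_hot_encode` is hypothetical, and the column names and records in the usage example (`BMI`, `Smoking`, `Sex`) are illustrative assumptions rather than the project's actual preprocessing; a typical pipeline would instead use something like scikit-learn's `OneHotEncoder`, fitted on the training data only and then reused on the test data.

```python
def one_hot_encode(rows, numeric_cols, categorical_cols):
    """Hypothetical helper: turn a list of dict records into numeric vectors.

    Numeric columns are copied as floats; each categorical column is expanded
    into one 0/1 indicator per category observed in `rows`.
    """
    # Learn the sorted set of categories for each categorical column
    categories = {col: sorted({row[col] for row in rows}) for col in categorical_cols}

    feature_names = list(numeric_cols) + [
        f"{col}={value}" for col in categorical_cols for value in categories[col]
    ]

    vectors = []
    for row in rows:
        vec = [float(row[col]) for col in numeric_cols]
        for col in categorical_cols:
            # One indicator per known category; exactly one is 1.0
            vec.extend(1.0 if row[col] == value else 0.0 for value in categories[col])
        vectors.append(vec)
    return feature_names, vectors


# Made-up records for illustration only
rows = [
    {"BMI": 22.5, "Smoking": "Yes", "Sex": "Female"},
    {"BMI": 31.0, "Smoking": "No", "Sex": "Male"},
]
names, vectors = one_hot_encode(rows, ["BMI"], ["Smoking", "Sex"])
print(names)    # ['BMI', 'Smoking=No', 'Smoking=Yes', 'Sex=Female', 'Sex=Male']
print(vectors)  # [[22.5, 0.0, 1.0, 1.0, 0.0], [31.0, 1.0, 0.0, 0.0, 1.0]]
```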
Dataset Reference: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease
One source of potential performance benchmarks: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease/code
The HTML-formatted report can be found on GitHub.