Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Personal Key Indicators of Heart Disease dataset is a binary classification problem in which we attempt to predict one of two possible outcomes.
INTRODUCTION: This dataset comes from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. The original dataset consists of 401,958 rows and 279 columns; the Kaggle project owner selected a subset of the most relevant attributes and cleaned the data for machine learning projects.
ANALYSIS: The preliminary XGBoost model achieved a benchmark ROC-AUC score of 92.68%. After a series of tuning trials, the final model achieved a ROC-AUC score of 92.81% on the training dataset. When we applied the final model to the test dataset, it achieved a ROC-AUC score of 72.25%.
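ROC-AUC, the metric quoted above, can be read as the probability that a model scores a randomly chosen positive case higher than a randomly chosen negative one (ties count as half). The following is a minimal pure-Python sketch of that definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code from this project. In practice, scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.

```python
from itertools import product


def roc_auc(y_true, y_score):
    """Illustrative ROC-AUC: the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive with every negative
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"{auc:.2%}")  # prints 75.00%
```

This pairwise version is quadratic in the number of examples, so it only serves to show the definition; on a dataset of this size a sort-based implementation such as `roc_auc_score` is the practical choice. Scoring the training and test predictions with the same function is one way to compare the two figures reported above.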
CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.
Dataset Used: Personal Key Indicators of Heart Disease Dataset
Dataset ML Model: Binary classification with numerical and categorical features
Dataset Reference: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease
One source of potential performance benchmarks: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease/code
The HTML-formatted report is available on GitHub.