Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Credit Card Default dataset is a binary classification problem in which we try to predict one of two possible outcomes: whether or not a client defaults on the next payment.
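A template of this kind usually starts by spot-checking several algorithms with k-fold cross-validation on a training split, holding back a test split for the end. The following is a minimal sketch of that step, not the project's own code. It assumes scikit-learn is installed. `make_classification` generates a synthetic stand-in for the credit card data, and the candidate models, fold count, and seed (`888`) are illustrative choices. `GradientBoostingClassifier` stands in for eXtreme Gradient Boosting so the example runs without the `xgboost` package.

```python
# Sketch of the spot-check step. Assumes scikit-learn is installed.
# make_classification produces SYNTHETIC stand-in data, not the credit card dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

SEED = 888  # illustrative seed


def spot_check(X_train, y_train, n_splits=10, seed=SEED):
    """Return the mean k-fold cross-validated accuracy of each candidate algorithm."""
    models = {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "CART": DecisionTreeClassifier(random_state=seed),
        "ET": ExtraTreesClassifier(n_estimators=100, random_state=seed),
        # Stand-in for eXtreme Gradient Boosting (avoids the xgboost dependency).
        "GBM": GradientBoostingClassifier(random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


X, y = make_classification(n_samples=1000, n_features=23, random_state=SEED)
# Hold back 30% as a test set that is not touched until the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

scores = spot_check(X_train, y_train)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.4f}")
baseline_average = float(np.mean(list(scores.values())))
print(f"Baseline average accuracy: {baseline_average:.4f}")
```

Averaging the per-algorithm scores gives a single baseline figure, which is one way to read the "average accuracy" reported in the analysis.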
INTRODUCTION: This dataset contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
ANALYSIS: In the baseline round, the machine learning algorithms achieved an average accuracy of 79.62%. Two algorithms (Extra Trees and eXtreme Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, eXtreme Gradient Boosting produced the top overall result, with an accuracy of 82.30%. Using the optimized parameters, the eXtreme Gradient Boosting algorithm achieved an accuracy of 81.80% on the test dataset, consistent with its result during model training.
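The tune-then-test sequence in the analysis can be sketched as a grid search over the top algorithm, followed by a single evaluation on the held-out test set. This is a hedged illustration, not the project's code. It assumes scikit-learn and synthetic data from `make_classification`. `GradientBoostingClassifier` substitutes for eXtreme Gradient Boosting (`xgboost.XGBClassifier` exposes the same `fit`/`predict` interface and could be swapped in if installed), and the parameter grid is hypothetical because the tuned parameters are not listed here.

```python
# Sketch of tuning the top algorithm, then scoring the held-out test set once.
# Assumes scikit-learn; the data is SYNTHETIC and the grid is HYPOTHETICAL.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

SEED = 888  # illustrative seed

X, y = make_classification(n_samples=800, n_features=23, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

# Hypothetical grid; GradientBoostingClassifier stands in for eXtreme Gradient Boosting.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=SEED),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
)
# With refit=True (the default), the best combination is retrained on all of X_train.
search.fit(X_train, y_train)

# Evaluate the tuned model once on data it never saw during tuning.
test_acc = accuracy_score(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.4f}")
print(f"Test accuracy: {test_acc:.4f}")
```

Comparing `best_score_` with the test accuracy mirrors the check in the analysis: a small gap between the two suggests the tuned model generalizes rather than overfitting the training folds.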
CONCLUSION: For this iteration, the eXtreme Gradient Boosting algorithm achieved the best overall results on both the training and test datasets. For this dataset, eXtreme Gradient Boosting should be considered for further modeling.
Dataset Used: Default of Credit Card Clients Data Set
Dataset ML Model: Binary classification with numerical and categorical attributes
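Because the data mixes numerical and categorical attributes, the integer-coded categories are often one-hot encoded so that models do not treat the codes as ordered quantities. The sketch below does this with `pandas.get_dummies`. The column names (`SEX`, `EDUCATION`, `MARRIAGE`, and the target `default payment next month`) are assumed from the UCI data description and should be checked against the actual file; the toy rows are made up, and the project's own preprocessing is not described here.

```python
import pandas as pd

# ASSUMED column names, following the UCI data description; verify against the real file.
CATEGORICAL = ["SEX", "EDUCATION", "MARRIAGE"]
TARGET = "default payment next month"


def encode(df):
    """One-hot encode the integer-coded categorical columns and split off the target."""
    features = df.drop(columns=[TARGET])
    # Passing `columns` makes get_dummies encode these columns even though they are integers.
    X = pd.get_dummies(features, columns=CATEGORICAL, dtype=int)
    y = df[TARGET].astype(int)
    return X, y


# Made-up toy rows for illustration only; these are not records from the dataset.
toy = pd.DataFrame({
    "LIMIT_BAL": [50000, 200000, 80000],
    "SEX": [1, 2, 2],
    "EDUCATION": [1, 2, 3],
    "MARRIAGE": [1, 2, 1],
    "AGE": [30, 45, 28],
    TARGET: [0, 1, 0],
})

X, y = encode(toy)
print(list(X.columns))
# ['LIMIT_BAL', 'AGE', 'SEX_1', 'SEX_2', 'EDUCATION_1', 'EDUCATION_2', 'EDUCATION_3', 'MARRIAGE_1', 'MARRIAGE_2']
```

Numerical columns pass through unchanged, and each categorical column is replaced by one indicator column per observed code.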
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
The HTML-formatted report is available on GitHub.