Binary Classification Model for Springleaf Marketing Response Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Springleaf Marketing Response dataset presents a binary classification problem, where we try to predict one of two possible outcomes.

INTRODUCTION: Springleaf uses direct mail to connect with customers who may need a loan. To improve its targeting efforts, Springleaf must be sure it is focusing on the customers who are likely to respond and be good candidates for its services. Using a dataset with a broad set of anonymized features, Springleaf is looking to predict which customers will respond to a direct mail offer.

In this Take 1 iteration, we will construct several traditional machine learning models using linear, non-linear, and ensemble techniques. We will observe the best ROC-AUC result that we can obtain with each of these models.
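
The comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes scikit-learn, uses a small synthetic dataset from make_classification in place of the Springleaf data, and the four model choices, their settings, and the 5-fold setup are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Springleaf data (assumption: the real dataset
# is far larger and needs cleaning and encoding before modeling).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           weights=[0.75, 0.25], random_state=7)

# One linear, one non-linear, and two ensemble candidates (illustrative picks).
models = {
    "LR": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(random_state=7),
    "RF": RandomForestClassifier(n_estimators=50, random_state=7),
    "GBM": GradientBoostingClassifier(n_estimators=50, random_state=7),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
    results[name] = scores.mean()
    print(f"{name}: mean ROC-AUC = {scores.mean():.4f} (std {scores.std():.4f})")
```

Averaging the values in results would give a single baseline figure comparable in spirit to an "average ROC-AUC" across algorithms.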

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average ROC-AUC of 70.42%. The Random Forest and Gradient Boosting Machine (GBM) algorithms achieved the highest ROC-AUC scores after the first round of modeling. After a series of tuning trials, GBM turned in an overall ROC-AUC result of 77.96%. When we applied the tuned GBM algorithm to the test dataset, we obtained a ROC-AUC score of 62.58%, which was much lower than the score from model training.
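
The scores reported above differ by 77.96 - 62.58 = 15.38 percentage points. One way to measure such a gap is to tune the model with cross-validation on the training split and then score it once on a held-out split. The sketch below assumes scikit-learn and synthetic data; the parameter grid, split size, and variable names are hypothetical and are not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the Springleaf data (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)

# Small illustrative grid; a real tuning run would explore more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=7),
                      param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Cross-validated score of the best setting versus the held-out score.
cv_auc = search.best_score_
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
gap = cv_auc - test_auc

print(f"Best parameters: {search.best_params_}")
print(f"CV ROC-AUC: {cv_auc:.4f}  Test ROC-AUC: {test_auc:.4f}  Gap: {gap:.4f}")
```

A large positive gap would suggest that the tuned model fits the training data better than it generalizes, which is the pattern the analysis reports.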

CONCLUSION: For this iteration, the GBM algorithm showed high variance: its ROC-AUC dropped from 77.96% in training to 62.58% on the test dataset, which suggests the tuned model may be overfitting. For this dataset, we should consider doing more modeling with the GBM and other algorithms.

Dataset Used: Springleaf Marketing Response Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/springleaf-marketing-response/data

One potential source of performance benchmark: https://www.kaggle.com/c/springleaf-marketing-response/leaderboard

The HTML formatted report can be found here on GitHub.