Simple Classification Model for Bank Marketing Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery (http://machinelearningmastery.com/)

Dataset Used: Bank Marketing Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: http://archive.ics.uci.edu/ml/datasets/bank+marketing

One source of potential performance benchmarks: https://www.kaggle.com/rouseguy/bankbalanced

INTRODUCTION: The Bank Marketing dataset involves predicting whether a bank client will subscribe (yes/no) to a term deposit (the target variable). It is a binary (2-class) classification problem. There are over 45,000 observations with 16 input variables and 1 output variable. There are no missing values in the dataset.
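The dataset properties described in the introduction (observation count, number of input variables, missing values, class balance) can be checked with a few lines of base R. The following is a minimal sketch, not the project's actual code: the helper name `describe_dataset`, the toy data frame, the file name `bank-full.csv`, the `;` separator, and the target column name `y` are all assumptions made for illustration.

```r
# Hypothetical helper: summarise a data frame along the lines of the
# introduction (row count, input count, missing values, class balance).
describe_dataset <- function(df, target = "y") {
  stopifnot(target %in% names(df))
  list(
    n_obs        = nrow(df),            # number of observations
    n_inputs     = ncol(df) - 1,        # every column except the target
    n_missing    = sum(is.na(df)),      # total count of NA cells
    class_counts = table(df[[target]])  # observations per class
  )
}

# Tiny synthetic stand-in so the sketch runs without the real file.
toy <- data.frame(
  age     = c(30, 45, 52, 28),
  job     = factor(c("admin.", "technician", "services", "admin.")),
  balance = c(1200, -50, 300, 0),
  y       = factor(c("no", "yes", "no", "no"), levels = c("no", "yes"))
)
info <- describe_dataset(toy)
print(info)

# With the real data (file name, separator, and target name are assumptions):
# bank <- read.csv("bank-full.csv", sep = ";", stringsAsFactors = TRUE)
# describe_dataset(bank, target = "y")
```

Checking the class counts early is useful for a yes/no target like this one, because a lopsided class distribution changes how an accuracy figure should be read.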

CONCLUSION: The eight algorithms evaluated at baseline achieved an average accuracy of 89.99%. Three algorithms (Random Forest, Stochastic Gradient Boosting, and Bagged CART) achieved the top accuracy and Kappa scores. The best result on the training data came from Random Forest, which achieved an average accuracy of 90.65% after a series of tuning trials and an accuracy of 90.91% on the validation dataset. For this project, the Random Forest ensemble algorithm yielded consistently strong training and validation results, which justify the additional processing the algorithm requires.
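The conclusion reports both accuracy and Kappa. As a rough illustration of how the two metrics relate, the sketch below computes them from a confusion matrix in base R. The function name `accuracy_kappa` and the ten example labels are hypothetical and are not taken from the project's results. Kappa here is Cohen's kappa, (po - pe) / (1 - pe), where po is the observed agreement (accuracy) and pe is the agreement expected by chance.

```r
# Hypothetical helper: accuracy and Cohen's kappa from two label vectors.
accuracy_kappa <- function(actual, predicted) {
  # Use a shared set of levels so the confusion matrix is square.
  lv <- union(levels(factor(actual)), levels(factor(predicted)))
  cm <- table(factor(actual, levels = lv), factor(predicted, levels = lv))
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                       # observed agreement (accuracy)
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
  # Note: kappa is undefined (division by zero) when pe equals 1.
  c(accuracy = po, kappa = (po - pe) / (1 - pe))
}

# Made-up labels: 8 "no" and 2 "yes" cases, with two misclassifications.
actual    <- c(rep("no", 8), rep("yes", 2))
predicted <- c(rep("no", 7), "yes", "no", "yes")

res <- accuracy_kappa(actual, predicted)
print(res)  # accuracy = 0.8, kappa = 0.375
```

In this toy case the accuracy of 0.8 looks high, but Kappa is only 0.375 because most of the agreement could be obtained by always predicting the majority class. This is why Kappa is worth reading alongside accuracy when one class dominates.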

The HTML-formatted report can be found here on GitHub.