Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery (http://machinelearningmastery.com/)
Dataset Used: Bank Marketing Data Set
Data Set ML Model: Binary classification with numerical and categorical attributes
Dataset Reference: http://archive.ics.uci.edu/ml/datasets/bank+marketing
One source of potential performance benchmarks: https://www.kaggle.com/rouseguy/bankbalanced
INTRODUCTION: The Bank Marketing dataset involves predicting whether a bank client will subscribe (yes/no) to a term deposit (the target variable). It is a binary (2-class) classification problem. There are over 45,000 observations with 16 input variables and 1 output variable. There are no missing values in the dataset.
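As a rough illustration of the data layout described above, the following Python sketch separates numerical from categorical input columns and maps the yes/no target to 1/0. It is not the project's own code. The semicolon delimiter, the column names (age, job, balance, y), and the sample rows are assumptions made for this example; the rows are made up and are not taken from the dataset. The helper names load_rows, split_columns, and encode_target are hypothetical.

```python
import csv
import io

# Illustrative rows only: made up for this sketch, not copied from the dataset.
# The delimiter and the column names are assumptions about the file layout.
SAMPLE = """age;job;balance;y
41;technician;1200;no
35;management;-50;yes
58;retired;300;no
"""


def load_rows(text):
    """Parse semicolon-delimited text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def split_columns(rows, target="y"):
    """Return (numeric, categorical) input column names, excluding the target.

    A column counts as numeric only if every value parses as an integer.
    """
    numeric, categorical = [], []
    for name in rows[0]:
        if name == target:
            continue
        try:
            for row in rows:
                int(row[name])
            numeric.append(name)
        except ValueError:
            categorical.append(name)
    return numeric, categorical


def encode_target(rows, target="y"):
    """Map the yes/no target to 1/0 for binary classification."""
    return [1 if row[target] == "yes" else 0 for row in rows]


rows = load_rows(SAMPLE)
numeric_cols, categorical_cols = split_columns(rows)
labels = encode_target(rows)
```

With the real file, the same functions could be applied to the file's text content instead of SAMPLE, provided the delimiter assumption holds.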
CONCLUSION: The eight baseline algorithms achieved an average accuracy of 89.99%. Three algorithms (Random Forest, Stochastic Gradient Boosting, and Bagged CART) achieved the top accuracy and Kappa scores. The best result on the training data came from Random Forest, which achieved an average accuracy of 90.65% after a series of tuning trials and an accuracy of 90.91% on the validation dataset. For this project, the Random Forest ensemble algorithm yielded consistently strong training and validation results, which warrant the additional processing the algorithm requires.
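The conclusion reports both accuracy and Kappa. The sketch below shows one way the two metrics can be computed from a binary confusion matrix, and why Kappa is a useful companion to accuracy when one class dominates. It is a minimal illustration, not the project's evaluation code. The function name accuracy_and_kappa is hypothetical, and the counts are invented for the example; they are not results from this project.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Compute accuracy and Cohen's Kappa from binary confusion-matrix counts.

    tp, fp, fn, tn are true positives, false positives, false negatives,
    and true negatives. Assumes the chance agreement is below 1.
    """
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # accuracy (observed agreement)

    # Agreement expected by chance, from the row and column totals.
    predicted_pos, predicted_neg = tp + fp, fn + tn
    actual_pos, actual_neg = tp + fn, fp + tn
    expected = (predicted_pos * actual_pos + predicted_neg * actual_neg) / total**2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: a model that finds half of the positive cases.
acc_model, kappa_model = accuracy_and_kappa(tp=50, fp=50, fn=50, tn=850)

# Hypothetical counts: a model that always predicts the majority class.
acc_majority, kappa_majority = accuracy_and_kappa(tp=0, fp=0, fn=100, tn=900)
```

In this made-up example both models score 90% accuracy, but the always-negative model has a Kappa of 0 because its agreement with the labels is no better than chance, while the first model scores about 0.44.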
The HTML-formatted report is available on GitHub.