Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Allstate Claims Severity dataset presents a regression problem where we are trying to predict the value of a continuous variable.
INTRODUCTION: Allstate is interested in developing automated methods of predicting the cost, and hence severity, of claims. In this Kaggle challenge, the contestants were asked to create an algorithm that could accurately predict claims severity. Each row in this dataset represents an insurance claim. The task is to predict the value for the ‘loss’ column. Variables prefaced with ‘cat’ are categorical, while those prefaced with ‘cont’ are continuous.
In iteration Take1, we constructed machine learning models using the original dataset with minimal data preparation and no feature engineering. The XGBoost model serves as the baseline for future iterations of modeling.
In iteration Take2, we tuned additional parameters of the XGBoost model and improved the MAE metric further.
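To illustrate the kind of tuning performed in Take1 and Take2, the sketch below runs a grid search over a few common XGBoost hyperparameters with MAE as the scoring metric. The parameter grid, preprocessing, and variable names (X_train, y_train) are assumptions for illustration, not the project's exact settings.

```python
# Minimal sketch of tuning an XGBoost regressor with MAE scoring.
# The parameter grid and data variables are illustrative assumptions.
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV

# X_train, y_train are assumed to hold the encoded features and the 'loss' target.
param_grid = {
    "n_estimators": [500, 1000],
    "max_depth": [6, 9],
    "learning_rate": [0.05, 0.1],
}

grid = GridSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror", n_jobs=-1),
    param_grid=param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, -grid.best_score_)
```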
In iteration Take3, we constructed several basic Multilayer Perceptron (MLP) models with one hidden layer. The basic MLP model serves as the baseline as we build more complex MLP models in future iterations.
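For reference, a minimal sketch of a Take3-style one-hidden-layer MLP in Keras appears below; the node count, optimizer, and loss choice are assumptions based on the description above.

```python
# Minimal sketch of a one-hidden-layer baseline MLP (Take3-style).
# Node count and optimizer settings are illustrative assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

def build_baseline_mlp(n_features, n_nodes=128):
    model = Sequential([
        Input(shape=(n_features,)),
        Dense(n_nodes, activation="relu"),
        Dense(1),  # single linear output for the continuous 'loss' target
    ])
    model.compile(optimizer="adam", loss="mean_absolute_error")
    return model
```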
In iteration Take4, we constructed several Multilayer Perceptron (MLP) models with two hidden layers. We also observed whether the additional hidden layer had a positive effect on MAE when compared to models with just one hidden layer.
In iteration Take5, we constructed several Multilayer Perceptron (MLP) models with three hidden layers. We also observed whether the additional hidden layer had a positive effect on MAE when compared to models with just one or two hidden layers.
In iteration Take6, we constructed several three-layer Multilayer Perceptron (MLP) models with batch normalization. We also observed whether the batch normalization technique had a positive effect on MAE when compared to models without batch normalization.
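A minimal sketch of the three-hidden-layer architecture described in Take5 and Take6 is shown below, with dropout after each hidden layer and batch normalization added for the Take6 variant. The layer ordering and activation choices are assumptions for illustration.

```python
# Minimal sketch of a three-hidden-layer MLP with dropout and optional batch
# normalization (Take5/Take6-style). Layer ordering and activations are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization

def build_mlp(n_features, nodes=(512, 128, 64), dropouts=(0.25, 0.25, 0.25),
              use_batch_norm=True):
    model = Sequential([Input(shape=(n_features,))])
    for n_nodes, rate in zip(nodes, dropouts):
        model.add(Dense(n_nodes, activation="relu"))
        if use_batch_norm:
            model.add(BatchNormalization())
        model.add(Dropout(rate))
    model.add(Dense(1))  # linear output for the continuous 'loss' target
    model.compile(optimizer="adam", loss="mean_absolute_error")
    return model
```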
In this Take7 iteration, we will tune the MLP model that has 512/128/64 nodes and 0.25/0.25/0.25 Dropout ratios. We will perform a grid search for the best-performing configuration across different learning rates, kernel initializers, and batch sizes, as sketched below.
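One way to organize such a search is a simple loop over the candidate values, along the lines of the sketch below. The candidate grids, epoch budget, data variable names (X_train, y_train, X_test, y_test), and whether batch normalization is retained from Take6 are all assumptions for illustration.

```python
# Minimal sketch of the Take7 grid search over learning rate, kernel
# initializer, and batch size for the 512/128/64 model with 0.25 dropouts.
# Candidate values, epochs, and data variables are assumptions; retaining
# batch normalization from Take6 is also an assumption.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam

def build_take7_model(n_features, learning_rate, initializer):
    model = Sequential([Input(shape=(n_features,))])
    for n_nodes in (512, 128, 64):
        model.add(Dense(n_nodes, activation="relu",
                        kernel_initializer=initializer))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))
    model.add(Dense(1))
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss="mean_absolute_error")
    return model

results = {}
for lr in (0.001, 0.0005, 0.0001):
    for init in ("glorot_uniform", "he_normal"):
        for batch_size in (32, 64, 128):
            model = build_take7_model(X_train.shape[1], lr, init)
            history = model.fit(X_train, y_train, epochs=50,
                                batch_size=batch_size,
                                validation_data=(X_test, y_test), verbose=0)
            # Track the best validation MAE seen for this combination.
            results[(lr, init, batch_size)] = min(history.history["val_loss"])

best = min(results, key=results.get)
print("Best (learning rate, initializer, batch size):", best, results[best])
```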
ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average MAE of 1301. eXtreme Gradient Boosting (XGBoost) achieved the top MAE metric after the first round of modeling. After a series of tuning trials, XGBoost achieved an MAE metric of 1199. By using the optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1204, which was in line with the MAE prediction from the training data.
In iteration Take2, the further-tuned eXtreme Gradient Boosting (XGBoost) model achieved an improved MAE metric of 1191 using the training data. By using the same optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1195, which was in line with the MAE prediction from the training data.
In iteration Take3, the simple MLP model with 128 nodes achieved an MAE metric of 1193 on the test dataset after 50 epochs. The MLP model with 1024 nodes processed the same test dataset with an MAE of 1170 after the same number of epochs, but with much more overfitting.
In iteration Take4, the MLP model with 128/64 nodes and 0.25/0.25 Dropout ratios achieved an MAE metric of 1169 on the test dataset after 31 epochs. The MLP model with 256/128 nodes and 0.25/0.50 Dropout ratios also processed the same test dataset with an MAE of 1169 after 25 epochs.
In iteration Take5, the MLP model with 512/128/64 nodes and 0.25/0.50/0.50 Dropout ratios achieved an MAE metric of 1164 on the test dataset after 16 epochs. The MLP model with 1024/512/256 nodes and 0.25/0.50/0.50 Dropout ratios also processed the same test dataset with an MAE of 1164 after nine epochs.
In iteration Take6, the MLP model with 512/128/64 nodes and 0.25/0.25/0.25 Dropout ratios achieved an MAE metric of 1157 on the test dataset after 22 epochs. The MLP model with 1024/512/256 nodes and 0.50/0.50/0.50 Dropout ratios also processed the same test dataset with an MAE of 1159 after 48 epochs.
In this Take7 iteration, the models with a learning rate of 0.0005 seemed to produce the most stable training and testing loss curves. Those models also achieved MAEs between 1158 and 1161 on the testing dataset at around 20 epochs before they started to overfit.
CONCLUSION: For this iteration, the 512/128/64-node MLP model with 0.25/0.25/0.25 Dropout ratios achieved good overall results with a learning rate of 0.0005. For this dataset, we should consider using this model for further modeling activities or production use.
Dataset Used: Allstate Claims Severity Data Set
Dataset ML Model: Regression with numerical and categorical attributes
Dataset Reference: https://www.kaggle.com/c/allstate-claims-severity/data
One potential source of performance benchmarks: https://www.kaggle.com/c/allstate-claims-severity/leaderboard
The HTML-formatted report can be found here on GitHub.