Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The BNP Paribas Cardif Claims Management dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.
INTRODUCTION: As a global specialist in personal insurance, BNP Paribas Cardif sponsored a Kaggle competition to help them identify the categories of claims. In a world shaped by the emergence of new practices and behaviors generated by the digital economy, BNP Paribas Cardif would like to streamline its claims management practice. In this Kaggle challenge, the company challenged the participants to predict the category of a claim based on features available early in the process. Better predictions can help BNP Paribas Cardif accelerate its claims process and therefore provide a better service to its customers.
In the previous Scikit-Learn iterations, we constructed and tuned machine learning models for this dataset using the Scikit-Learn and the XGBoost libraries. We also observed the best accuracy result that we could obtain using the tuned models with the training, validation, and test datasets.
In this Take1 iteration, we will construct and tune machine learning models for this dataset using TensorFlow with three layers. We will observe the best accuracy result that we can obtain using the tuned models with the validation and test datasets. Furthermore, we will apply the MLP model to Kaggle’s test dataset and submit a list of predictions to Kaggle for evaluation.
ANALYSIS: From the previous Scikit-Learn iterations, the optimized XGBoost model processed the testing dataset with a log loss metric of 0.4634.
From this Take1 iteration, the performance of the three-layer TensorFlow model achieved a log loss score of 0.4720 with the training dataset. After a series of tuning trials, the TensorFlow model processed the validation dataset with a log loss score of 0.4726, which was consistent with the prediction from the training result. When configured with the optimized parameters, the TensorFlow model processed the test dataset with a log loss score of 0.4736, which was consistent with the training/tuning phase.
CONCLUSION: For this dataset, the model built using TensorFlow with three layers achieved a satisfactory result. We should consider using TensorFlow to model this dataset further.
Dataset Used: BNP Paribas Cardif Claims Management Data Set
Dataset ML Model: Binary classification with numerical and categorical attributes
Dataset Reference: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/overview
One potential source of performance benchmark: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/leaderboard
The HTML formatted report can be found here on GitHub.