Binary Classification Deep Learning Model for BNP Paribas Cardif Claims Management Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The BNP Paribas Cardif Claims Management dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: As a global specialist in personal insurance, BNP Paribas Cardif sponsored a Kaggle competition to help them identify the categories of claims. In a world shaped by the emergence of new practices and behaviors generated by the digital economy, BNP Paribas Cardif would like to streamline its claims management practice. In this Kaggle challenge, the company challenged the participants to predict the category of a claim based on features available early in the process. Better predictions can help BNP Paribas Cardif accelerate its claims process and therefore provide a better service to its customers.

In the previous Scikit-Learn iterations, we constructed and tuned machine learning models for this dataset using the Scikit-Learn and XGBoost libraries. We also observed the best log loss result that we could obtain using the tuned models with the training, validation, and test datasets.

In iteration Take1, we constructed and tuned machine learning models for this dataset using TensorFlow with three layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets. Furthermore, we applied the MLP model to Kaggle’s test dataset and submitted a list of predictions to Kaggle for evaluation.

In iteration Take2, we constructed and tuned machine learning models for this dataset using TensorFlow with four layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets. Furthermore, we applied the MLP model to Kaggle’s test dataset and submitted a list of predictions to Kaggle for evaluation.

In this Take3 iteration, we will construct and tune machine learning models for this dataset using TensorFlow with five layers. We will observe the best result that we can obtain using the tuned models with the validation and test datasets. Furthermore, we will apply the MLP model to Kaggle’s test dataset and submit a list of predictions to Kaggle for evaluation.
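A five-layer MLP of the kind described above can be sketched in TensorFlow/Keras as follows. This is a minimal illustration, not the exact configuration used in this iteration: the layer widths, activations, and the Adam optimizer are assumptions, and the random data stands in for the real BNP Paribas Cardif features.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_five_layer_mlp(n_features: int) -> tf.keras.Model:
    """Build an MLP with five hidden layers and a sigmoid output
    for binary classification. Layer sizes are illustrative only."""
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    # Binary cross-entropy is the same quantity Kaggle reports as log loss.
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Smoke test on random stand-in data with 20 features.
X = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 2, size=(100, 1))
model = build_five_layer_mlp(20)
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
probs = model.predict(X, verbose=0)
```

The sigmoid output yields per-claim probabilities, which is the form Kaggle expects in the submitted prediction file.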

ANALYSIS: From the previous Scikit-Learn iterations, the optimized XGBoost model processed the testing dataset with a log loss metric of 0.4634.

From iteration Take1, the performance of the three-layer TensorFlow model achieved a log loss score of 0.4720 with the training dataset. After a series of tuning trials, the TensorFlow model processed the validation dataset with a log loss score of 0.4726, which was consistent with the prediction from the training result. When configured with the optimized parameters, the TensorFlow model processed the test dataset with a log loss score of 0.4736, which was consistent with the training/tuning phase.

From iteration Take2, the performance of the four-layer TensorFlow model achieved a log loss score of 0.4765 with the training dataset. After a series of tuning trials, the TensorFlow model processed the validation dataset with a log loss score of 0.4735, which was consistent with the prediction from the training result. When configured with the optimized parameters, the TensorFlow model processed the test dataset with a log loss score of 0.4763, which was consistent with the training/tuning phase but worse than the three-layer model.

From this Take3 iteration, the performance of the five-layer TensorFlow model achieved a log loss score of 0.4809 with the training dataset. After a series of tuning trials, the TensorFlow model processed the validation dataset with a log loss score of 0.4740, which was consistent with the prediction from the training result. When configured with the optimized parameters, the TensorFlow model processed the test dataset with a log loss score of 0.4715, which was consistent with the training/tuning phase and slightly better than the three-layer model.
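The log loss scores compared above can be computed directly from predicted probabilities. A minimal NumPy version of the binary log loss (with the standard clipping to avoid log(0)) looks like this:

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss (cross-entropy) over a set of predictions.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Confident, correct predictions give a low loss (~0.164 here);
# an uninformative 0.5 prediction scores ln(2) ~= 0.693.
score = log_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
baseline = log_loss([1], [0.5])
```

Lower is better, which is why the five-layer model's test score of 0.4715 edges out the three-layer model's 0.4736.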

CONCLUSION: For this dataset, the model built using TensorFlow with five layers achieved a satisfactory result. We should consider using TensorFlow to model this dataset further.

Dataset Used: BNP Paribas Cardif Claims Management Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/overview

One potential source of performance benchmark: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/leaderboard

The HTML-formatted report can be found here on GitHub.