Binary Classification Model for Bondora P2P Lending Using Python and Scikit-Learn

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Bondora P2P Lending dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The Kaggle dataset owner retrieved this dataset from Bondora, a leading European peer-to-peer lending platform. The data comprises demographic and financial information of the borrowers with defaulted and non-defaulted loans between February 2009 and July 2021. For investors, “peer-to-peer lending” or “P2P” offers an attractive way to diversify portfolios and enhance long-term performance. However, to make effective decisions, investors want to minimize the risk of default of each lending decision and realize the return that compensates for the risk. Therefore, we will predict the default risk by focusing on the “DefaultDate” attribute as the target.

ANALYSIS: The average performance of the machine learning algorithms achieved a ROC-AUC benchmark of 0.9539 using the training dataset. We selected Random Forest and Extra Trees to perform the tuning exercises. After a series of tuning trials, the refined Extra Trees model processed the training dataset with a final ROC-AUC score of 0.9801. When we processed the test dataset with the final model, the model achieved a ROC-AUC score of 0.9162.

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Bondora P2P Lending Loan Data

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference:

Dataset Attribute Description:

One potential source of performance benchmark:

The HTML formatted report can be found here on GitHub.