Binary-Class Classification Model for German Credit Risks Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The German Credit Risks dataset is a binary classification problem in which we try to predict one of two possible outcomes.

INTRODUCTION: This dataset, prepared by Prof. Hofmann, contains 1,000 entries with 20 categorical/symbolic attributes. Each entry represents a person who takes credit from a German bank, and each person is classified as a good or bad credit risk according to the set of attributes.
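Many machine learning algorithms expect numeric inputs, so symbolic attribute codes usually have to be converted before modeling. One common approach is one-hot encoding, sketched below with the standard library only. The records and codes are hypothetical, made up for illustration rather than taken from the actual dataset, and the helper names `fit_categories` and `one_hot` are our own; in practice a library encoder would typically do this job.

```python
# Minimal sketch: one-hot encode symbolic attribute codes (stdlib only).
# The records and codes below are hypothetical; they are NOT rows from the
# actual German Credit dataset.

def fit_categories(rows):
    """Collect the sorted set of observed codes for each column."""
    n_cols = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_cols)]

def one_hot(rows, categories):
    """Replace each symbolic value with one 0/1 indicator per known category."""
    encoded = []
    for row in rows:
        vector = []
        for value, cats in zip(row, categories):
            vector.extend(1 if value == c else 0 for c in cats)
        encoded.append(vector)
    return encoded

# Hypothetical two-attribute records.
rows = [["A11", "A34"], ["A12", "A32"], ["A14", "A34"]]
cats = fit_categories(rows)
X = one_hot(rows, cats)
print(cats)  # [['A11', 'A12', 'A14'], ['A32', 'A34']]
print(X)     # [[1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 1]]
```

Fitting the category list on the training data and reusing it for later data keeps the encoded columns aligned across the training and validation sets.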

For this iteration, the script focuses on evaluating various machine learning algorithms and identifying the algorithm that produces the best accuracy.
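The spot-checking step can be illustrated with k-fold cross-validation: each candidate model is trained on k-1 folds, scored on the held-out fold, and ranked by its mean accuracy. The sketch below uses only the standard library and rests on stated assumptions: the two toy models (a majority-class baseline and a 1-nearest-neighbour classifier) and the eight-row dataset are hypothetical stand-ins, not the eight algorithms or the data used in this project.

```python
# Minimal sketch: spot-check candidate classifiers with k-fold cross-validation
# and keep the one with the highest mean accuracy. Stdlib only; the toy models
# and the tiny dataset are hypothetical stand-ins.
from collections import Counter

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for determinism)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_model(X_train, y_train):
    """Baseline: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_model(X_train, y_train):
    """1-NN with Hamming distance over 0/1 feature vectors."""
    def predict(x):
        distances = [sum(a != b for a, b in zip(x, xt)) for xt in X_train]
        return y_train[distances.index(min(distances))]
    return predict

def cross_val_accuracy(build, X, y, k):
    """Mean accuracy of the model produced by `build` over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in test_set]
        y_tr = [y[i] for i in range(len(X)) if i not in test_set]
        model = build(X_tr, y_tr)
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical one-hot style records, alternating good/bad labels.
X = [
    [1, 1, 1, 0], [0, 0, 0, 0],
    [1, 1, 1, 1], [0, 0, 0, 1],
    [1, 1, 0, 1], [0, 0, 1, 0],
    [1, 0, 1, 1], [0, 1, 0, 0],
]
y = ["good", "bad"] * 4

candidates = {"majority": majority_model, "nearest_neighbor": nearest_neighbor_model}
results = {name: cross_val_accuracy(build, X, y, k=4) for name, build in candidates.items()}
best = max(results, key=results.get)
print(results)  # {'majority': 0.5, 'nearest_neighbor': 1.0}
print(best)     # nearest_neighbor
```

Folds are taken in order here so the result is deterministic; real evaluations usually shuffle, and often stratify, the folds before scoring.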

CONCLUSION: The baseline performance of the eight algorithms achieved an average accuracy of 71.80%. Three algorithms (Logistic Regression, Extra Trees, and Stochastic Gradient Boosting) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 76.14%. Using the optimized tuning parameters, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 77.66%, slightly better than its accuracy on the training data.
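The tune-then-validate workflow described above can be sketched as follows: a hyperparameter is chosen by cross-validated accuracy on the training data alone, and the chosen setting is scored once on a held-out validation set. This is a minimal stdlib sketch under stated assumptions: a k-nearest-neighbours classifier on one made-up numeric feature serves as a hypothetical stand-in for Stochastic Gradient Boosting, and leave-one-out cross-validation replaces k-fold to keep the example small.

```python
# Minimal sketch: tune one hyperparameter by cross-validation on the training
# data only, then score the chosen setting once on a held-out validation set.
# k-NN on a single numeric feature is a hypothetical stand-in for Stochastic
# Gradient Boosting; all data points are made up.
from collections import Counter

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k closest training points. Distance ties are
    broken by training order because sorted() is stable."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(xs, ys, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        correct += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return correct / len(xs)

# Hypothetical training data: one noisy "bad" point sits at 8.5.
train_x = [1, 2, 3, 4, 7, 8, 8.5, 9, 10]
train_y = ["bad"] * 4 + ["good", "good", "bad", "good", "good"]
# Hypothetical validation data, never used during tuning.
valid_x = [2.5, 5, 6, 8.2]
valid_y = ["bad", "bad", "good", "good"]

grid = [1, 3, 5]
cv_scores = {k: loo_accuracy(train_x, train_y, k) for k in grid}
best_k = max(grid, key=cv_scores.get)  # first k with the top CV score

# Refit on all training data (k-NN just stores it) and score validation once.
val_preds = [knn_predict(train_x, train_y, x, best_k) for x in valid_x]
val_acc = sum(p == t for p, t in zip(val_preds, valid_y)) / len(valid_y)

print({k: round(s, 3) for k, s in cv_scores.items()})  # {1: 0.667, 3: 0.889, 5: 0.889}
print(best_k, val_acc)  # 3 1.0
```

The noisy training point at 8.5 penalizes k = 1, so the search settles on k = 3. Keeping the validation set out of the search is what makes its accuracy a fair check on the tuned model.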

From the model-building activities, the Stochastic Gradient Boosting ensemble algorithm yielded the best training and validation results. It is the recommended algorithm from the accuracy perspective.

Dataset Used: German Credit Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference:

One potential source of performance benchmarks:

The HTML-formatted report is available on GitHub.