Multi-Class Classification Model for Faulty Steel Plates Using Python Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Faulty Steel Plates dataset is a multi-class classification problem, where we are trying to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the research was to correctly classify the type of surface defects in stainless steel plates, with six types of possible defects (plus “other”). The input vector was made up of 27 indicators that approximately describe the geometric shape of the defect and its outline. According to the research paper, Semeion was commissioned by the Centro Sviluppo Materiali (Italy) for this task, and therefore it is not possible to provide details on the nature of the 27 indicators used as input vectors or the types of the 6 classes of defects.

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 73.99%. Two algorithms (Bagged Decision Trees and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After tuning the hyperparameters, Gradient Boosting turned in the top overall result and achieved an accuracy of 79.31%. Using the tuned hyperparameters, the Gradient Boosting algorithm processed the testing dataset with an accuracy of 78.80%, which was consistent with the accuracy estimated from the training data.
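The three stages described above (baseline comparison, hyperparameter tuning, and a final check on held-out data) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: synthetic data from make_classification stands in for the Steel Plates data (27 numeric inputs, 7 classes), and the function name run_workflow, the random seed, the fold count, the split ratio, and the small n_estimators grid are all hypothetical choices. The accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

SEED = 7  # assumed seed, used only to make the sketch repeatable


def run_workflow():
    # Assumption: synthetic stand-in data with 27 numeric features and 7 classes.
    X, y = make_classification(
        n_samples=420, n_features=27, n_informative=10, n_redundant=0,
        n_classes=7, n_clusters_per_class=1, random_state=SEED,
    )
    # Hold out a test set that is never used for model selection.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=SEED
    )
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=SEED)

    # Stage 1: baseline cross-validated accuracy for two candidate algorithms.
    candidates = {
        # BaggingClassifier bags decision trees by default.
        "bagged_trees": BaggingClassifier(n_estimators=20, random_state=SEED),
        "gradient_boosting": GradientBoostingClassifier(
            n_estimators=20, random_state=SEED
        ),
    }
    baseline = {
        name: float(
            cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
        )
        for name, model in candidates.items()
    }

    # Stage 2: tune Gradient Boosting over a (hypothetical) small grid.
    grid = GridSearchCV(
        GradientBoostingClassifier(random_state=SEED),
        param_grid={"n_estimators": [20, 40]},
        cv=cv,
        scoring="accuracy",
    )
    grid.fit(X_train, y_train)

    # Stage 3: score the refitted best model on the held-out test set.
    test_accuracy = float(accuracy_score(y_test, grid.predict(X_test)))

    return {
        "baseline": baseline,
        "best_params": grid.best_params_,
        "tuned_cv_accuracy": float(grid.best_score_),
        "test_accuracy": test_accuracy,
    }


results = run_workflow()
print(results)
```

Comparing tuned_cv_accuracy with test_accuracy mirrors the consistency check in the analysis: a test score far below the cross-validated score would suggest the tuned model had overfit the training data.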

CONCLUSION: For this iteration, the Gradient Boosting algorithm achieved the best overall results using the training and test datasets. For this dataset, Gradient Boosting should be considered for further modeling.

Dataset Used: Steel Plates Faults Dataset

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference:

One potential source of performance benchmarks:

The HTML formatted report can be found here on GitHub.