Multi-Class Classification Model for Faulty Steel Plates Using Python Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Faulty Steel Plates dataset is a multi-class classification situation where we are trying to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the research was to correctly classify the type of surface defects in stainless steel plates, with six types of possible defects (plus “other”). The input vector was made up of 27 indicators that approximately describe the geometric shape of the defect and its outline. According to the research paper, Semeion was commissioned by the Centro Sviluppo Materiali (Italy) for this task, and therefore it is not possible to provide details on the nature of the 27 indicators used as input vectors or the types of the six classes of defects.

For this iteration, we will leverage TPOT, an automated machine learning tool for Python that optimizes machine learning pipelines using genetic programming.
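As a rough illustration of what such a search can look like, here is a minimal sketch. It assumes the classic (pre-1.0) TPOT interface, where `TPOTClassifier` accepts `generations`, `population_size`, `cv`, `scoring`, `random_state`, and `verbosity`. The 20 generations match the run reported in the analysis; the population size, fold count, and seed are illustrative assumptions rather than the settings used in the project, and the helper names `tpot_settings` and `run_tpot_search` are hypothetical.

```python
def tpot_settings(generations=20, population_size=50, cv=10, seed=888):
    """Collect the search settings in one place.

    Only generations=20 comes from the write-up; the remaining
    defaults are assumptions chosen for illustration.
    """
    return {
        "generations": generations,          # evolutionary iterations to run
        "population_size": population_size,  # pipelines kept per generation (assumed)
        "cv": cv,                            # cross-validation folds (assumed)
        "scoring": "accuracy",               # the metric reported in this write-up
        "random_state": seed,                # fixed seed for repeatability (assumed)
        "verbosity": 2,                      # print progress for each generation
    }


def run_tpot_search(X_train, y_train, **overrides):
    """Fit a TPOT search on the training split and return the fitted object."""
    # Imported lazily so tpot_settings() works even without TPOT installed.
    from tpot import TPOTClassifier

    search = TPOTClassifier(**tpot_settings(**overrides))
    search.fit(X_train, y_train)
    return search
```

With a training split in hand, `run_tpot_search(X_train, y_train)` would return the fitted search object. In the classic interface, `search.score(X_test, y_test)` then scores the best pipeline on held-out data, and `search.export("best_pipeline.py")` writes that pipeline out as Python code.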

ANALYSIS: After generation one, the best pipeline achieved a baseline accuracy of 77.18%. After generation 20, Random Forest turned in the top overall result with an accuracy of 79.46%. Furthermore, the Random Forest algorithm processed the testing dataset with an accuracy of 81.48%, slightly higher than its 79.46% on the training data.
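For reference, the accuracy metric quoted above is the fraction of plates whose predicted class matches the true class. The sketch below shows that calculation on a handful of made-up labels; the class names are placeholders, not the real defect types, and the numbers are unrelated to the project's results.

```python
def accuracy(y_true, y_pred):
    """Return the share of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


# Toy labels for illustration only (placeholder class names).
y_true = ["defect_a", "defect_b", "other", "defect_c", "defect_a"]
y_pred = ["defect_a", "defect_b", "defect_a", "defect_c", "defect_a"]

holdout_accuracy = accuracy(y_true, y_pred)
print(f"{holdout_accuracy:.2%}")  # 4 of 5 correct -> 80.00%
```

In a multi-class setting every class counts equally per sample, so this single number can hide weak performance on rare defect types; a per-class breakdown is a common companion to it.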

CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall results using the training and test datasets. For this dataset, Random Forest should be considered for further modeling.

Dataset Used: Steel Plates Faults Dataset

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference:

One potential source of performance benchmarks:

The HTML formatted report can be found here on GitHub.