Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Dataset Used: Faulty Steel Plates
Dataset ML Model: Multi-Class classification with numerical attributes
Dataset Reference: http://archive.ics.uci.edu/ml/datasets/steel+plates+faults
One potential source of performance benchmarks: https://www.kaggle.com/uciml/faulty-steel-plates
INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the research was to correctly classify the type of surface defect in stainless steel plates, with six types of possible defects (plus "other"). The input vector was made up of 27 indicators that approximately describe the geometric shape of the defect and its outline. According to the research paper, Semeion was commissioned by the Centro Sviluppo Materiali (Italy) for this task, and it is therefore not possible to provide details on the nature of the 27 indicators used as input vectors or on the nature of the six classes of defects.
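A minimal loading sketch is shown below. It assumes a local copy of the data named faults.csv (the layout distributed on Kaggle), in which the 27 anonymized indicators come first and are followed by seven one-hot fault-type columns; the file name and column ordering are assumptions, not details from the report.

```python
import pandas as pd

# Hypothetical local copy of the dataset (Kaggle-style CSV layout).
df = pd.read_csv("faults.csv")

# First 27 columns: anonymized numerical indicators (model inputs).
X = df.iloc[:, :27]

# Remaining columns: one-hot fault-type indicators; collapse them into a
# single class label for multi-class classification.
y = df.iloc[:, 27:].idxmax(axis=1)

print(X.shape)
print(y.value_counts())
```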
CONCLUSION: The baseline performance of the ten algorithms averaged 60.92% accuracy. Three algorithms (Bagged Decision Trees, Extra Trees, and Stochastic Gradient Boosting) posted the top three accuracy scores after the first round of modeling. After a series of tuning trials, the best result on the training data came from Stochastic Gradient Boosting, with an average accuracy of 78.05%. Using the optimized tuning parameters, Stochastic Gradient Boosting processed the validation dataset with an accuracy of 80.10%, slightly better than its training result. For this project, the Stochastic Gradient Boosting ensemble algorithm yielded consistently strong training and validation results, which warrant the additional processing the algorithm requires.
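The sketch below illustrates the two stages summarized above: spot-checking the three leading baseline classifiers with cross-validation on the training data, then scoring a tuned Stochastic Gradient Boosting model on a hold-out validation set. The train/validation split ratio, random seed, and hyperparameter values are illustrative assumptions only; the actual tuned settings are documented in the full report.

```python
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)

# Hypothetical local copy of the dataset, prepared as in the loading sketch.
df = pd.read_csv("faults.csv")
X, y = df.iloc[:, :27], df.iloc[:, 27:].idxmax(axis=1)

# Assumed 80/20 train/validation split with a fixed seed.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=7)

# Baseline spot-check of the three top-performing algorithms using
# 10-fold cross-validation on the training data.
models = {
    "Bagged Decision Trees": BaggingClassifier(random_state=7),
    "Extra Trees": ExtraTreesClassifier(random_state=7),
    "Stochastic Gradient Boosting": GradientBoostingClassifier(random_state=7),
}
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold,
                             scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} ({scores.std():.4f})")

# Evaluate a tuned stochastic gradient boosting model on the hold-out set.
# The hyperparameter values below are placeholders, not the tuned values.
final_model = GradientBoostingClassifier(
    n_estimators=300, learning_rate=0.1, max_depth=3,
    subsample=0.8, random_state=7)
final_model.fit(X_train, y_train)
print("Validation accuracy:", final_model.score(X_val, y_val))
```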
The HTML-formatted report can be found here on GitHub.