Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Faulty Steel Plates dataset is a multi-class classification problem in which we attempt to predict one of several (more than two) possible outcomes.
INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the study was to correctly classify the type of surface defects in stainless steel plates, with six kinds of possible defects (plus “other”). The input vector was made up of 27 indicators that approximately describe the geometric shape of the fault and its outline.
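As a minimal sketch of how the target could be prepared for multi-class modeling, the snippet below collapses seven binary fault indicators into a single class label. The column names are assumed from the UCI distribution of the dataset (including its "K_Scatch" spelling), and the sample row is made up for illustration; neither is taken from this project's notebook.

```python
# Fault-type columns, assumed from the UCI description
# (six kinds of defects plus "other").
FAULT_COLUMNS = [
    "Pastry", "Z_Scratch", "K_Scatch", "Stains",
    "Dirtiness", "Bumps", "Other_Faults",
]


def fault_label(row):
    """Return the name of the single fault flagged in a one-hot encoded row."""
    flagged = [name for name in FAULT_COLUMNS if row.get(name, 0) == 1]
    if len(flagged) != 1:
        raise ValueError(f"expected exactly one fault flag, found {len(flagged)}")
    return flagged[0]


# Hypothetical record: only the indicator columns are shown.
example_row = {name: 0 for name in FAULT_COLUMNS}
example_row["Stains"] = 1
example_label = fault_label(example_row)
print(example_label)
```

Raising an error when a row has zero or several flags is a design choice for this sketch: it surfaces malformed rows early instead of silently assigning them to a class.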
In a previous modeling iteration, a three-layer TensorFlow model achieved an average accuracy score of 72.51%. After we tuned the hyperparameters, the best model processed the training dataset with an accuracy of 74.77%. Furthermore, the final three-layer model processed the test dataset with an accuracy of 74.28%.
ANALYSIS: After a series of modeling trials, the AutoKeras system processed the validation dataset with a maximum accuracy score of 78.64%. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 76.60%.
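For reference, the accuracy scores quoted here are the fraction of plates whose predicted fault class matches the true class. The snippet below is a minimal sketch of that calculation; the `accuracy` helper and the label lists are hypothetical illustrations, not the code or results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only; not results from this project.
true_labels = ["Bumps", "Stains", "Pastry", "Bumps", "Other_Faults"]
predicted = ["Bumps", "Stains", "Bumps", "Bumps", "Other_Faults"]
print(f"accuracy = {accuracy(true_labels, predicted):.2%}")
```

Because accuracy weights every sample equally, a model can score well while still doing poorly on rare fault types, so a per-class breakdown may be worth checking alongside the headline number.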
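CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset, reaching 76.60% accuracy on the test dataset compared with 74.28% for the earlier three-layer model. We should consider experimenting with AutoKeras for further modeling.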
Dataset Used: Faulty Steel Plates Data Set
Dataset ML Model: Multi-class classification with numerical attributes
Dataset Reference: http://archive.ics.uci.edu/ml/datasets/steel+plates+faults
One potential source of performance benchmark: https://www.kaggle.com/uciml/faulty-steel-plates
The HTML formatted report can be found here on GitHub.