Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Glass Identification dataset is a multi-class classification problem in which we try to predict one of several (more than two) possible outcomes.
INTRODUCTION: The dataset comes from the USA Forensic Science Service and involves predicting six types of glass, defined in terms of their oxide content (e.g., Na, Fe, K). The study of glass classification was motivated in part by criminological investigation. Glass left at the scene of a crime can be used as evidence…if it is correctly identified!
ANALYSIS: The machine learning algorithms achieved an average baseline accuracy of 70.35%. Two algorithms (Bagged Decision Trees and Random Forest) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Bagged Decision Trees turned in the top overall result with an accuracy of 76.87% on the training data. Using the optimized parameters, the Bagged Decision Trees algorithm processed the testing dataset with an accuracy of 77.78%, slightly higher than the accuracy obtained on the training data.
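The workflow described above (baseline comparison, tuning, then scoring on held-out data) could be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual script. The synthetic data from `make_classification`, the seed, the 25% test split, the 10-fold cross-validation, and the `n_estimators` grid are all assumptions made for this example, so the accuracies it produces will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.tree import DecisionTreeClassifier

SEED = 7  # arbitrary seed, chosen only for reproducibility of this sketch

# Assumption: synthetic stand-in with the same shape as the glass data
# (214 rows, 9 numeric attributes, 6 classes); not the real dataset.
X, y = make_classification(n_samples=214, n_features=9, n_informative=6,
                           n_redundant=0, n_classes=6,
                           n_clusters_per_class=1, random_state=SEED)

# Hold out a test set (the 25% split size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=SEED)

# Round 1: baseline cross-validated accuracy for the two ensembles.
models = {
    "Bagged Decision Trees": BaggingClassifier(
        DecisionTreeClassifier(random_state=SEED), random_state=SEED),
    "Random Forest": RandomForestClassifier(random_state=SEED),
}
baseline = {
    name: cross_val_score(model, X_train, y_train,
                          cv=kfold, scoring="accuracy").mean()
    for name, model in models.items()
}

# Round 2: tune the number of bagged trees (hypothetical grid values).
grid = GridSearchCV(models["Bagged Decision Trees"],
                    {"n_estimators": [10, 50, 100]},
                    cv=kfold, scoring="accuracy")
grid.fit(X_train, y_train)

# Final step: score the refitted best model on the held-out test set.
test_accuracy = grid.score(X_test, y_test)

for name, score in baseline.items():
    print(f"{name}: baseline CV accuracy {score:.4f}")
print(f"Best parameters: {grid.best_params_}")
print(f"Tuned CV accuracy: {grid.best_score_:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")
```

Comparing `grid.best_score_` with `test_accuracy` mirrors the comparison made above between the tuned training result and the test result. With a dataset this small, a gap of a percentage point in either direction is within the noise of a single split.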
CONCLUSION: For this iteration, the Bagged Decision Trees algorithm achieved the best overall results using the training and test datasets. For this dataset, Bagged Decision Trees should be considered for further modeling.
Dataset Used: Glass Identification Data Set
Dataset ML Model: Multi-Class classification with numerical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/glass+identification
One source of potential performance benchmarks: https://www.kaggle.com/uciml/glass
The HTML-formatted report can be found on GitHub.
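For readers who want to work with the dataset referenced above, the following sketch shows one way to split its comma-separated rows into numeric attributes and a class label. It assumes the column layout described in the UCI documentation (an Id column, the refractive index, eight element measurements, then the glass type) and a file without a header row. The helper name `parse_glass_rows` is hypothetical, and the two sample rows are invented placeholders rather than real records.

```python
import csv
import io

# Assumed column order, following the UCI description of the data file.
COLUMNS = ["Id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]


def parse_glass_rows(text):
    """Split headerless CSV text into feature rows and integer class labels."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        record = dict(zip(COLUMNS, row))
        # Drop the Id column, keep the nine numeric attributes as floats.
        features.append([float(record[name]) for name in COLUMNS[1:-1]])
        labels.append(int(record["Type"]))
    return features, labels


# Invented placeholder rows in the assumed format (not real measurements).
sample = (
    "1,1.52,13.6,4.5,1.1,71.8,0.06,8.8,0.0,0.0,1\n"
    "2,1.51,14.9,0.0,2.0,73.1,0.0,8.5,1.6,0.0,7\n"
)
features, labels = parse_glass_rows(sample)
print(len(features), "rows,", len(features[0]), "attributes, labels:", labels)
```

Dropping the Id column before modeling matters because it is a row identifier and carries no information about the glass itself.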