Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
For more information on this case study project, please consult Dr. Brownlee’s blog post at https://machinelearningmastery.com/standard-machine-learning-datasets/.
Dataset Used: Abalone Data Set
Dataset ML Model: Regression with categorical, integer, and real attributes
Dataset Reference: http://archive.ics.uci.edu/ml/datasets/Abalone
The Abalone dataset involves predicting the age of abalone, expressed as a ring count, from objective physical measurements of individuals. Although the problem is usually presented as multi-class classification, this exercise frames it as regression. The baseline of always predicting the mean value gives an RMSE of approximately 3.2 rings.
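The mean-value baseline mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's own code: the function names `rmse` and `mean_baseline_rmse` are hypothetical, and the ring counts are made-up values rather than rows from the Abalone dataset.

```python
import math
from statistics import mean


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mean_baseline_rmse(train_targets, test_targets):
    """RMSE obtained by always predicting the mean of the training targets."""
    constant = mean(train_targets)
    return rmse(test_targets, [constant] * len(test_targets))


# Hypothetical ring counts for illustration only (not taken from the dataset).
train_rings = [7, 9, 10, 11, 8, 15]
test_rings = [8, 12, 10]

baseline = mean_baseline_rmse(train_rings, test_rings)
print(f"Mean-baseline RMSE: {baseline:.3f} rings")
```

Any model worth keeping should score a lower RMSE than this constant predictor on the same held-out data.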
CONCLUSION: The baseline performance of predicting the most prevalent class achieved an RMSE of approximately 2.28 rings. The best RMSE, achieved with an SVM after several rounds of tuning, was 2.13 rings. The ensemble algorithm did not outperform the SVM, so the additional processing and tuning it required was not justified.
The purpose of this project is to analyze a dataset using various machine learning algorithms and to document the steps using a template. The project aims to touch on the following areas:
- Document a regression predictive modeling problem end-to-end.
- Explore data transformation options for improving model performance.
- Explore algorithm tuning techniques for improving model performance.
- Explore using and tuning ensemble methods for improving model performance.
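As a rough illustration of the algorithm-tuning step, the sketch below runs a small grid search scored by k-fold cross-validated RMSE. It is an assumption-laden stand-in rather than the project's procedure: a one-feature k-nearest-neighbours regressor replaces the SVM so the example stays dependency-free, the helper names (`knn_predict`, `cv_rmse`, `grid_search`) are hypothetical, and the data are invented.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_rmse(xs, ys, k, folds=3):
    """Cross-validated RMSE of k-NN regression for one candidate value of k."""
    squared_errors = []
    for fold in range(folds):
        # Row i goes to the test fold when i % folds == fold.
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            prediction = knn_predict(train_x, train_y, xs[i], k)
            squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def grid_search(xs, ys, candidates, folds=3):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_rmse(xs, ys, k, folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical (feature, rings) pairs for illustration only.
lengths = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]
rings = [6, 7, 8, 9, 9, 10, 11, 12, 14]

best_k, all_scores = grid_search(lengths, rings, candidates=[1, 3, 5])
for k, score in sorted(all_scores.items()):
    print(f"k={k}: CV RMSE = {score:.3f}")
print(f"Best k: {best_k}")
```

The same pattern, a set of candidate hyperparameter values each scored on held-out folds, applies when the model is an SVM or an ensemble; only the fitting and prediction functions change.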
The HTML-formatted report can be found on GitHub.