Simple Regression Model for Predicting Abalone Age Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

For more information on this case study project, please consult Dr. Brownlee’s blog post at

Dataset Used: Abalone Data Set

Data Set ML Model: Regression with Categorical, Integer, and Real attributes

Dataset Reference:

The Abalone Dataset involves predicting the age of abalone from objective measurements of individual specimens. Although the problem was originally presented as a multi-class classification task, this exercise frames it as regression. The baseline performance of predicting the mean value is an RMSE of approximately 3.2 rings.
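Predicting the mean makes a natural benchmark because it ignores every input attribute. If each prediction is the sample mean $\bar{y}$ of the observed ring counts $y_1, \dots, y_n$, the RMSE reduces to the population standard deviation $\sigma_y$ of the target:

$$
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\mathrm{RMSE}_{\text{mean}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} = \sigma_y
$$

A baseline of about 3.2 rings therefore reflects how widely the ring counts spread around their mean. A model is only useful if its RMSE falls clearly below that figure.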

CONCLUSION: The baseline model achieved an RMSE of approximately 2.28 rings, well below the 3.2-ring mean-value benchmark. After a series of tuning steps, SVM achieved the top RMSE of 2.13 rings. The ensemble algorithm did not outperform SVM, so it did not justify the additional processing and tuning it required.

The purpose of this project is to analyze a dataset using various machine learning algorithms and to document the steps using a template. The project aims to touch on the following areas:

  • Document a regression predictive modeling problem end-to-end.
  • Explore data transformation options for improving model performance.
  • Explore algorithm tuning techniques for improving model performance.
  • Explore using and tuning ensemble methods for improving model performance.

The HTML formatted report can be found here on GitHub.