Regression Model for Bike Sharing Using Python – Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Dataset Used: Bike Sharing Dataset

Dataset ML Model: Regression with numerical attributes

Dataset Reference:

For available performance benchmarks, please consult:

INTRODUCTION: Using the data generated by a bike sharing system, this project attempts to predict the daily demand for bike sharing. In this iteration, we use the available data to find a suitable machine learning algorithm for future prediction tasks. We kept data transformation to a minimum and dropped several attributes that either do not make sense to keep or will not help in training the model. The goal of this iteration is to find the algorithm with the lowest Root Mean Squared Error (RMSE).
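Since RMSE is the yardstick for every comparison in this project, a minimal sketch of how it is computed may help. The function name `rmse` and the sample numbers are illustrative assumptions, not values from the Bike Sharing Dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up daily rental counts, for illustration only (not from the dataset).
actual_counts = [1000, 2000, 3000]
predicted_counts = [1100, 1900, 3200]
error = rmse(actual_counts, predicted_counts)
print(f"RMSE: {error:.2f}")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is expressed in the same unit as the target (here, rentals per day).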

CONCLUSION: The baseline performance of predicting the target variable achieved an average RMSE of 1,483. Three algorithms (AdaBoost, Random Forest, and Stochastic Gradient Boosting) achieved the best NMSE values during the initial modeling round. After a series of tuning trials with these three algorithms, Stochastic Gradient Boosting produced the lowest RMSE of 1,233 on the training data.
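The kind of comparison described above could be sketched with scikit-learn roughly as follows. This is not the project's actual script: the synthetic data from `make_regression`, the hyperparameters, the fold count, and the random seed are all assumptions chosen to keep the example small and self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption); the project uses the Bike Sharing Dataset.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=7)

models = {
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=7),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=7),
    # subsample < 1.0 fits each tree on a random fraction of the rows,
    # which is what makes gradient boosting "stochastic".
    "Stochastic Gradient Boosting": GradientBoostingRegressor(
        n_estimators=50, subsample=0.8, random_state=7
    ),
}

folds = KFold(n_splits=5, shuffle=True, random_state=7)
results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is returned negated; flip the sign.
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    results[name] = -scores.mean()

best_model = min(results, key=results.get)
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: mean RMSE {score:.1f}")
print(f"Lowest cross-validated RMSE: {best_model}")
```

Reusing the same `KFold` object for every model means each algorithm is scored on identical splits, so differences in mean RMSE reflect the models and not the partitioning.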

Stochastic Gradient Boosting also processed the validation dataset with an RMSE of 1,293, slightly worse than the best training result. For this project, the Stochastic Gradient Boosting ensemble algorithm yielded the best training result and a comparable validation result, which warrants the additional processing the algorithm requires.
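To put the reported figures in proportion, the short calculation below works out the gap between the training and validation RMSE and the improvement over the baseline. The three RMSE values come from the conclusion above; the variable names are illustrative.

```python
baseline_rmse = 1483    # average baseline RMSE reported in the conclusion
train_rmse = 1233       # best tuned training RMSE (Stochastic Gradient Boosting)
validation_rmse = 1293  # RMSE on the validation dataset

# How much worse the model does on unseen data than on the training data.
absolute_gap = validation_rmse - train_rmse
relative_gap = absolute_gap / train_rmse

# How much the tuned model improves on the baseline.
baseline_improvement = (baseline_rmse - train_rmse) / baseline_rmse

print(f"Validation gap: {absolute_gap} ({relative_gap:.1%} of training RMSE)")
print(f"Improvement over baseline: {baseline_improvement:.1%}")
```

The validation RMSE is 60 higher than the training RMSE, about 4.9% of the training figure, which is consistent with the description of the validation result as only slightly worse.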

The HTML-formatted report is available on GitHub.