Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Dataset Used: Bike Sharing Dataset
Dataset ML Model: Regression with numerical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
For performance benchmarks, please consult: https://www.kaggle.com/contactprad/bike-share-daily-data
INTRODUCTION: This project uses the data generated by a bike sharing system to predict daily bike sharing demand. For this iteration (Take No.2), we apply the one-hot-encoding transformation to each categorical attribute and fit the Stochastic Gradient Boosting algorithm to examine its modeling effectiveness. As before, the goal of this iteration is to explore various data transformation options and find a sufficiently accurate (low-error) combination for future prediction tasks.
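The one-hot-encoding transformation described above can be sketched as follows. This is a minimal illustration using pandas, assuming the dataset is loaded as a DataFrame; the tiny frame below is a hypothetical stand-in, not the real Bike Sharing data.

```python
import pandas as pd

# Hypothetical stand-in for the Bike Sharing DataFrame: one categorical
# attribute ("mnth"), one numerical attribute ("temp"), and the target ("cnt").
df = pd.DataFrame({
    "mnth": [1, 2, 2, 3],               # month (1-12), categorical
    "temp": [0.34, 0.36, 0.39, 0.42],   # normalized temperature
    "cnt": [985, 801, 1349, 1562],      # daily rental count (target)
})

# One-hot-encode a single categorical attribute, as in Scenario No.1:
# the "mnth" column is replaced by one indicator column per observed month.
encoded = pd.get_dummies(df, columns=["mnth"], prefix="mnth")
print(sorted(encoded.columns))
# → ['cnt', 'mnth_1', 'mnth_2', 'mnth_3', 'temp']
```

Passing a different column name (or a list of all categorical columns, as in Scenario No.6) to the `columns` argument covers the other scenarios.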
This iteration of the project will test the following six modeling scenarios:
- Scenario No.1: Perform one-hot-encoding on the categorical variable “mnth” and observe the change in regression accuracy.
- Scenario No.2: Perform one-hot-encoding on the categorical variable “holiday” and observe the change in regression accuracy.
- Scenario No.3: Perform one-hot-encoding on the categorical variable “weekday” and observe the change in regression accuracy.
- Scenario No.4: Perform one-hot-encoding on the categorical variable “workingday” and observe the change in regression accuracy.
- Scenario No.5: Perform one-hot-encoding on the categorical variable “weathersit” and observe the change in regression accuracy.
- Scenario No.6: Perform one-hot-encoding on all categorical variables and observe the change in regression accuracy.
The steps from sections No.3 and No.4 will be repeated for each scenario.
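The repeated evaluation step for each scenario can be sketched as below: score a Stochastic Gradient Boosting model (gradient boosting with `subsample` < 1.0 in scikit-learn) with k-fold cross-validated RMSE. The synthetic `X` and `y` arrays are hypothetical stand-ins for the prepared feature matrix and the daily rental count target, and the specific hyperparameters are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-ins for the prepared features and the 'cnt' target.
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = rng.random(100) * 5000

# subsample < 1.0 makes this *stochastic* gradient boosting:
# each tree is fit on a random fraction of the training rows.
model = GradientBoostingRegressor(subsample=0.7, random_state=42)

# 10-fold cross-validation, scored by negated RMSE (scikit-learn
# convention: higher is better, so RMSE is reported as negative).
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold,
                         scoring="neg_root_mean_squared_error")
rmse = -scores.mean()
print(f"Average RMSE: {rmse:.1f}")
```

Running the same loop on each scenario's one-hot-encoded feature set and comparing the resulting average RMSE values against the baseline is how the scenarios are judged.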
CONCLUSION: The baseline performance of Stochastic Gradient Boosting stands at an RMSE of 1255 on the training data. The various scenarios achieved average RMSE values between 1243 and 1261. For this iteration of the project, the one-hot-encoding transformation did not noticeably improve model performance.
The HTML-formatted report can be found on GitHub.