Regression Model for Metro Interstate Traffic Volume Using R Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Metro Interstate Traffic Volume dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: This dataset captured the hourly measurement of Interstate 94 Westbound traffic volume for MN DoT ATR station 301. The station is roughly midway between Minneapolis and St Paul, MN. The dataset also included the hourly weather and holiday attributes for assessing their impacts on traffic volume.

In this iteration, we will establish the baseline root mean squared error (RMSE) for comparison with future rounds of modeling. This round of modeling will not include the date-time and weather description attributes.
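As a minimal sketch of this setup, the R snippet below drops the two excluded attributes and defines the RMSE metric. The data frame is a toy stand-in, and the column names (date_time, weather_description, traffic_volume, and so on) and the helper name rmse are assumptions for illustration; the real dataset's column names may differ.

```r
# Toy data frame standing in for the real dataset.
# All column names and values here are assumptions for illustration.
traffic <- data.frame(
  temp = c(288.3, 289.4, 290.1, 285.0),
  clouds_all = c(40, 75, 90, 1),
  weather_description = c("scattered clouds", "broken clouds",
                          "overcast clouds", "sky is clear"),
  date_time = c("2012-10-02 09:00:00", "2012-10-02 10:00:00",
                "2012-10-02 11:00:00", "2012-10-02 12:00:00"),
  traffic_volume = c(5500, 4500, 4800, 5000),
  stringsAsFactors = FALSE
)

# Drop the attributes that this round of modeling leaves out.
excluded <- c("date_time", "weather_description")
baseline_data <- traffic[, !(names(traffic) %in% excluded), drop = FALSE]

# Root mean squared error: the square root of the mean squared residual.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Example call with made-up predictions (not model output).
example_rmse <- rmse(baseline_data$traffic_volume, c(5400, 4700, 4800, 5100))
```

Because RMSE is expressed in the same unit as the target (vehicles per hour here), it can be read directly against typical traffic volumes.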

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average RMSE of 2099. Two algorithms (Random Forest and Gradient Boosting) achieved the lowest RMSE values after the first round of modeling. After a series of tuning trials, Gradient Boosting turned in the top overall result and achieved an RMSE of 1895. After applying the optimized parameters, the Gradient Boosting algorithm processed the testing dataset with an RMSE of 1899, which was slightly worse (higher) than the result from the training data.
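To put the training and testing figures side by side, the short R sketch below computes the absolute and relative gap between the two RMSE values quoted above. The helper name rmse_gap is hypothetical and not part of the original project.

```r
# RMSE figures quoted in the analysis above.
train_rmse <- 1895  # tuned Gradient Boosting on the training data
test_rmse  <- 1899  # same model on the testing dataset

# Hypothetical helper: absolute and relative difference between two RMSE values.
rmse_gap <- function(train, test) {
  absolute <- test - train
  list(absolute = absolute, relative = absolute / train)
}

gap <- rmse_gap(train_rmse, test_rmse)
cat(sprintf("Absolute gap: %.0f, relative gap: %.2f%%\n",
            gap$absolute, 100 * gap$relative))
```

The two figures differ by 4, or roughly 0.2% of the training RMSE, so the tuned model performs almost identically on the data it has not seen.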

CONCLUSION: For this iteration, the Gradient Boosting algorithm achieved the best overall training and validation results. For this dataset, the Gradient Boosting algorithm could be considered for further modeling.

Dataset Used: Metro Interstate Traffic Volume Data Set

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference:

One potential source of performance benchmarks:

The HTML formatted report can be found here on GitHub.