Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Code Credit: Adapted from a blog post made available by Dr. Jason Brownlee of Machine Learning Mastery.
PREFACE: This is a replication of Python code from Dr. Brownlee’s blog post on time series. I have combined all the code snippets into one script so that I could turn the whole process into a template. The comments and analysis also come from the blog post and are included here as annotations to explain each code block.
SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Monthly Armed Robberies in Boston dataset poses a time series problem in which we try to forecast future values from past data points.
INTRODUCTION: The problem is to predict the number of monthly armed robberies in Boston, USA. The dataset provides the number of monthly armed robberies in Boston from January 1966 to October 1975. It is titled "Monthly Boston armed robberies Jan.1966-Oct.1975 Deutsch and Alt (1977)" and is credited to McCleary & Hay (1980).
ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 51.8. The manually configured model was simplified to ARIMA(0,1,2) and produced an RMSE of 49.8, which was slightly better than the persistence model. After applying a power transformation (Box-Cox) to the dataset, the final RMSE of the model on the transformed data was 49.4. This is only a slightly smaller error than the ARIMA model on untransformed data, and the difference may or may not be statistically significant.
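To make the baseline concrete, here is a minimal sketch of a persistence forecast scored with RMSE under walk-forward validation. It illustrates the general technique and is not the exact code from the blog post. The function name `persistence_rmse`, the `train_fraction` parameter, and the toy values are assumptions made up for this example; the toy values are not the Boston robberies data.

```python
from math import sqrt

def persistence_rmse(series, train_fraction=0.5):
    """Score a persistence forecast with walk-forward validation.

    Each test value is predicted as the most recent observation,
    then the true value is added to the history before the next step.
    """
    split = int(len(series) * train_fraction)
    history = list(series[:split])   # observations seen so far
    test = series[split:]            # values to forecast one at a time
    squared_errors = []
    for observed in test:
        predicted = history[-1]      # persistence: repeat the last value
        squared_errors.append((observed - predicted) ** 2)
        history.append(observed)     # reveal the true value
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical toy series, used only to show the mechanics.
toy_series = [10, 12, 13, 15, 14, 18, 20, 19]
print(persistence_rmse(toy_series))
```

Any candidate model, such as the ARIMA configurations discussed above, would need to score a lower RMSE than this baseline on the same test split to be considered useful.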
CONCLUSION: The final RMSE for the validation period was 53 robberies, which was not too different from the expected error of 49. The forecast appears to have the characteristics of a persistence forecast, which suggests that, although this time series has an obvious trend, it is still a reasonably difficult problem.
Dataset Used: Monthly Armed Robberies in Boston
Dataset ML Model: Time series forecast with numerical attributes
Dataset Reference: https://datamarket.com/data/set/22ob/monthly-boston-armed-robberies-jan1966-oct1975-deutsch-and-alt-1977#!ds=22ob&display=line
One potential source of performance benchmark: https://machinelearningmastery.com/time-series-forecast-case-study-python-monthly-armed-robberies-boston/
The HTML formatted report can be found here on GitHub.