Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Code Credit: Adapted from a blog post made available by Dr. Jason Brownlee of Machine Learning Mastery.
PREFACE: This is a replication of Python code from Dr. Brownlee’s blog post on time series. I have combined all the code snippets into one script so that I can turn the whole process into a template. The comments and analysis also come from the blog post and are included here to explain each code block.
SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Annual Water Usage in Baltimore dataset poses a time series problem in which we try to forecast future outcomes from past data points.
INTRODUCTION: The problem is to predict annual water usage. The dataset provides the annual water usage in Baltimore from 1885 to 1963, or 79 years of data. It contains 79 observations in units of liters per capita per day and is credited to Hipel and McLeod, 1994.
ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 21.975. The manually configured model was simplified to ARIMA(4,1,1) and produced an RMSE of 31.097, which was higher than that of the persistence model. After applying the grid search technique to the dataset, the final RMSE of the ARIMA(2,1,0) model was 21.733. This is only a slightly smaller error than the persistence model's, and the difference may or may not be statistically significant.
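To make the persistence baseline concrete, here is a minimal sketch of a persistence forecast scored with walk-forward validation. It is an illustration under stated assumptions, not the script from the blog post: the function name `persistence_walk_forward`, the 50% split, and the `toy_series` values are all hypothetical, and the toy numbers are not the Baltimore data.

```python
from math import sqrt


def persistence_walk_forward(series, train_fraction=0.5):
    """Score a persistence forecast with walk-forward validation.

    Each test observation is predicted as the most recent known value,
    then the true value is added to the history before the next step.
    Returns the list of predictions and the RMSE over the test portion.
    """
    split = int(len(series) * train_fraction)  # assumed split point
    history = list(series[:split])
    test = list(series[split:])

    predictions = []
    for observed in test:
        predictions.append(history[-1])  # forecast = last observed value
        history.append(observed)         # reveal the truth for the next step

    mse = sum((p - o) ** 2 for p, o in zip(predictions, test)) / len(test)
    return predictions, sqrt(mse)


# Hypothetical toy values, NOT the Baltimore water usage data.
toy_series = [10.0, 12.0, 11.0, 13.0, 16.0, 15.0]
predictions, rmse = persistence_walk_forward(toy_series)
print(predictions, rmse)
```

Any candidate model, such as an ARIMA configuration, could be scored with the same loop by replacing the `history[-1]` forecast with the model's one-step prediction, which would keep the RMSE values directly comparable.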
CONCLUSION: The final RMSE for the validation period came to 16 liters per capita per day. This is not too different from the expected error of 21, but we would expect that it is also not too different from the error of a simple persistence model. The forecast does have the characteristics of a persistence forecast. This suggests that although this time series has an obvious trend, it is still a reasonably difficult problem.
Dataset Used: Annual Water Usage in Baltimore
Dataset ML Model: Time series forecast with numerical attributes
Dataset Reference: https://datamarket.com/data/set/22sl/baltmore-city-annual-water-use-liters-per-capita-per-day-1885-1968#!ds=22sl&display=line
One potential source of performance benchmark: https://machinelearningmastery.com/time-series-forecast-study-python-annual-water-usage-baltimore/
The HTML-formatted report can be found on GitHub.