Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Code Credit: Adapted from a blog post made available by Dr. Jason Brownlee of Machine Learning Mastery.
PREFACE: This is a replication of Python code from Dr. Brownlee’s blog post on time series. I have combined all the code snippets into one script so that I can turn the whole process into a template. The comments and analysis were also part of the blog post and are reproduced here as annotations to explain each code block.
SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Monthly Sales of French Champagne dataset is a time series problem in which we try to forecast future outcomes from past data points.
INTRODUCTION: The problem is to predict the number of monthly sales of champagne for the Perrin Freres label (named for a region in France). The dataset provides the number of monthly sales of champagne from January 1964 to September 1972, or just under nine years of data. The values are counts of sales in millions, and there are 105 observations. The dataset is credited to Makridakis and Wheelwright, 1989.
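As a quick sanity check on the size of the dataset, the number of monthly observations between the stated start and end months can be computed directly. This is a minimal sketch; the helper name months_inclusive is hypothetical and is not part of the original script.

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Count monthly observations from start to end, both months included."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# January 1964 through September 1972, as stated in the introduction.
n_obs = months_inclusive(date(1964, 1, 1), date(1972, 9, 1))
print(n_obs)       # 105
print(n_obs / 12)  # 8.75
```

The result matches the 105 observations reported for the dataset, which corresponds to 8.75 years of monthly data.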
ANALYSIS: The baseline (persistence) forecast for the dataset resulted in an RMSE of 3186.501. The manually configured model was simplified to ARIMA(1,1,1) and produced an RMSE of 956.958, a dramatic improvement over the persistence baseline. A grid search over the ARIMA parameters then selected ARIMA(0,0,1) with an RMSE of 939.464, which is slightly lower than the RMSE of the manually configured model. This difference may or may not be statistically significant. In the end, we selected ARIMA(0,0,1) as the final model.
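To make the persistence baseline concrete, the sketch below scores a naive forecast (each prediction is simply the previous observation) with walk-forward validation and RMSE. It is an illustration under stated assumptions, not the script used for this project: the function names, the 50% training split, and the toy series are all hypothetical, and the figures above come from the real dataset, not from this example.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def persistence_walk_forward(series, train_fraction=0.5):
    """Score a persistence forecast with walk-forward validation.

    train_fraction is an assumed split point, not a value taken
    from the project description.
    """
    split = int(len(series) * train_fraction)
    if split < 1 or split >= len(series):
        raise ValueError("split must leave data on both sides")
    history = list(series[:split])
    test = list(series[split:])
    predictions = []
    for observation in test:
        # Persistence: predict that the next value equals the last one seen.
        predictions.append(history[-1])
        # Walk forward: the true value becomes available for the next step.
        history.append(observation)
    return rmse(test, predictions)


# Toy values for illustration only; these are not champagne sales figures.
toy_series = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
score = persistence_walk_forward(toy_series)
print(round(score, 3))  # 3.317
```

The same walk-forward loop can score any one-step forecaster by replacing the persistence line with a model fit on history, which is how a candidate ARIMA order would be compared against this baseline.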
CONCLUSION: The final RMSE for the validation period is 361 million sales. This is much better than the expected error of a little more than 924 million sales per month. At the scale of the plot, the 12 months of forecast sales figures look fantastic.
Dataset Used: Monthly Sales of French Champagne
Dataset ML Model: Time series forecast with numerical attributes
Dataset Reference: https://datamarket.com/data/set/22r5/perrin-freres-monthly-champagne-sales-millions-64-72#!ds=22r5&display=line
One potential source of performance benchmark: https://machinelearningmastery.com/time-series-forecast-study-python-monthly-sales-french-champagne/
The HTML formatted report can be found here on GitHub.