Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Yearly Copper Prices dataset is a time series problem in which we forecast future values from past observations.
INTRODUCTION: The problem is to forecast the annual price of copper, expressed in constant 1997 dollars. The dataset describes a time series of copper prices per ton (in dollars) over 197 years (1800-1996), with one observation per year for a total of 197 observations. We used the first 80% of the observations to train various models and held back the remaining 20% to validate the final model.
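A chronological 80/20 split like the one described above could look like the following minimal sketch. The function name `train_validation_split` and the placeholder series are illustrative assumptions, not the project's actual code. The key point is that a time series is split by position rather than shuffled, so every validation year comes after every training year. With 197 observations, this sketch puts 157 in training and 40 in validation.

```python
def train_validation_split(series, train_fraction=0.8):
    """Split a time series chronologically (no shuffling), so the
    validation set always lies after the training set in time."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(series) * train_fraction)  # truncates 157.6 -> 157
    return series[:cut], series[cut:]


# Placeholder values standing in for the 197 yearly prices (1800-1996).
prices = list(range(197))
train, validation = train_validation_split(prices)
print(len(train), len(validation))  # 157 40
```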
ANALYSIS: The baseline (persistence) prediction for the dataset achieved an RMSE of 22.057. After a grid search for the optimal ARIMA parameters, the final non-seasonal ARIMA order was (4, 1, 4). The chosen model achieved an RMSE of 21.456 on the validation data, a modest improvement over the baseline model, as expected.
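The baseline and ARIMA scores above are easiest to compare when both come from the same evaluation loop. The following is a minimal sketch assuming one-step-ahead (walk-forward) validation, the scheme commonly used in this family of templates: each model forecasts the next year, then the true value is appended to the history. The function names, toy numbers, and the `(p, d, q)` ranges in the commented-out example are illustrative assumptions, not taken from the original project. The ARIMA factory assumes `statsmodels.tsa.arima.model.ARIMA` and is left commented out so the block runs with the standard library alone.

```python
import itertools
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def walk_forward(train, validation, predict_next):
    """One-step-ahead walk-forward validation: predict the next value from the
    history so far, then append the true observation before the next step."""
    history = list(train)
    predictions = []
    for actual in validation:
        predictions.append(predict_next(history))
        history.append(actual)
    return rmse(validation, predictions)


def persistence(history):
    """Baseline (naive) forecast: the next value equals the last observed value."""
    return history[-1]


def grid_search(train, validation, make_predictor, p_values, d_values, q_values):
    """Score every (p, d, q) order by walk-forward RMSE and return the best one.
    Orders whose model fails to fit (any exception) are skipped."""
    best_order, best_score = None, float("inf")
    for order in itertools.product(p_values, d_values, q_values):
        try:
            score = walk_forward(train, validation, make_predictor(order))
        except Exception:
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Assumed ARIMA predictor factory using statsmodels (not executed here):
# from statsmodels.tsa.arima.model import ARIMA
# def make_arima(order):
#     return lambda history: ARIMA(history, order=order).fit().forecast()[0]
# best_order, best_rmse = grid_search(train, validation, make_arima,
#                                     range(0, 5), range(0, 3), range(0, 5))

# Toy numbers only, not the copper data.
train = [10.0, 12.0, 11.0, 13.0]
validation = [14.0, 13.0, 15.0]
print(round(walk_forward(train, validation, persistence), 3))  # 1.414
```

Because `grid_search` only needs a function that maps an order to a one-step predictor, the same loop scores the persistence baseline and every ARIMA candidate on identical validation steps, which keeps the RMSE figures directly comparable.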
CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.
Dataset Used: Yearly Copper Prices 1800 through 1996
Dataset ML Model: Time series forecast with numerical attributes
Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/
The HTML-formatted report is available on GitHub.