Univariate Time Series Model for American River River-flow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project builds a time series forecasting model and documents the end-to-end steps using a template. The American River River-flow dataset poses a univariate time series problem in which we forecast future values from past observations.
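Forecasting from past data points is often framed as a supervised learning problem by sliding a fixed-size window over the series. The sketch below shows one common way to do this; the helper name `make_windows` and the lag count are illustrative assumptions, not taken from the project's notebook.

```python
def make_windows(series, n_lags):
    """Turn an ordered series into (lagged inputs, next value) pairs.

    Hypothetical helper for illustration; n_lags is an assumed setting.
    """
    inputs, targets = [], []
    for i in range(n_lags, len(series)):
        inputs.append(series[i - n_lags:i])  # the n_lags values before position i
        targets.append(series[i])            # the value to be predicted
    return inputs, targets


# Toy values, not real river-flow data.
X, y = make_windows([1, 2, 3, 4, 5], n_lags=2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]
```

Each row of `X` could then be fed to a model such as an MLP, with the matching entry of `y` as the training target.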

INTRODUCTION: The problem is to forecast the monthly river flow for American River at Fair Oaks, California. The dataset describes a time series of flow volume (in cms) over 55 years (1906-1960), and there are 660 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.
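The 80% hold-back described above can be sketched as a chronological split, where the earliest observations go to modeling and the most recent ones are reserved for validation. The function name and the placeholder values are assumptions for illustration; only the 660-observation count and the 80% fraction come from the text.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into an early part and a held-back tail.

    No shuffling: time order must be preserved for forecasting.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Placeholder values standing in for the 660 monthly observations.
flows = [float(i) for i in range(660)]
modeling, validation = chronological_split(flows)
print(len(modeling), len(validation))  # 528 132
```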

ANALYSIS: The baseline persistence model yielded an RMSE of 112.477. On the same test data, the MLP model achieved an RMSE of 88.340, an improvement over the baseline, as expected. In an earlier ARIMA modeling experiment, the best model, with a non-seasonal order of (1, 0, 0) and a seasonal order of (1, 0, 1, 12), scored an RMSE of 78.413 on the validation data.
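For readers unfamiliar with the baseline, a persistence model predicts each value as the previous observation, and RMSE is the square root of the mean squared prediction error. The sketch below illustrates that calculation with walk-forward evaluation on toy numbers; the function name and data are assumptions and do not reproduce the reported scores.

```python
import math


def persistence_rmse(history, test):
    """Score a persistence forecast: each prediction is the last observed value."""
    previous = history[-1]
    squared_errors = []
    for actual in test:
        squared_errors.append((actual - previous) ** 2)
        previous = actual  # walk forward: the true value becomes known
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values, not real river-flow data.
train = [10.0, 12.0, 15.0]
test = [14.0, 18.0, 17.0]
rmse = persistence_rmse(train, test)
print(round(rmse, 3))  # 2.449
```

Any candidate model would need to score below this baseline on the same test split to show that it has learned more than "next month looks like this month".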

CONCLUSION: For this dataset, the TensorFlow MLP model improved on the persistence baseline, and we should consider using TensorFlow for further modeling.

Dataset Used: Monthly river flow in cms, American River at Fair Oaks, California, October 1906 through September 1960

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML-formatted report is available on GitHub.