Time Series Model for Weekly Births in Quebec Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Daily Births in Quebec dataset poses a time series forecasting problem in which we try to predict future values from past observations.

INTRODUCTION: The problem is to forecast the weekly number of births in the province of Quebec, Canada. The dataset describes a time series of births over 14 years (1977-1990) and contains 5113 daily observations. To avoid out-of-memory issues during processing, we first aggregated the daily data into 730 weekly sums. We then used the first 80% of the weekly observations for training and testing various models, holding back the remaining 20% for validating the final model.
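The aggregation and hold-out split described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `weekly_sums` and `train_validation_split` are hypothetical, the daily values are synthetic placeholders, and grouping into consecutive 7-day blocks (dropping the incomplete final block) is an assumption about how the weekly sums were formed.

```python
def weekly_sums(daily_values, days_per_week=7):
    """Sum consecutive 7-day blocks, dropping any incomplete trailing block."""
    n_weeks = len(daily_values) // days_per_week
    return [
        sum(daily_values[i * days_per_week:(i + 1) * days_per_week])
        for i in range(n_weeks)
    ]


def train_validation_split(series, train_fraction=0.8):
    """Split chronologically: earliest observations for modeling, the rest held back."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Synthetic placeholder: 5113 made-up daily counts standing in for the real data.
daily = [200 + (i % 7) * 5 for i in range(5113)]

weekly = weekly_sums(daily)
train, validation = train_validation_split(weekly)
print(len(weekly), len(train), len(validation))
```

With 5113 daily values, this grouping yields 730 complete weeks, which matches the count reported above. The split is chronological rather than random because shuffling a time series would leak future information into the training portion.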

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 70. After a grid search over ARIMA parameters, the final model used a non-seasonal order of (2, 1, 2) and a seasonal order of (1, 0, 2, 52). The chosen model processed the validation data with an RMSE of 59, which was better than the baseline, as expected.
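For readers unfamiliar with the persistence baseline, the sketch below shows one common way to compute it: each forecast is simply the previous observed value, scored with walk-forward RMSE. The function name `persistence_rmse` is hypothetical and the numbers are invented for illustration, so the result has no relation to the RMSE of 70 reported above.

```python
from math import sqrt


def persistence_rmse(train, test):
    """Walk-forward persistence: predict each value as the previous observation."""
    history = list(train)
    squared_errors = []
    for actual in test:
        prediction = history[-1]      # naive forecast: last observed value
        squared_errors.append((actual - prediction) ** 2)
        history.append(actual)        # reveal the true value before the next step
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up weekly totals, for illustration only (not the Quebec data).
example_train = [1700, 1750, 1720]
example_test = [1730, 1710, 1760]
print(persistence_rmse(example_train, example_test))
```

Any candidate model, including a seasonal ARIMA, is only worth keeping if it achieves a lower RMSE than this naive forecast on the same observations.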

Dataset Used: Daily Births in Quebec, 1977 through 1990

Dataset ML Model: Time series forecast with numerical attributes

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.