Time Series Model for Live Births in the USA Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Live Births in the United States dataset poses a time series problem in which we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The United Nations Statistics Division collects, compiles, and disseminates official demographic and social statistics on various topics. Its Demographic Yearbook provides annual statistics on population size and composition, births, deaths, marriages, and divorces. The problem is to forecast the monthly number of live births in the United States. The dataset is a monthly time series of live birth counts spanning 47 years (1969-2015), for a total of 564 observations. We used the first 90% of the observations for training various models and held back the remaining 10% for validating the final model.
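The chronological 90/10 split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `train_validation_split`, the truncating rule for the split point, and the placeholder values are all assumptions made for this example.

```python
def train_validation_split(series, train_fraction=0.9):
    """Split an ordered series chronologically, without shuffling.

    The split point is truncated to an integer (an assumption; the
    original project may have rounded differently).
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_point = int(len(series) * train_fraction)
    return series[:split_point], series[split_point:]


# Placeholder values standing in for the 564 monthly observations;
# these are NOT the real birth counts.
placeholder_series = list(range(564))

train, validation = train_validation_split(placeholder_series)
print(len(train), len(validation))
```

Under this truncating rule, 564 observations yield 507 training points and 57 validation points. Keeping the data in time order matters here, because shuffling would leak future values into the training set.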

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 16735. After a grid search over ARIMA parameters, the final model used the non-seasonal order (3, 1, 4) with the seasonal order (2, 0, 2, 12). The chosen model processed the validation data with an RMSE of 7177, roughly 57% lower than the baseline, as expected.
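For readers unfamiliar with the persistence baseline, the sketch below shows one common way to compute it: each forecast is simply the most recent observed value, and the history grows by one observation per step (walk-forward validation). The function name `persistence_rmse` and the toy numbers are assumptions for illustration; the source does not state exactly how its baseline RMSE was evaluated.

```python
from math import sqrt


def persistence_rmse(train, validation):
    """RMSE of a persistence forecast under walk-forward validation.

    At each step the prediction is the last value seen so far; the
    actual value is then appended to the history.
    """
    if not train or not validation:
        raise ValueError("train and validation must both be non-empty")
    history = list(train)
    squared_errors = []
    for actual in validation:
        prediction = history[-1]  # naive forecast: repeat the last value
        squared_errors.append((actual - prediction) ** 2)
        history.append(actual)  # reveal the true value before the next step
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy numbers only, NOT the real birth counts.
toy_train = [10, 12, 11]
toy_validation = [13, 12, 15]
toy_rmse = persistence_rmse(toy_train, toy_validation)
print(toy_rmse)
```

Any candidate model is worth keeping only if its RMSE on the same held-out data falls below this naive score, which is the comparison the analysis makes between 16735 and 7177.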

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result and should be considered for further modeling.

Dataset Used: Live births by month of birth | Demographic Statistics Database | United Nations Statistics Division

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: https://data.un.org/Data.aspx?d=POP&f=tableCode:55

The HTML-formatted report is available on GitHub.