Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Daily Female Births dataset poses a time series problem in which we try to forecast future values from past observations.
INTRODUCTION: The problem is to predict the daily number of female births. The dataset describes the daily number of female births in California in 1959 and contains 365 observations. We used the first 80% of the observations for training and testing various models, while holding back the last 20% of the observations for validating the final model. The original dataset is credited to Newton (1988).
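The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `split_series` is hypothetical, and the placeholder list only stands in for the 365 daily counts. The key point is that the split is chronological, with no shuffling, so the validation portion always lies after the modeling portion in time.

```python
def split_series(values, train_fraction=0.8):
    """Split a time series chronologically (no shuffling)."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


# Placeholder values standing in for the 365 daily counts (not the real data).
series = list(range(365))

# First 80% for training/testing models, last 20% held back for validation.
dataset, validation = split_series(series)
print(len(dataset), len(validation))
```

With 365 observations, this leaves 292 points for model development and 73 for final validation.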
ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 7.69. A grid search over candidate ARIMA parameters selected a final order of (1, 1, 1). On the held-out validation data, the (1, 1, 1) model achieved an RMSE of 6.59.
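For readers unfamiliar with the persistence baseline, the sketch below shows one common way to compute it with walk-forward validation: each forecast is simply the most recent observed value, and RMSE is taken over all one-step forecasts. The function name `persistence_rmse` and the toy numbers are assumptions for illustration only; they are not the project's code or data, and the result will not match the RMSE figures reported above.

```python
from math import sqrt


def persistence_rmse(train, test):
    """Walk-forward persistence forecast: predict the last observed value."""
    history = list(train)
    squared_errors = []
    for observed in test:
        predicted = history[-1]            # naive forecast for this step
        squared_errors.append((observed - predicted) ** 2)
        history.append(observed)           # reveal the true value before the next step
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only (not taken from the births dataset).
train = [10, 12, 11, 13]
test = [12, 15, 14]
rmse = persistence_rmse(train, test)
print(round(rmse, 3))
```

Any candidate model, including an ARIMA configuration, would need to score a lower RMSE than this baseline under the same walk-forward procedure to be considered useful.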
CONCLUSION: For this dataset, the ARIMA model with the order of (1, 1, 1) achieved the best overall results and should be considered for further modeling.
Dataset Used: Daily Female Births in California in 1959
Dataset ML Model: Time series forecast with numerical attributes
Dataset Reference: https://datamarket.com/data/set/235k/daily-total-female-births-in-california-1959
The HTML-formatted report is available on GitHub.