Time Series Model for Housing Starts in the USA Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Housing Starts in the USA dataset poses a time series problem in which we forecast future values from past observations.

INTRODUCTION: The problem is to forecast the monthly total number of housing starts in the USA. A housing start occurs when excavation begins for the footings or foundation of a building. The dataset describes a time series of housing starts (in thousands of units) spanning more than 60 years (1959-2020) and contains 739 monthly observations. We used the first 80% of the observations for training various models and held back the remaining 20% for validating the final model.
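The 80/20 hold-out described above can be sketched as a chronological split, with no shuffling so that the validation set always follows the training set in time. This is a minimal illustration rather than the project's actual code: the function name `chronological_split`, the round-down rule for the split index, and the placeholder values are assumptions.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Assumption: the split index is rounded down to a whole observation.
    split_index = int(len(series) * train_fraction)
    return series[:split_index], series[split_index:]


# Placeholder values standing in for the 739 monthly HOUSTNSA observations.
observations = list(range(739))
train, validation = chronological_split(observations)
print(len(train), len(validation))
```

Under this rounding rule, 739 observations would give 591 training points and 148 validation points; a different rounding convention would shift the boundary by one observation.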

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 9.705. After a grid search for the best-performing ARIMA parameters, the final model used a non-seasonal order of (4, 0, 4) and a seasonal order of (1, 0, 2, 12). The chosen model processed the validation data with an RMSE of 8.763, which was better than the baseline, as expected.
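To make the comparison concrete, the sketch below shows one way a persistence baseline and a seasonal ARIMA model could be scored with the same RMSE metric. It assumes one-step walk-forward validation, which the summary does not confirm, and all function names are hypothetical. The seasonal model uses `SARIMAX` from statsmodels, imported lazily so the baseline part runs without that library installed; refitting at every step is simple but slow.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_rmse(train, validation):
    """Predict each validation value as the most recent observed value."""
    history = list(train)
    predictions = []
    for observed in validation:
        predictions.append(history[-1])  # naive forecast: repeat the last value
        history.append(observed)         # then reveal the true observation
    return rmse(list(validation), predictions)


def sarima_walk_forward_rmse(train, validation,
                             order=(4, 0, 4),
                             seasonal_order=(1, 0, 2, 12)):
    """One-step walk-forward RMSE for a seasonal ARIMA model (assumed procedure)."""
    # Lazy import: statsmodels is only needed for this function.
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    history = list(train)
    predictions = []
    for observed in validation:
        # Refit on all data seen so far, then forecast one step ahead.
        fitted = SARIMAX(history, order=order,
                         seasonal_order=seasonal_order).fit(disp=False)
        predictions.append(float(fitted.forecast(steps=1)[0]))
        history.append(observed)
    return rmse(list(validation), predictions)


# Toy values (not HOUSTNSA data) to show the baseline calculation.
toy_train = [100.0, 110.0, 120.0]
toy_validation = [130.0, 125.0]
baseline = persistence_rmse(toy_train, toy_validation)
print(round(baseline, 4))
```

On the toy values the baseline forecasts are 120 and 130 against actuals of 130 and 125, so the errors are 10 and -5. Calling `sarima_walk_forward_rmse` on the real training and validation series would give a figure comparable to the reported RMSE only if the original project followed the same walk-forward procedure.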

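CONCLUSION: For this dataset, the chosen ARIMA model lowered the validation RMSE from 9.705 (persistence baseline) to 8.763, an improvement of roughly 10%. This is a satisfactory result, and we should consider using the algorithm for further modeling.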

Dataset Used: U.S. Census Bureau and U.S. Department of Housing and Urban Development, Housing Starts: Total: New Privately Owned Housing Units Started [HOUSTNSA], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/HOUSTNSA, August 23, 2020.

Dataset ML Model: Time series forecast with numerical attribute

The HTML formatted report can be found here on GitHub.