Time Series Model for Metro Bus Ridership Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Metro Bus Ridership dataset presents a time series problem in which we forecast future values from past observations.

INTRODUCTION: The problem is to forecast the monthly number of bus riders for the Los Angeles County Metro district. The dataset is a time series of monthly bus ridership from January 2009 through June 2020, for a total of 138 observations. We used the first 80% of the observations to train various models and held back the remaining 20% to validate the final model.
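The hold-out split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `split_series`, the round-down rule at the cut point, and the placeholder values are all assumptions.

```python
# Chronological hold-out split: no shuffling, because order matters in a time series.
def split_series(values, train_fraction=0.8):
    """Return (train, test), keeping the earliest observations for training."""
    cut = int(len(values) * train_fraction)  # assumption: round down at the cut point
    return values[:cut], values[cut:]

# Placeholder values standing in for the 138 monthly ridership counts.
series = list(range(138))
train, test = split_series(series)
print(len(train), len(test))  # 110 28
```

With 138 observations and this rounding rule, the first 110 months go to training and the last 28 to validation.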

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 2.480 million. After a grid search over the ARIMA parameters, the final model used a non-seasonal order of (4, 1, 2) and a seasonal order of (1, 0, 2, 12). The chosen model processed the validation data with an RMSE of 2.397 million, which was only slightly better than the baseline.
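To make the comparison concrete, here is a hedged sketch of how a persistence baseline and a seasonal ARIMA model might be scored with the same RMSE routine. It assumes one-step-ahead walk-forward validation, which this summary does not spell out. The helper names (`walk_forward_rmse`, `persistence`, `sarima`) are hypothetical, and the toy numbers are not the Metro data. The `sarima` function uses the `SARIMAX` class from statsmodels with the orders reported above.

```python
from math import sqrt

def walk_forward_rmse(train, test, forecast_fn):
    """One-step-ahead walk-forward validation; returns RMSE over the test set."""
    history = list(train)
    squared_errors = []
    for actual in test:
        predicted = forecast_fn(history)
        squared_errors.append((actual - predicted) ** 2)
        history.append(actual)  # reveal the true value before the next step
    return sqrt(sum(squared_errors) / len(squared_errors))

def persistence(history):
    """Baseline: predict that the next value equals the most recent one."""
    return history[-1]

def sarima(history, order=(4, 1, 2), seasonal_order=(1, 0, 2, 12)):
    """Refit a seasonal ARIMA on the history and forecast one step ahead."""
    # Imported lazily so the baseline runs without statsmodels installed.
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    fitted = SARIMAX(history, order=order, seasonal_order=seasonal_order).fit(disp=False)
    return float(fitted.forecast(steps=1)[0])

# Toy numbers for illustration only, not the Metro ridership data.
toy_train = [10.0, 12.0, 11.0, 13.0]
toy_test = [12.0, 14.0, 13.0]
baseline_rmse = walk_forward_rmse(toy_train, toy_test, persistence)
print(round(baseline_rmse, 3))  # 1.414
```

Passing `sarima` instead of `persistence` to `walk_forward_rmse` would score the seasonal model in the same way. Refitting at every step is slow but keeps the two evaluations directly comparable.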

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, improving on the persistence baseline by about 3% in RMSE, and we should consider using the algorithm for further modeling.

Dataset Used: Metro Interactive Estimated Ridership Stats

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: http://isotp.metro.net/MetroRidership/Index.aspx

The HTML-formatted report can be found here on GitHub.