Regression Deep Learning Model for NCAA Women’s Volleyball Win-Loss Percentages Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The NCAA Women’s Volleyball dataset presents a regression problem: we are trying to predict the value of a continuous variable, a team's win-loss percentage.

INTRODUCTION: The NCAA maintains and publishes numerous datasets on its sporting events and statistics. The goal of this exercise is to experiment with neural-network (deep learning) models built with TensorFlow and observe whether we can use them to model the sport of volleyball.

ANALYSIS: The baseline model with a single layer of 8 nodes processed the test dataset and produced an RMSE of 0.0797 after 200 epochs. The alternate model with a single layer of 12 nodes processed the test dataset and yielded an RMSE of 0.0667 after 200 epochs. Other model architectures with more layers did not significantly improve the baseline model’s result.
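To make the comparison above concrete, here is a minimal sketch of how RMSE (root mean squared error) is computed and how far apart the two reported scores are in relative terms. The helper name `rmse` is hypothetical and is not taken from the original notebook; only the two RMSE values come from the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# RMSE values reported in the ANALYSIS paragraph.
baseline_rmse = 0.0797   # single hidden layer, 8 nodes
alternate_rmse = 0.0667  # single hidden layer, 12 nodes

# Relative reduction in RMSE when moving from 8 to 12 nodes.
relative_improvement = (baseline_rmse - alternate_rmse) / baseline_rmse
print(f"Relative RMSE reduction: {relative_improvement:.1%}")  # prints 16.3%
```

Because the target is a win-loss percentage, the RMSE is expressed in the same units as the target itself.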

CONCLUSION: For this iteration, the alternate model with a single layer of 12 nodes yielded the best result, with an RMSE of 0.0667 on the test dataset. For this dataset, we should consider experimenting with more and different MLP models.
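One way to set up such experiments is sketched below with the Keras API in TensorFlow. This is an illustration under stated assumptions, not the code from the original notebook: the helper `build_mlp`, the feature count `N_FEATURES = 10`, the ReLU activation, and the Adam optimizer are all assumptions chosen for the example.

```python
import tensorflow as tf

N_FEATURES = 10  # assumption: the real number of input attributes is not stated in the text


def build_mlp(n_features, hidden_units):
    """Build a regression MLP with the given hidden-layer widths (hypothetical helper)."""
    layers = [tf.keras.Input(shape=(n_features,))]
    for units in hidden_units:
        # ReLU is an assumed activation, not confirmed by the source.
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    # A single linear output node for the continuous win-loss percentage.
    layers.append(tf.keras.layers.Dense(1))
    model = tf.keras.Sequential(layers)
    model.compile(
        loss="mse",
        optimizer="adam",  # assumed optimizer
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    return model


# The two architectures described in the text, plus one deeper variant to try.
baseline = build_mlp(N_FEATURES, (8,))
alternate = build_mlp(N_FEATURES, (12,))
deeper = build_mlp(N_FEATURES, (12, 8))
```

Each model could then be trained with `model.fit(X_train, y_train, epochs=200)` and scored with `model.evaluate(X_test, y_test)`, where the training and test arrays are placeholders for the prepared dataset.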

Dataset Used: NCAA Women’s Volleyball Archived Statistics

Dataset ML Model: Regression with numerical attributes

Dataset Reference: http://web1.ncaa.org/stats/StatsSrv/rankings?doWhat=archive&sportCode=WVB

The HTML-formatted report can be found here on GitHub.