Multi-Class Deep Learning Model for MNIST Digits Using PyTorch

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The MNIST Database of handwritten digits presents a multi-class classification problem in which we try to predict one of several (more than two) possible outcomes.

Additional Notes: This is a replication, with some small modifications, of Dr. Jason Brownlee’s blog post, PyTorch Tutorial: How to Develop Deep Learning Models with Python. I plan to leverage Dr. Brownlee’s tutorial code and build a PyTorch-based notebook template for future use.

INTRODUCTION: MNIST is a dataset developed by Yann LeCun, Corinna Cortes, and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from many scanned document datasets available from the National Institute of Standards and Technology (NIST). The MNIST handwritten digit classification problem has become a standard dataset used in computer vision and deep learning.

Images of digits were taken from a variety of scanned documents, normalized in size, and centered. Each image is a 28-by-28-pixel square (784 pixels total). A standard split of the dataset is used to evaluate and compare models: 60,000 images are used to train a model, and a separate set of 10,000 images is used to test it. It is a digit recognition task, so there are ten classes (0 to 9) to predict.
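The input size (one grayscale channel, 28 by 28 pixels) and the ten digit classes fix the shape of any classifier for this dataset. The sketch below is a hypothetical small convolutional network, not the exact architecture used in this project; the class name `DigitCNN`, the helper `train_step`, the layer sizes, and the SGD settings are all illustrative assumptions. The network outputs raw scores (logits) because PyTorch's `nn.CrossEntropyLoss` applies log-softmax internally.

```python
import torch
from torch import nn


class DigitCNN(nn.Module):
    """Hypothetical small CNN: 1x28x28 grayscale image -> 10 digit logits."""

    def __init__(self, n_classes: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # 28x28 -> 26x26
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 26x26 -> 13x13
            nn.Conv2d(32, 32, kernel_size=3),  # 13x13 -> 11x11
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 11x11 -> 5x5 (floor division)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 32 channels * 5 * 5 = 800 features
            nn.Linear(32 * 5 * 5, 100),
            nn.ReLU(),
            nn.Linear(100, n_classes),         # raw scores (logits), one per digit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def train_step(model, images, labels, optimizer, loss_fn) -> float:
    """Run one optimization step on a single mini-batch and return its loss."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = DigitCNN()
    # nn.CrossEntropyLoss applies log-softmax itself, so the model returns logits.
    loss_fn = nn.CrossEntropyLoss()
    # Assumed hyperparameters, chosen only for illustration.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    images = torch.rand(8, 1, 28, 28)    # a fake mini-batch of 8 images
    labels = torch.randint(0, 10, (8,))  # fake digit labels in 0-9
    print(model(images).shape)           # torch.Size([8, 10])
    print(train_step(model, images, labels, optimizer, loss_fn))
```

In a real run, the random tensors would be replaced by mini-batches drawn from the 60,000-image training split, and `train_step` would be called for every batch over several epochs.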

ANALYSIS: After the deep learning model was set up and trained, it achieved an accuracy of 98.98% on the test dataset.
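Accuracy here is the fraction of test images whose predicted digit (the class with the highest score) matches the true label. On the standard 10,000-image test split, 98.98% accuracy means 9,898 images were classified correctly and 102 were not. The following is a minimal sketch of such an evaluation loop, assuming a model that returns one score per class and a `DataLoader` yielding `(images, labels)` pairs; the function name `evaluate` is hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # gradients are not needed for evaluation
def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return the fraction of samples whose highest-scoring class equals the label."""
    model.eval()  # switch off training-only behavior such as dropout
    correct, total = 0, 0
    for images, labels in loader:
        predictions = model(images).argmax(dim=1)  # predicted digit per image
        correct += (predictions == labels).sum().item()
        total += labels.numel()
    return correct / total


if __name__ == "__main__":
    # Toy check: nn.Identity passes these hand-written "logits" through unchanged.
    logits = torch.tensor([[0.1, 2.0, 0.3],
                           [1.5, 0.2, 0.1],
                           [0.0, 0.1, 3.0],
                           [2.0, 0.5, 0.1]])
    labels = torch.tensor([1, 0, 2, 1])
    loader = DataLoader(TensorDataset(logits, labels), batch_size=3)
    print(evaluate(nn.Identity(), loader))  # 0.75 (3 of 4 correct)
```

Counting correct predictions across all batches, rather than averaging per-batch accuracies, keeps the result exact even when the last batch is smaller than the others.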

CONCLUSION: For this dataset, the model built using PyTorch achieved a satisfactory result and should be considered for future modeling activities.

Dataset Used: The MNIST Database of Handwritten Digits

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference:

One potential source of performance benchmarks:

The HTML-formatted report can be found here on GitHub.