Image Regression Model for MNIST Handwritten Digits Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MNIST Handwritten Digits dataset is a multi-class image classification problem where we attempt to predict one of ten possible outcomes (the digits 0 to 9). In this iteration, we treat the digit label as a numeric target and model it with an image regressor.

INTRODUCTION: MNIST is a dataset developed by Yann LeCun, Corinna Cortes, and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from a number of scanned document datasets available from the National Institute of Standards and Technology (NIST). Each image is a 28-by-28-pixel grayscale square (784 pixels total). A standard split of the dataset is used to evaluate and compare models: 60,000 images are used to train a model, and a separate set of 10,000 images is used to test it. Because this is a digit recognition task, there are ten classes (0 to 9) to predict.
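To make the shapes described above concrete, here is a minimal NumPy sketch of how MNIST-style arrays might be prepared for an image regressor. It assumes the images arrive as uint8 arrays of shape (n, 28, 28), which is what `tf.keras.datasets.mnist.load_data()` returns (60,000 training and 10,000 test images), and that each digit label is cast to a float regression target. The helper name `prepare_for_regression` is hypothetical, and random arrays stand in for the real data so the snippet runs without a download.

```python
import numpy as np


def prepare_for_regression(images, labels):
    """Hypothetical helper: scale MNIST-style images and cast labels to floats.

    images: array of shape (n, 28, 28) with pixel values 0-255 (assumed uint8).
    labels: array of shape (n,) with integer digits 0-9.
    Returns (x, y) with x shaped (n, 28, 28, 1) in [0, 1] and y as float32.
    """
    # Assumed preprocessing, not confirmed from the original project.
    x = images.astype("float32") / 255.0  # scale pixels to [0, 1]
    x = np.expand_dims(x, axis=-1)        # add a single grayscale channel
    y = labels.astype("float32")          # regression target: the digit as a number
    return x, y


# Synthetic stand-in with MNIST-like shapes (no dataset download needed).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([0, 3, 5, 7, 9])

x, y = prepare_for_regression(fake_images, fake_labels)
print(x.shape, y.dtype)  # (5, 28, 28, 1) float32
print(28 * 28)           # 784 pixels per image
```

Scaling to [0, 1] and adding an explicit channel axis are common conventions, not steps the original project is confirmed to have used.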

ANALYSIS: Previously, we modeled the dataset using AutoKeras’ image classifier, and the best model achieved an accuracy of 94.84% on the validation dataset. When we applied the best AutoKeras model to the previously unseen test dataset, it achieved an accuracy of 98.4%.

After a series of modeling trials in this iteration, AutoKeras’ image regressor achieved an RMSE of 0.454 and an R² score of 97.53% on the test dataset. When we converted the same predictions into digit labels and evaluated them with classification metrics, we obtained an accuracy of 96.47%.
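The paragraph above scores one set of continuous predictions with both regression and classification metrics. The source does not say exactly how the regression outputs were turned into class labels, so the following is a minimal sketch assuming each prediction is rounded to the nearest integer and clipped to the 0-9 range. The function names are hypothetical, and the five toy values are illustrative only, not results from this project.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE and R^2 for continuous predictions of digit labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


def regression_to_accuracy(y_true, y_pred):
    """Assumed conversion: round to the nearest digit, clip to 0-9, then score."""
    labels = np.clip(np.rint(y_pred), 0, 9).astype(int)
    return np.mean(labels == np.asarray(y_true))


# Toy example (illustrative values only).
y_true = np.array([0, 1, 2, 3, 4])
y_pred = np.array([0.2, 1.4, 1.6, 3.0, 4.6])

rmse, r2 = regression_metrics(y_true, y_pred)
acc = regression_to_accuracy(y_true, y_pred)
print(f"RMSE={rmse:.4f}, R2={r2:.4f}, accuracy={acc:.2f}")
# RMSE=0.3795, R2=0.9280, accuracy=0.80
```

Here the prediction 4.6 rounds to 5, so one of the five labels is wrong even though every residual is small, which illustrates why a model can score a high R² while its rounded accuracy is lower. scikit-learn's `mean_squared_error`, `r2_score`, and `accuracy_score` compute the same quantities.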

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: MNIST Handwritten Digits Dataset

Dataset ML Model: Image regression modeling with numerical attributes

Dataset Reference:

One potential source of performance benchmark:

The HTML-formatted report can be found on GitHub.