Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Colorectal Cancer Histology dataset presents a multi-class classification problem, where we attempt to predict one of several (more than two) possible outcomes.
In iteration Take1, we constructed a few simple three-layer CNNs to model the dataset. The best result from that iteration serves as the baseline model for later modeling iterations.
In iteration Take2, we constructed a simple VGG convolutional network with six VGG blocks to model the dataset. We also compared the best result from the VGG network with the baseline model from iteration Take1.
In this Take3 iteration, we will construct a VGG-16 convolutional network to model the dataset. We will compare the best result from the network with the baseline model from iteration Take1.
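To make the VGG-16 structure concrete, the following framework-free sketch traces the standard VGG-16 convolutional stack (13 convolutional layers in five blocks, each block ending in 2x2 max pooling) and reports the feature-map shape after each block. It is only an illustration: the 150x150 RGB input size is an assumption not stated in this summary, the names `VGG16_BLOCKS` and `trace_vgg16` are hypothetical, and the network actually built in this project may differ from the standard layout.

```python
# Framework-free sketch of the standard VGG-16 convolutional stack.
# Assumption: 150x150 RGB input tiles (the summary does not state the image size).

# (number of 3x3 conv layers, number of filters) for each of the five blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_vgg16(height=150, width=150, channels=3):
    """Return (per-block output shapes, total convolutional parameters)."""
    shapes = []
    params = 0
    in_channels = channels
    for n_convs, filters in VGG16_BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel with 'same' padding: spatial size is unchanged.
            params += 3 * 3 * in_channels * filters + filters  # weights + biases
            in_channels = filters
        # 2x2 max pooling with stride 2 halves each spatial dimension (floored).
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes, params


shapes, conv_params = trace_vgg16()
for index, shape in enumerate(shapes, start=1):
    print(f"block {index}: output shape {shape}")
print(f"convolutional parameters: {conv_params:,}")
```

Under the assumed input size, the spatial dimensions shrink from 150 to 4 across the five pooling steps, so the classifier head placed on top of the stack would receive a 4x4x512 feature map.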
INTRODUCTION: This dataset is a collection of textures in histological images of human colorectal cancer. All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica Biosystems) at 20x magnification. The histological samples are fully anonymized images of formalin-fixed, paraffin-embedded human colorectal adenocarcinomas (primary tumors) from the dataset authors' pathology archive (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).
ANALYSIS: In iteration Take1, the baseline model achieved an accuracy of 98.25% on the training dataset after 15 epochs. After hyperparameter tuning, the best model reached an accuracy of 84.00% on the validation dataset.
In iteration Take2, the model achieved an accuracy of 96.88% on the training dataset after 20 epochs. After hyperparameter tuning, the best model reached an accuracy of 82.30% on the validation dataset.
In this Take3 iteration, the model achieved an accuracy of 99.75% on the training dataset after 30 epochs. After hyperparameter tuning, the best model reached an accuracy of 83.90% on the validation dataset.
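The three iterations are easier to compare side by side. The short sketch below uses only the accuracy figures reported above and computes, for each iteration, the gap between training and validation accuracy and the difference from the Take1 baseline's validation accuracy. The names `results` and `accuracy_gap` are illustrative and are not part of the original notebooks.

```python
# Accuracy figures (percent) copied from the ANALYSIS text above.
results = {
    "Take1": {"train": 98.25, "validation": 84.00},
    "Take2": {"train": 96.88, "validation": 82.30},
    "Take3": {"train": 99.75, "validation": 83.90},
}


def accuracy_gap(scores):
    """Training accuracy minus validation accuracy, in percentage points."""
    return round(scores["train"] - scores["validation"], 2)


gaps = {name: accuracy_gap(scores) for name, scores in results.items()}
baseline_validation = results["Take1"]["validation"]

for name, scores in results.items():
    # Difference from the Take1 baseline on the validation dataset.
    delta = round(scores["validation"] - baseline_validation, 2)
    print(f"{name}: train/validation gap = {gaps[name]:.2f} pts, "
          f"validation vs. Take1 baseline = {delta:+.2f} pts")
```

In all three iterations the training accuracy exceeds the validation accuracy by roughly 14 to 16 percentage points, and the Take3 validation accuracy sits 0.10 points below the Take1 baseline.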
CONCLUSION: In this iteration, the VGG-16 CNN built with TensorFlow appeared to be suitable for modeling this dataset, with a validation accuracy (83.90%) close to that of the Take1 baseline (84.00%). We should consider experimenting further with TensorFlow for this dataset.
Dataset Used: Colorectal Cancer Histology Dataset, Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)
Dataset ML Model: Multi-class image classification with numerical attributes
Dataset Reference: https://zenodo.org/record/53169#.XGZemKwzbmG
One potential source of performance benchmarks: https://www.kaggle.com/kmader/colorectal-histology-mnist
The HTML-formatted report can be found on GitHub.