Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Cassava Leaf Disease dataset presents a multi-class classification problem in which we attempt to predict one of several (more than two) possible outcomes.
INTRODUCTION: As the second-largest provider of carbohydrates in Africa, cassava is an essential food security crop grown by smallholder farmers because it can withstand harsh conditions. Existing disease detection methods require farmers to solicit government-funded agricultural experts’ help to visually inspect and diagnose the plants. This approach is labor-intensive and costly, and the experts are in short supply.
To address the problem, the research team compiled a dataset of 21,367 labeled images collected during a regular survey in Uganda. Most pictures were crowdsourced from farmers taking photos of their gardens and annotated by experts at the National Crops Resources Research Institute (NaCRRI) in collaboration with the AI lab at Makerere University, Kampala. Our task is to classify each cassava image into one of four disease categories or a fifth category indicating a healthy leaf.
In iteration Take1, we constructed a CNN model using the InceptionV3 architecture and tested the model’s performance using cross-validation. We also submitted the model to Kaggle and evaluated it on Kaggle’s test images.
In this Take2 iteration, we will construct a CNN model using the ResNet50V2 architecture and test the model’s performance using cross-validation. We will also submit the model to Kaggle and evaluate it on Kaggle’s test images.
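As a rough illustration of the Take2 setup, the sketch below builds a ResNet50V2 backbone with a five-way softmax head in Keras. It is a minimal sketch, not the project's actual code: the helper name `build_resnet50v2_classifier`, the 224x224 input size, the Adam optimizer, and training from scratch (`weights=None`) are all assumptions, since the write-up does not state them.

```python
import tensorflow as tf

NUM_CLASSES = 5              # four disease categories plus "healthy"
IMAGE_SHAPE = (224, 224, 3)  # assumed input size; not stated in the write-up


def build_resnet50v2_classifier(input_shape=IMAGE_SHAPE, num_classes=NUM_CLASSES):
    """Hypothetical helper: ResNet50V2 backbone plus a softmax classification head."""
    # weights=None trains from scratch and avoids a download; whether the
    # original project used pretrained weights is an open assumption.
    backbone = tf.keras.applications.ResNet50V2(
        include_top=False,   # drop the 1000-class ImageNet head
        weights=None,
        input_shape=input_shape,
        pooling="avg",       # global average pooling yields one feature vector per image
    )
    inputs = tf.keras.Input(shape=input_shape)
    features = backbone(inputs)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)
    model = tf.keras.Model(inputs, outputs)
    # Integer class labels (0..4) are assumed, hence the sparse loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_resnet50v2_classifier()` returns a compiled model that can be passed to `model.fit` with batches of images and integer labels. Under cross-validation, one such model would be built and trained per fold and the per-fold validation accuracies averaged.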
ANALYSIS: In iteration Take1, the model achieved an average accuracy of 67.17% on the validation dataset after 30 epochs. The final model then scored an accuracy of 61.25% on Kaggle’s test dataset.
In this Take2 iteration, the model achieved an average accuracy of 61.86% on the validation dataset after 30 epochs. The final model then scored an accuracy of 61.28% on Kaggle’s test dataset.
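To make the comparison between the two iterations explicit, the short sketch below computes the change in accuracy from Take1 to Take2 using the four figures reported above. The dictionary layout and the helper name `accuracy_deltas` are illustrative choices, not part of the original project.

```python
# Accuracy figures (percent) as reported in the ANALYSIS section.
results = {
    "Take1 (InceptionV3)": {"validation": 67.17, "kaggle_test": 61.25},
    "Take2 (ResNet50V2)": {"validation": 61.86, "kaggle_test": 61.28},
}


def accuracy_deltas(baseline, candidate):
    """Hypothetical helper: percentage-point change from baseline to candidate."""
    return {name: round(candidate[name] - baseline[name], 2) for name in baseline}


deltas = accuracy_deltas(results["Take1 (InceptionV3)"], results["Take2 (ResNet50V2)"])
print(deltas)  # {'validation': -5.31, 'kaggle_test': 0.03}
```

In percentage points, ResNet50V2 scored 5.31 lower on validation but 0.03 higher on Kaggle's test images, so the two architectures were effectively tied on the held-out test set.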
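CONCLUSION: In this iteration, the ResNet50V2 TensorFlow CNN model appeared to be suitable for modeling this dataset. Its validation accuracy was lower than that of the Take1 InceptionV3 model (61.86% versus 67.17%), while its accuracy on Kaggle’s test dataset was essentially the same (61.28% versus 61.25%). We should consider experimenting further with TensorFlow for this modeling task.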
Dataset Used: Cassava Leaf Disease Classification
Dataset ML Model: Multi-class image classification with numerical attributes
Dataset Reference: https://www.kaggle.com/c/cassava-leaf-disease-classification/
One potential source of performance benchmarks: https://www.kaggle.com/c/cassava-leaf-disease-classification/leaderboard
The HTML formatted report can be found here on GitHub.