Multi-Class Classification Model for Crop Mapping in Canada Using Scikit-Learn Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Crop Mapping in Canada dataset presents a multi-class classification problem, where we try to predict one of several (more than two) possible outcomes.

INTRODUCTION: This data set contains fused bi-temporal optical-radar data for cropland classification. The images were collected by RapidEye satellites (optical) and the Unmanned Aerial Vehicle Synthetic Aperture Radar (UAVSAR) system (radar) over an agricultural region near Winnipeg, Manitoba, Canada in 2012. There are 2 * 49 radar features and 2 * 38 optical features (174 features in total), covering two dates: 05 and 14 July 2012. The data set has seven crop type classes: 1-Corn; 2-Peas; 3-Canola; 4-Soybeans; 5-Oats; 6-Wheat; and 7-Broadleaf.
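As a quick sanity check on the feature layout described above, the counts can be worked out in a few lines. This is only an illustrative sketch: the constant names are hypothetical, and the numbers are copied from the dataset description rather than read from the data file.

```python
# Counts copied from the dataset description; all names here are illustrative.
RADAR_FEATURES_PER_DATE = 49
OPTICAL_FEATURES_PER_DATE = 38
ACQUISITION_DATES = ("2012-07-05", "2012-07-14")

# Class labels as listed in the introduction.
CROP_CLASSES = {
    1: "Corn", 2: "Peas", 3: "Canola", 4: "Soybeans",
    5: "Oats", 6: "Wheat", 7: "Broadleaf",
}

# Each feature group is repeated once per acquisition date.
n_radar = len(ACQUISITION_DATES) * RADAR_FEATURES_PER_DATE
n_optical = len(ACQUISITION_DATES) * OPTICAL_FEATURES_PER_DATE
n_features = n_radar + n_optical

print(f"radar={n_radar}, optical={n_optical}, total={n_features}")
print(f"classes={len(CROP_CLASSES)}")
```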

In this Take1 iteration, we will construct and tune machine learning models for this dataset using the Scikit-Learn library. We will then record the best accuracy that the tuned models achieve on the training and test datasets.
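The first round of modeling could look roughly like the sketch below, which compares two candidate algorithms by cross-validated accuracy. It is not the project's actual script: the data is a synthetic stand-in generated with make_classification (174 features, 7 classes), and the fold count, hyperparameters, and random seed are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 174 features, 7 classes.
X, y = make_classification(
    n_samples=700, n_features=174, n_informative=20, n_redundant=0,
    n_classes=7, n_clusters_per_class=1, random_state=7,
)

# Candidate models; kNN is distance-based, so it gets a scaling step.
candidates = {
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=7),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

baseline_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    baseline_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```

Scores on the synthetic data say nothing about the real dataset; the sketch only shows the shape of the comparison.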

ANALYSIS: In this Take1 iteration, the machine learning algorithms achieved a baseline average accuracy of 99.24%. Two algorithms (Extra Trees and k-Nearest Neighbors) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model achieved an accuracy score of 99.72% on the training dataset. When configured with the optimized parameters, the Extra Trees model achieved an accuracy score of 99.74% on the test dataset, which was consistent with the prediction accuracy from the training dataset.
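The tune-then-verify step described above might be sketched as follows. The report does not say which parameters were tuned or which search method was used, so the grid search, the parameter grid, the split ratio, and the synthetic stand-in data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real data (assumption): 174 features, 7 classes.
X, y = make_classification(
    n_samples=700, n_features=174, n_informative=20, n_redundant=0,
    n_classes=7, n_clusters_per_class=1, random_state=7,
)

# Hold out a stratified test set that the tuning step never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

# Hypothetical parameter grid; the original tuning trials are not documented here.
param_grid = {"n_estimators": [50, 100], "max_features": ["sqrt", 0.5]}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=7),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # refits the best configuration on all training data

train_cv_accuracy = search.best_score_
test_accuracy = accuracy_score(y_test, search.predict(X_test))

print(f"best parameters: {search.best_params_}")
print(f"cross-validated training accuracy: {train_cv_accuracy:.4f}")
print(f"held-out test accuracy: {test_accuracy:.4f}")
```

Comparing the cross-validated training score with the held-out test score is the consistency check the analysis refers to: a large gap between the two would suggest overfitting to the training data.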

CONCLUSION: For this iteration, the Extra Trees model achieved the best overall results using the training and test datasets. For this dataset, we should consider using the Extra Trees algorithm for further modeling.

Dataset Used: Crop Mapping in Canada Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Crop+mapping+using+fused+optical-radar+data+set

The HTML formatted report can be found here on GitHub.