Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Planet: Understanding the Amazon from Space dataset is a multi-label classification problem in which each image can carry several labels at once, so we attempt to predict the full set of labels that apply to an image rather than a single outcome.
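To make the multi-label setup concrete, the sketch below encodes the tags of one image as a multi-hot vector, in which several positions can be 1 at the same time. The tag vocabulary and the helper name `encode_tags` are illustrative assumptions and are not taken from the project scripts.

```python
# Illustrative tag vocabulary (an assumption; the real dataset defines its own tag list).
TAGS = ["agriculture", "clear", "cloudy", "primary", "road", "water"]


def encode_tags(tag_string, vocabulary=TAGS):
    """Turn a space-separated tag string into a multi-hot list of 0/1 values."""
    present = set(tag_string.split())
    return [1 if tag in present else 0 for tag in vocabulary]


# One image can switch on several labels at once.
example = encode_tags("clear primary water")
print(example)  # [0, 1, 0, 1, 0, 1]
```

A multi-class problem would allow exactly one 1 per vector; here any number of positions may be set.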
INTRODUCTION: Planet, designer and builder of the world’s largest constellation of Earth-imaging satellites, collaborated with its Brazilian partner SCCON to challenge Kaggle participants to label satellite image chips with atmospheric conditions and various classes of land cover/land use. The resulting models will help the global community better understand deforestation conditions and how to respond to them.
The purpose of this modeling exercise is to construct an end-to-end template for solving multi-label machine learning problems. The series of scripting exercises will replicate Dr. Jason Brownlee’s blog post on this topic to build a robust template for future similar problems.
From iteration Take1, we constructed the necessary script segments to download and pre-process the image files available on Kaggle’s website.
From iteration Take2, we constructed the necessary script segments to train the TensorFlow model and evaluated the model’s effectiveness.
In this Take3 iteration, we will construct the necessary script segments to load an unseen image and perform prediction on the image.
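The last step of predicting on an unseen image is turning the model's per-tag outputs into readable labels. The sketch below assumes the model emits one sigmoid probability per tag and that a 0.5 cutoff is used; the vocabulary, the function name `decode_prediction`, and the probability values are all hypothetical.

```python
# Illustrative tag vocabulary (an assumption; must match the order used in training).
TAGS = ["agriculture", "clear", "cloudy", "primary", "road", "water"]


def decode_prediction(probabilities, vocabulary=TAGS, threshold=0.5):
    """Return every tag whose predicted probability reaches the threshold."""
    return [tag for tag, p in zip(vocabulary, probabilities) if p >= threshold]


# Hypothetical per-tag outputs for one image, standing in for a real model prediction.
probs = [0.08, 0.91, 0.02, 0.97, 0.40, 0.63]
print(decode_prediction(probs))  # ['clear', 'primary', 'water']
```

Because each tag is thresholded independently, the result can contain zero, one, or many labels.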
ANALYSIS: In iteration Take1, we successfully downloaded and pre-processed the image files from Kaggle.
In iteration Take2, the baseline model achieved an F-beta score of 0.8478 on the validation dataset after 20 epochs.
In this Take3 iteration, we successfully downloaded an image and made a prediction on the previously unseen photo.
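For reference, the F-beta score reported for Take2 combines precision and recall into a single number. The text above does not state the beta value or the averaging scheme, so the sketch below assumes beta = 2 and a per-image average; the function name and the small epsilon guard are likewise illustrative choices.

```python
def fbeta_per_image(y_true, y_pred, beta=2.0, eps=1e-7):
    """Mean F-beta over images, given lists of 0/1 label vectors.

    beta=2 and per-image averaging are assumptions, not confirmed project settings.
    """
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        pairs = list(zip(true_row, pred_row))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
        precision = tp / (tp + fp + eps)
        recall = tp / (tp + fn + eps)
        b2 = beta ** 2
        # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall + eps))
    return sum(scores) / len(scores)


# One image with one correct tag, one spurious tag, and one missed tag.
score = fbeta_per_image([[1, 1, 0, 0]], [[1, 0, 1, 0]])
print(round(score, 3))
```

With beta greater than 1, recall is weighted more heavily than precision, so missing a true tag costs more than adding a spurious one.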
CONCLUSION: In this iteration, the TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: Planet: Understanding the Amazon from Space
Dataset ML Model: Multi-label classification with numerical attributes
Dataset Reference: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
One potential source of performance benchmarks: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/leaderboard
The HTML formatted report can be found here on GitHub.