NLP Model for Disaster Tweets Classification Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Disaster Tweets Classification dataset poses a binary classification problem: we attempt to predict whether or not a Tweet is about a real disaster.

INTRODUCTION: Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter. In this practice Kaggle competition, we want to build a machine learning model that predicts which Tweets are about real disasters and which ones are not. This dataset was created by Figure-Eight and originally shared on their ‘Data for Everyone’ website.

In this Take 1 iteration, we will build a bag-of-words model to classify the Tweets. We will also submit the test predictions to Kaggle to measure the model’s performance.
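To make the bag-of-words idea concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual TensorFlow pipeline. The sample Tweets and the helper names `tokenize`, `build_vocab`, and `encode` are hypothetical, chosen for illustration. Each Tweet becomes a fixed-length vector of word counts, and a neural network could then consume that vector as its input features.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(texts, max_words=None):
    # Rank words by frequency (ties broken alphabetically), most frequent first.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    if max_words is not None:
        ranked = ranked[:max_words]
    return {word: index for index, word in enumerate(ranked)}

def encode(text, vocab):
    # Build a count vector; words outside the vocabulary are ignored.
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

# Hypothetical sample Tweets, not taken from the Kaggle dataset.
tweets = [
    "Forest fire near the village",
    "I love this fire emoji",
]
vocab = build_vocab(tweets)
vectors = [encode(t, vocab) for t in tweets]
```

Word order is discarded by design: both sample Tweets share the "fire" feature even though only one describes a disaster, which is one reason a bag-of-words model can plateau on this kind of task.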

ANALYSIS: In this Take 1 iteration, the bag-of-words model achieved an average accuracy of 75.49% after 20 epochs, measured over ten iterations of cross-validation. Furthermore, the final model scored an accuracy of 75.02% on the test dataset.
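The averaged cross-validation score can be illustrated with a short sketch. It assumes that "ten iterations" means ten-fold cross-validation, which the write-up does not state explicitly. The helpers `k_fold_indices`, `cross_validate`, and `majority_baseline` are hypothetical, and the majority-class scorer is only a stand-in for training the network for 20 epochs on each fold.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=10, seed=0):
    # Shuffle the indices once, then deal them into k nearly equal folds.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, train_and_score, k=10, seed=0):
    # Hold out each fold in turn, train on the rest, and average the scores.
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return mean(scores), scores

def majority_baseline(labels):
    # Hypothetical stand-in for model training: predict the majority training label.
    def train_and_score(train_idx, val_idx):
        train_labels = [labels[j] for j in train_idx]
        majority = int(sum(train_labels) * 2 >= len(train_labels))
        return mean(1.0 if labels[j] == majority else 0.0 for j in val_idx)
    return train_and_score

# Toy labels for illustration only, not the Kaggle data.
labels = [0] * 60 + [1] * 40
avg_acc, fold_scores = cross_validate(labels, majority_baseline(labels), k=10)
```

In a real run, `train_and_score` would fit a fresh model on the training indices and return its validation accuracy, so that no fold's score is influenced by data the model has already seen.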

CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Disaster Tweets Classification

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference:

The HTML formatted report can be found here on GitHub.