SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The IMDB Movie Sentiment dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.
INTRODUCTION: This dataset contains 50,000 movie reviews extracted from IMDB. The researchers have annotated the tweets with labels (0 = negative, 1 = positive) to detect the reviews’ sentiment.
In this Take1 iteration, we will create a bag-of-words model to perform binary classification (positive or negative) for the Tweets. The Part A script will focus on building the model with the training and validation datasets due to memory capacity constraints. Part B will focus on testing the model with the training and test datasets.
ANALYSIS: In this Take1 iteration, the preliminary model’s performance achieved an average accuracy score of 88.80% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 89.48%.
CONCLUSION: In this iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: IMDB Movie Sentiment
Dataset ML Model: Binary class text classification with text-oriented features
One potential source of performance benchmarks: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format
The HTML formatted report can be found here on GitHub.