Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.
INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.
From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.
In this Take2 iteration, we will deploy a word-embedding model to classify the Amazon dataset’s review comments. We will also apply various sequence-to-matrix modes to evaluate the model’s performance.
ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.
In this Take2 iteration, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.
CONCLUSION: In this modeling iteration, the word-embedding TensorFlow model did not do as well as the bag-of-words model. However, we should continue to experiment with both natural language processing techniques for further modeling.
Dataset Used: Sentiment Labelled Sentences
Dataset ML Model: Binary class text classification with text-oriented features
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
The HTML formatted report can be found here on GitHub.