Methodology Credit: Reproduced and adapted from the tutorial "Text Message Classification" made available by Anish Singh Walia.
Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Data Set Description: https://www.kaggle.com/uciml/sms-spam-collection-dataset
Original Reference: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
Modeling Approach: binary classification
The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains 5,574 English SMS messages, each tagged as either ham (legitimate) or spam.
Working through machine learning problems end to end requires a structured modeling approach. Working through a problem with a project template can encourage you to think about the problem more critically, to challenge your assumptions, and to get good at all parts of a modeling project.
Any predictive modeling machine learning project can be broken down into about six common tasks:
- Define Problem
- Summarize Data (Use the word cloud visualization technique for this project)
- Prepare Data (Not required for this project)
- Evaluate Algorithms (Use Naive Bayes classifier and measure accuracy)
- Improve Accuracy or Results
- Finalize Model and Present Results
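The "Summarize Data" and "Evaluate Algorithms" tasks above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the original tutorial: it uses only the standard library, a hand-written multinomial Naive Bayes with add-one smoothing, and a handful of hypothetical messages invented here in place of the real SMS Spam Collection. The per-class word counts are the kind of data a word cloud would visualize. All function and variable names (tokenize, word_counts_by_label, train, predict, accuracy) are assumptions made for this sketch.

```python
import math
import re
from collections import Counter

# Hypothetical toy messages invented for illustration; NOT taken from the dataset.
TRAIN = [
    ("ham", "are we still meeting for lunch today"),
    ("ham", "call me when you get home"),
    ("ham", "see you at the meeting tomorrow"),
    ("spam", "win a free prize now call today"),
    ("spam", "free entry claim your prize now"),
    ("spam", "urgent claim your free cash prize"),
]
TEST = [
    ("ham", "see you at lunch"),
    ("spam", "claim your free prize"),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts_by_label(rows):
    """Count word frequencies per class (the raw data behind a word cloud)."""
    counts = {}
    for label, text in rows:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def train(rows):
    """Collect class counts, per-class word counts, and the vocabulary."""
    label_counts = Counter(label for label, _ in rows)
    word_counts = word_counts_by_label(rows)
    vocab = {word for counter in word_counts.values() for word in counter}
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log posterior under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(model, text) == label for label, text in rows)
    return correct / len(rows)

model = train(TRAIN)
top_spam_words = word_counts_by_label(TRAIN)["spam"].most_common(3)
test_accuracy = accuracy(model, TEST)
print("Most frequent spam words:", top_spam_words)
print("Test accuracy:", test_accuracy)
```

On a toy sample this small the accuracy figure says nothing about real-world performance; with the actual dataset, the same steps would be run on a proper train/test split of the 5,574 messages.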
The HTML-formatted report can be found here on GitHub.