Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Human Activities and Postural Transitions dataset is a classic multi-class classification problem in which we try to predict one of 12 possible outcomes.
INTRODUCTION: The research team carried out experiments with a group of 30 volunteers who performed a protocol composed of six basic activities: three static postures (standing, sitting, lying) and three dynamic activities (walking, walking downstairs, and walking upstairs). The experiment also included the postural transitions that occurred between the static postures: stand-to-sit, sit-to-stand, sit-to-lie, lie-to-sit, stand-to-lie, and lie-to-stand. All participants wore a smartphone on the waist during the experiment. The research team also video-recorded the activities so that the data could be labeled manually. The research team then randomly partitioned the obtained data into two sets, 70% for training and 30% for testing.
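To make the count of 12 outcomes concrete, the classes described above can be enumerated in a short Python sketch. The class names come from the description; the variable and function names are illustrative only and are not identifiers from the dataset files.

```python
# Class names taken from the description above. The grouping variables and
# the helper function are hypothetical names used only for illustration.
STATIC_POSTURES = ["standing", "sitting", "lying"]
DYNAMIC_ACTIVITIES = ["walking", "walking downstairs", "walking upstairs"]
POSTURAL_TRANSITIONS = [
    "stand-to-sit", "sit-to-stand", "sit-to-lie",
    "lie-to-sit", "stand-to-lie", "lie-to-stand",
]

# Six basic activities plus six postural transitions.
ALL_CLASSES = STATIC_POSTURES + DYNAMIC_ACTIVITIES + POSTURAL_TRANSITIONS


def class_group(name):
    """Return which of the three groups a class name belongs to."""
    if name in STATIC_POSTURES:
        return "static"
    if name in DYNAMIC_ACTIVITIES:
        return "dynamic"
    if name in POSTURAL_TRANSITIONS:
        return "transition"
    raise ValueError(f"unknown class name: {name!r}")


print(len(ALL_CLASSES))
```

Three static postures, three dynamic activities, and six transitions together give the 12 target classes of the classification problem.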
In the current iteration (Take1), the script focuses on evaluating various machine learning algorithms and identifying the model that produces the best overall metrics. Because many attributes in the dataset are collinear with other attributes, we will eliminate the attributes that have a collinearity measurement of 99% or higher. Iteration Take1 will establish the baseline performance for accuracy and processing time.
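The collinearity filter can be sketched in plain Python as follows. This is a minimal illustration and not the project's actual script. It assumes that the "collinearity measurement" is the absolute pairwise Pearson correlation, and it uses a simple greedy rule (keep the first attribute seen, drop any later attribute that correlates with a kept one at or above the threshold). The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant column: correlation is undefined, treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.99):
    """Greedy filter (an assumed rule, not necessarily the original one).

    columns: dict mapping attribute name -> list of values.
    Returns the attribute names kept, in their original order.
    """
    kept = []
    for name, values in columns.items():
        # Keep the attribute only if it is not too correlated with any kept one.
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is an exact multiple of "a", so it is dropped.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
}
kept_names = drop_collinear(toy)
print(kept_names)
```

Other selection rules are possible, for example dropping the attribute with the larger average correlation to all others, and they can yield a different set of retained attributes for the same threshold.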
ANALYSIS: In the current iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 89.80%. Two algorithms (Linear Discriminant Analysis and eXtreme Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, eXtreme Gradient Boosting turned in the top overall result and achieved an accuracy metric of 98.59%. Using the optimized parameters, the eXtreme Gradient Boosting algorithm processed the testing dataset with an accuracy of 93.67%. This was lower than the accuracy observed on the training data, possibly because of over-fitting.
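As a small illustration of how the accuracy metric and the training/testing gap are computed, consider the Python sketch below. The `accuracy` helper and the variable names are hypothetical. Only the two percentages are taken from the results reported above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Accuracies reported above for the tuned eXtreme Gradient Boosting model.
train_accuracy_pct = 98.59
test_accuracy_pct = 93.67

# Gap in percentage points between training and testing accuracy.
gap_pct_points = round(train_accuracy_pct - test_accuracy_pct, 2)
print(gap_pct_points)
```

A gap of this size between the training and testing accuracy is consistent with the possibility of over-fitting noted above, although it does not prove it.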
From the model-building perspective, the number of attributes decreased by 108, from 561 down to 453 (a reduction of about 19%).
CONCLUSION: For this iteration, the eXtreme Gradient Boosting algorithm achieved the best overall results. For this dataset, we should consider using the eXtreme Gradient Boosting algorithm for further modeling or production use.
Dataset Used: Smartphone-Based Recognition of Human Activities and Postural Transitions Data Set
Dataset ML Model: Multi-class classification with numerical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions
The HTML-formatted report can be found on GitHub.