Binary Classification Model for MiniBooNE Particle Identification Using Python Take 5

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset poses a binary classification problem: each event is predicted to belong to one of two possible classes.

INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The data file is set up as follows. The first line is the number of signal events followed by the number of background events. The records with the signal events come first, followed by the background events. Each line, after the first line, has the 50 particle ID variables for one event.
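The file layout described above can be turned into features and labels with a short parser. The following is a minimal sketch, not the code used in the project: the function name `parse_miniboone`, the whitespace-separated value format, and the label encoding (1 for signal, 0 for background) are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def parse_miniboone(lines: Iterable[str]) -> Tuple[List[List[float]], List[int]]:
    """Parse MiniBooNE-style text lines into (features, labels).

    Assumes whitespace-separated values; label 1 = signal, 0 = background.
    """
    rows = iter(lines)

    # First line: number of signal events, then number of background events.
    header = next(rows).split()
    n_signal, n_background = int(header[0]), int(header[1])

    # Every remaining non-blank line holds the variables for one event.
    features = [[float(v) for v in line.split()] for line in rows if line.strip()]
    if len(features) != n_signal + n_background:
        raise ValueError("row count does not match the counts on the first line")

    # Signal records come first, so the labels follow directly from the counts.
    labels = [1] * n_signal + [0] * n_background
    return features, labels


# Toy input: 2 signal events, then 1 background event, 3 variables each
# (the real file has 50 variables per event).
sample = ["2 1", "0.1 0.2 0.3", "0.4 0.5 0.6", "0.7 0.8 0.9"]
features, labels = parse_miniboone(sample)
print(labels)  # [1, 1, 0]
```

Because the function accepts any iterable of lines, an open file handle for the downloaded data file can be passed in directly.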

ANALYSIS: In the baseline runs, the machine learning algorithms achieved an average accuracy of 91.74%. Two algorithms, k-Nearest Neighbors and eXtreme Gradient Boosting (XGBoost), produced the highest accuracy after the first round of modeling. After a series of tuning trials, XGBoost turned in the top overall result with an accuracy of 94.22%. With the optimized parameters, the XGBoost model scored 94.31% accuracy on the test dataset, which was consistent with its performance on the training dataset.
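For readers unfamiliar with the metric, accuracy is the fraction of events whose predicted class matches the true class. The sketch below is illustrative only: the `accuracy` helper and the toy labels are hypothetical, and only the two percentages are taken from the analysis above. It also shows the size of the gap between the tuned training result and the test result.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (hypothetical, not from the MiniBooNE data): 3 of 4 match.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# Accuracies reported in the analysis, in percent.
tuned_training_accuracy = 94.22
test_accuracy = 94.31

# Gap in percentage points; a small gap suggests the tuned model
# did not overfit the training data.
gap = round(test_accuracy - tuned_training_accuracy, 2)
print(toy_accuracy, gap)  # 0.75 0.09
```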

CONCLUSION: For this iteration, the XGBoost algorithm achieved the best overall results using the training and test datasets. For this dataset, XGBoost should be considered for further modeling.

Dataset Used: MiniBooNE Particle Identification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification

The HTML-formatted report is available on GitHub.