Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset is a classic binary classification problem, in which each event is assigned to one of two possible classes.
INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The data file is laid out as follows. The first line gives the number of signal events followed by the number of background events. The signal records come first, followed by the background records. Each line after the first holds the 50 particle ID variables for one event.
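The file layout described above can be read with a few lines of Python. This is a minimal sketch, not the loader used in the project: the function name parse_miniboone and the 1/0 label encoding are assumptions made for illustration, and a tiny synthetic sample stands in for the real data file.

```python
import io


def parse_miniboone(stream):
    """Parse a file with a two-count header followed by one event per line.

    Hypothetical helper: returns (features, labels), where labels use
    1 for signal and 0 for background (an encoding chosen for this sketch).
    """
    header = stream.readline().split()
    n_signal, n_background = int(header[0]), int(header[1])

    features, labels = [], []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        features.append([float(value) for value in fields])
        # Signal records come first, so the first n_signal rows get label 1.
        labels.append(1 if len(labels) < n_signal else 0)

    if len(features) != n_signal + n_background:
        raise ValueError("row count does not match the header counts")
    return features, labels


# Synthetic stand-in: 2 signal events, 3 background events, 50 values each.
rows = [" ".join(str(float(i + j)) for j in range(50)) for i in range(5)]
sample = io.StringIO("2 3\n" + "\n".join(rows) + "\n")

X, y = parse_miniboone(sample)
print(len(X), len(X[0]), y)
```

Deriving the labels from the header counts, rather than expecting a label column, follows the description above: the file itself carries no per-row class field.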
ANALYSIS: In the baseline round, the eight algorithms achieved an average accuracy of 90.82%. Two algorithms (Bagged CART and Random Forest) achieved the top accuracy scores after the first round of modeling. After a series of tuning trials, Random Forest turned in the top result on the training data, with an average accuracy of 93.74%. With the optimized tuning parameters, the Random Forest model classified the testing dataset with an accuracy of 93.91%, slightly higher than its accuracy on the training data.
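The tune-then-test workflow summarized above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: scikit-learn is swapped in as the library, cross-validation is assumed as the source of the "average" training accuracy, the max_features grid is hypothetical, and synthetic data replaces the MiniBooNE file, so the printed numbers will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the dataset: binary target, 50 numeric attributes.
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=10, random_state=7
)

# Hold out a testing set that plays no part in tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

# Hypothetical tuning grid; the parameters tuned in the project are not stated.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=7),
    param_grid={"max_features": [5, 10, 20]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best setting on the training data.
cv_accuracy = search.best_score_
# Accuracy of the refitted best model on the held-out testing data.
test_accuracy = search.score(X_test, y_test)

print(f"best params: {search.best_params_}")
print(f"training (CV) accuracy: {cv_accuracy:.4f}")
print(f"testing accuracy: {test_accuracy:.4f}")
```

In a setup like this, the testing accuracy can come out slightly above the cross-validated training accuracy, since the final model is refitted on the full training set while each fold model sees only part of it.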
CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall results using the training and testing datasets. For this dataset, the Random Forest algorithm should be considered for further modeling or production use.
Dataset Used: MiniBooNE particle identification Data Set
Dataset ML Model: Binary classification with numerical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification
The HTML-formatted report can be found here on GitHub.