Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
For more information on this case study project, please consult Dr. Brownlee’s blog post at https://machinelearningmastery.com/standard-machine-learning-datasets/.
Dataset Used: Pima Indians Diabetes Database
Dataset ML Model: Classification with numerical attributes
Dataset Reference: https://www.kaggle.com/uciml/pima-indians-diabetes-database
For more information on performance benchmarks, please consult: https://www.kaggle.com/uciml/pima-indians-diabetes-database
INTRODUCTION: The Pima Indians Diabetes Dataset involves predicting the onset of diabetes within 5 years in Pima Indians given medical details. It is a binary (2-class) classification problem. There are 768 observations with 8 input variables and 1 output variable. Missing values are believed to be encoded with zero values.
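Because zeros are believed to stand in for missing values, one reasonable first step is to mark them as missing before modeling. The sketch below is only an illustration, not the method used in this project. It assumes the column names from the Kaggle CSV header and assumes that zero is implausible for Glucose, BloodPressure, SkinThickness, Insulin, and BMI (a zero in Pregnancies or Outcome is a legitimate value and is left alone). The function names and sample rows are hypothetical.

```python
# Assumption: these column names match the Kaggle CSV header, and a zero in
# any of them is treated as a missing measurement rather than a real reading.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def mark_zeros_missing(rows, columns=ZERO_AS_MISSING):
    """Return copies of the rows with zeros in `columns` replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in columns:
            if col in new_row and new_row[col] == 0:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


def count_missing(rows, columns=ZERO_AS_MISSING):
    """Count, per column, the rows where the column is present and None."""
    return {
        col: sum(1 for r in rows if col in r and r[col] is None)
        for col in columns
    }


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not taken from the dataset.
    sample = [
        {"Pregnancies": 0, "Glucose": 120, "BloodPressure": 70,
         "SkinThickness": 0, "Insulin": 0, "BMI": 30.5, "Outcome": 0},
        {"Pregnancies": 2, "Glucose": 0, "BloodPressure": 64,
         "SkinThickness": 25, "Insulin": 90, "BMI": 0.0, "Outcome": 1},
    ]
    cleaned = mark_zeros_missing(sample)
    print(count_missing(cleaned))
```

Counting the marked values per column shows how much of each attribute would need to be imputed or dropped, which helps in choosing between the two.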
CONCLUSION: At baseline, predicting the class variable achieved an average accuracy of 75.85%. After a series of tuning trials, the top accuracy, achieved with Logistic Regression, was 77.73%. In this case, the ensemble algorithms did not improve on the non-ensemble algorithms enough to justify the additional processing required.
The HTML-formatted report can be found here on GitHub.