Regression Model for Superconductor Critical Temperature Using XGBoost Take2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Superconductor Critical Temperature dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: The research team wishes to create a statistical model for predicting the superconducting critical temperature based on features extracted from the superconductor's chemical formula. We also seek to identify the features that contribute most to the model's predictive accuracy.

From previous iterations, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.

From iteration Take1, we constructed and tuned an XGBoost model. Furthermore, we applied the XGBoost model to a test dataset and observed the best result that we could obtain from the model.

In this Take2 iteration, we will construct and tune an XGBoost model using the additional material attributes available for modeling. Furthermore, we will apply the XGBoost model to a test dataset and observe the best result that we can obtain from the model.

ANALYSIS: From previous iterations, the Extra Trees model turned in the best overall result and achieved an RMSE metric of 9.56. By using the optimized parameters, the Extra Trees algorithm processed the test dataset with an RMSE of 9.32.

From iteration Take1, the baseline performance of the XGBoost algorithm achieved an RMSE benchmark of 12.88. After a series of tuning trials, the XGBoost model processed the validation dataset with an RMSE score of 9.88. When we applied the XGBoost model to the previously unseen test dataset, we obtained an RMSE score of 9.06.

In this Take2 iteration, the baseline performance of the XGBoost algorithm achieved an RMSE benchmark of 12.54. After a series of tuning trials, the XGBoost model processed the validation dataset with an RMSE score of 9.58. When we applied the XGBoost model to the previously unseen test dataset, we obtained an RMSE score of 8.94.

CONCLUSION: In this iteration, the additional material attributes further improved the XGBoost model's performance on this dataset, lowering the test RMSE from 9.06 to 8.94. We should consider using the algorithm for further modeling.

Dataset Used: Superconductivity Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data
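For readers reproducing the Take2 setup, a loading sketch is shown below. It assumes, based on the UCI dataset documentation, that the archive contains train.csv (derived features plus critical_temp) and unique_m.csv (per-element composition attributes, the likely source of the "additional material attributes"); the file names, column names, and row-by-row correspondence are assumptions to verify against your own copy.

```python
# Hedged sketch: combine the derived features with the material composition
# attributes. File/column names are assumptions from the UCI documentation.
import pandas as pd

def load_superconductor(train_path: str, unique_path: str) -> pd.DataFrame:
    """Concatenate the derived features with the composition attributes
    row by row, keeping a single critical_temp target column."""
    train_df = pd.read_csv(train_path)
    unique_df = pd.read_csv(unique_path)
    # Drop columns assumed to be duplicated or non-numeric in unique_m.csv
    extra = unique_df.drop(columns=["critical_temp", "material"], errors="ignore")
    return pd.concat([train_df, extra.reset_index(drop=True)], axis=1)
```

The concatenation relies on the two files listing the same materials in the same order; if that assumption does not hold for your copy, join on an explicit key instead.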

One potential source of performance benchmarks: https://doi.org/10.1016/j.commatsci.2018.07.052

The HTML-formatted report can be found on GitHub.