SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The CycleGAN Apple vs. Orange dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The CycleGAN dataset collection contains datasets that consist of images from two classes A and B (for example, apple vs. orange, horses vs. zebras, and so on). The researchers used the images to train machine learning models for research work in the area of Generative Adversarial Networks.

In iteration Take1, we constructed and tuned machine learning models for this dataset using TensorFlow with a simple VGG-1 network. We also observed the best result that we could obtain using the test dataset.

This Take2 iteration will construct and tune machine learning models for this dataset using TensorFlow with a VGG-2 network. We will also observe the best result that we can obtain using the test dataset.

ANALYSIS: In iteration Take1, the baseline model (one layer with 16 convolutional filters) achieved an accuracy score of 48.25% on the unseen test dataset after 15 epochs. After experimenting with different layer configurations, the best model (one layer with 32 convolutional filters) processed the test dataset with 91.44% accuracy.

In this Take2 iteration, the baseline model (two layers with 8/16 convolutional filters) achieved an accuracy score of 92.22% on the unseen test dataset after 15 epochs. After experimenting with different layer configurations, the best model (two layers with 64/128 convolutional filters) processed the test dataset with 92.61% accuracy.
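To give a rough sense of how the two-layer configurations differ in size, the sketch below counts the trainable parameters in a stack of convolutional layers. It assumes 3x3 kernels, three-channel (RGB) input, and one bias per filter. None of these details are stated in the report, and the dense classifier head is ignored, so the numbers are illustrative only.

```python
def conv_params(in_channels, filters, kernel=3):
    """Trainable parameters in one 2D conv layer: kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def stack_params(filter_sizes, in_channels=3, kernel=3):
    """Total conv parameters for layers applied in sequence (assumed RGB input)."""
    total = 0
    for filters in filter_sizes:
        total += conv_params(in_channels, filters, kernel)
        in_channels = filters  # the next layer consumes this layer's feature maps
    return total


baseline = stack_params([8, 16])    # Take2 baseline: 8/16 filters
best = stack_params([64, 128])      # Take2 best model: 64/128 filters
print(baseline, best)
```

Under these assumptions, the 64/128 configuration carries roughly fifty times as many convolutional parameters as the 8/16 baseline, for a gain of less than half a percentage point in test accuracy.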

CONCLUSION: In this iteration, the best TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: CycleGAN Apple vs. Orange Dataset

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://people.eecs.berkeley.edu/%7Etaesung_park/CycleGAN/datasets/

One potential source of performance benchmarks: https://arxiv.org/abs/1703.10593 or https://junyanz.github.io/CycleGAN/

The HTML formatted report can be found here on GitHub.

]]>SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The CycleGAN Apple vs. Orange dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The CycleGAN dataset collection contains datasets that consist of images from two classes A and B (for example, apple vs. orange, horses vs. zebras, and so on). The researchers used the images to train machine learning models for research work in the area of Generative Adversarial Networks.

This Take1 iteration will construct and tune machine learning models for this dataset using TensorFlow with a simple VGG-1 network. We will also observe the best result that we can obtain using the test dataset. The final output from this iteration will become our baseline performance level for future iterations.

ANALYSIS: The baseline model (one layer with 16 convolutional filters) achieved an accuracy score of 48.25% on the unseen test dataset after 15 epochs. After experimenting with different layer configurations, the best model (one layer with 32 convolutional filters) processed the test dataset with 91.44% accuracy.

CONCLUSION: In this iteration, the best TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: CycleGAN Apple vs. Orange Dataset

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://people.eecs.berkeley.edu/%7Etaesung_park/CycleGAN/datasets/

One potential source of performance benchmarks: https://arxiv.org/abs/1703.10593 or https://junyanz.github.io/CycleGAN/

The HTML formatted report can be found here on GitHub.

]]>SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The researchers set up the data file as follows. The first line is the number of signal events followed by the number of background events. The records with the signal events come first, followed by the background events. Each line, after the first line, has the 50 particle ID variables for one event.
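The file layout described above can be turned into features and labels with a small parser. This is a minimal sketch that assumes whitespace-separated values. The function name `parse_miniboone` and the label encoding (1 for signal, 0 for background) are illustrative choices, not taken from the report.

```python
def parse_miniboone(lines):
    """Return (features, labels) from an iterable of text lines.

    The first line holds the signal and background event counts; every
    following line holds the particle ID variables for one event, with
    all signal events listed before the background events.
    """
    it = iter(lines)
    n_signal, n_background = (int(tok) for tok in next(it).split())
    features = [[float(tok) for tok in line.split()] for line in it if line.strip()]
    if len(features) != n_signal + n_background:
        raise ValueError("event count does not match the header line")
    # Assumed encoding: 1 = signal (electron neutrino), 0 = background (muon neutrino)
    labels = [1] * n_signal + [0] * n_background
    return features, labels


# Tiny made-up sample with 3 variables per event instead of 50
sample = ["2 1", "0.1 0.2 0.3", "0.4 0.5 0.6", "0.7 0.8 0.9"]
X, y = parse_miniboone(sample)
```

Because the records are ordered by class, the rows should be shuffled before splitting into training, validation, and test sets.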

ANALYSIS: In another TensorFlow modeling exercise, the baseline model (2 layers with 32 nodes each) achieved an accuracy score of 95.17% after 20 epochs using the training dataset. After tuning the hyperparameters, the best model (2 layers with 512 nodes each) processed the validation dataset with an accuracy score of 97.88%. Furthermore, the final model processed the previously unseen test dataset with an accuracy score of 94.40%.

After a series of modeling trials, the best AutoKeras model (2 layers with 256 and 32 nodes) processed the validation dataset with a maximum accuracy score of 94.64%. When we applied the AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 94.54%.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: MiniBooNE Particle Identification Dataset

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification

The HTML formatted report can be found here on GitHub.

]]>SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The researchers set up the data file as follows. The first line is the number of signal events followed by the number of background events. The records with the signal events come first, followed by the background events. Each line, after the first line, has the 50 particle ID variables for one event.

ANALYSIS: The baseline model (2 layers with 32 nodes each) achieved an accuracy score of 95.17% after 20 epochs using the training dataset. After tuning the hyperparameters, the best model (2 layers with 512 nodes each) processed the validation dataset with an accuracy score of 97.88%. Furthermore, the final model processed the previously unseen test dataset with an accuracy score of 94.40%.

CONCLUSION: In this iteration, the best TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: MiniBooNE Particle Identification Dataset

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification

The HTML formatted report can be found here on GitHub.

]]>SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Birmingham Parking Occupancy dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the hourly parking occupancy for a parking facility in Birmingham. The dataset describes a time series of parking occupancy over three months between October 2016 and December 2016, with 1834 hourly observations. We used the first 90% of the observations for training various models while holding back the remaining observations for validating the final model.

In this Part 1 iteration, we will train and validate the model using just one facility, BHMBCCMKT01, within the dataset.

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 46. After a grid search over the ARIMA parameters, the final model used the non-seasonal order (2, 0, 1) with the seasonal order (2, 0, 0, 24). The chosen model processed the validation data with an RMSE of 22, which, as expected, was better than the baseline.
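A persistence baseline of this kind can be sketched in a few lines: each forecast is the most recently observed value, scored with RMSE on the held-back portion of the series. The walk-forward evaluation and the helper names below are assumptions for illustration. The report does not say exactly how its baseline was computed.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_rmse(series, train_fraction=0.9):
    """Score a persistence forecast on the tail of a chronologically ordered series."""
    split = int(len(series) * train_fraction)
    train, validation = series[:split], series[split:]
    last_seen = train[-1]
    predictions = []
    for observed in validation:
        predictions.append(last_seen)  # forecast = previous observation
        last_seen = observed           # walk forward one step
    return rmse(validation, predictions)


# Made-up occupancy values, split in half for a quick check
occupancy = [10, 12, 15, 11, 14, 13, 16, 18, 17, 20]
print(persistence_rmse(occupancy, train_fraction=0.5))
```

Any candidate model, including the seasonal ARIMA configuration above, would need to beat this score on the same validation window to be worth keeping.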

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using ARIMA for further modeling.

Dataset Used: Parking Birmingham Data Set

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

The HTML formatted report can be found here on GitHub.

]]>Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support modeling classification and regression problems using Python and the Keras framework.

Version 8 of the TensorFlow deep learning templates contains minor adjustments and corrections to the previous version of the templates. The updated templates also include support for the following:

- Scikit-learn’s ColumnTransformer, imputing, and pipeline utilities for feature scaling and transformation tasks

You will find the Python deep learning templates on the Machine Learning Project Templates page.

]]>Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that I use to experiment with modeling ML problems using Python and AutoKeras.

Version 2 of the AutoKeras templates contains minor adjustments and corrections to the previous version of the templates. The updated templates also include:

- Scikit-learn’s ColumnTransformer, imputing, and pipeline utilities for feature scaling and transformation tasks

You will find the Python templates on the Machine Learning Project Templates page.

]]>SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The dataset is related to the white variants of the Portuguese “Vinho Verde” wine. The problem is to predict the wine quality using the chemical characteristics of the wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g., there is no data about grape types, wine brand, wine selling price).

ANALYSIS: In another iteration of modeling with TensorFlow, the preliminary model achieved an RMSE of 0.726. After tuning the hyperparameters, the best model processed the training dataset with an RMSE of 0.714. Furthermore, the final model processed the test dataset with an RMSE of 0.693.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a minimum RMSE score of 0.562. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an RMSE score of 0.623.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

The HTML formatted report can be found here on GitHub.

]]>SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The dataset is related to the white variants of the Portuguese “Vinho Verde” wine. The problem is to predict the wine quality using the chemical characteristics of the wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g., there is no data about grape types, wine brand, wine selling price).

ANALYSIS: In another iteration of modeling with TensorFlow, the preliminary model achieved an RMSE of 0.663. After tuning the hyperparameters, the best model processed the training dataset with an RMSE of 0.643. Furthermore, the final model processed the test dataset with an RMSE of 0.679.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a minimum RMSE score of 0.386. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an RMSE score of 0.602.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

The HTML formatted report can be found here on GitHub.

]]>Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that I use to experiment with modeling ML problems using Python and XGBoost.

Version 2 of the XGBoost templates contains minor adjustments and corrections to the previous version of the templates. The updated templates also include:

- Scikit-learn’s ColumnTransformer, imputing, and pipeline utilities for feature scaling and transformation tasks

You will find the Python templates on the Machine Learning Project Templates page.

]]>