SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model compares a simple trend-following strategy with or without using RSI as the exit signal for an individual stock. The model will use a trend window size of ten days for long trades only. When the 14-day RSI value reaches 70, the model will exit the long position.
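The RSI exit rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the script used for the reported results: it assumes a simple-average RSI over the last 14 price changes (Wilder's smoothing would give slightly different values), and the function names `rsi` and `should_exit_long` are hypothetical.

```python
def rsi(closes, period=14):
    """Return RSI values aligned with closes (None until enough data exists).

    Assumption: gains and losses are averaged with a simple mean over the
    last `period` price changes, not with Wilder's exponential smoothing.
    """
    values = [None] * len(closes)
    for i in range(period, len(closes)):
        # The `period` most recent day-over-day changes ending at index i.
        changes = [closes[j] - closes[j - 1] for j in range(i - period + 1, i + 1)]
        avg_gain = sum(c for c in changes if c > 0) / period
        avg_loss = sum(-c for c in changes if c < 0) / period
        if avg_loss == 0:
            # No losing days in the window (a completely flat window also lands here).
            values[i] = 100.0
        else:
            rs = avg_gain / avg_loss
            values[i] = 100.0 - 100.0 / (1.0 + rs)
    return values


def should_exit_long(rsi_value, threshold=70.0):
    """Exit the long position once the RSI reaches the threshold."""
    return rsi_value is not None and rsi_value >= threshold
```

With 15 closing prices, only the last index has a full 14-change window, so earlier entries stay `None` and never trigger an exit.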

ANALYSIS: In this modeling iteration, we analyzed the stock of AAPL (Apple Inc.) between January 1, 2016, and July 26, 2021. The trend-following model without using RSI produced a profit of 51.55 dollars per share, while the model with RSI signals returned 90.01 dollars per share. In addition, the buy-and-hold approach yielded a gain of 122.61 dollars per share.
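The per-share figures above compare two kinds of profit. As a rough sketch of the arithmetic, assuming one share per trade and no transaction costs, buy-and-hold profit is the last price minus the first, while a long-only strategy's profit is the sum over its completed round-trip trades. The function names and the prices in the usage note are hypothetical.

```python
def buy_and_hold_profit(closes):
    """Profit per share from buying at the first close and holding to the last."""
    return closes[-1] - closes[0]


def long_only_profit(trades):
    """Profit per share summed over (entry_price, exit_price) long trades."""
    return sum(exit_price - entry_price for entry_price, exit_price in trades)
```

For illustrative closes of 100, 90, 120, and 150, buy-and-hold earns 50 per share, while a single trade entered at 90 and exited at 120 earns 30. A strategy that spends time out of a rising market can trail buy-and-hold in this way.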

CONCLUSION: For the AAPL stock during the modeling time frame, the long-only trading strategy with or without RSI as the exit signal did not produce a better return than the buy-and-hold approach. We should consider experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Large Scale Fish Images dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset contains nine different seafood types collected from a supermarket in Izmir, Turkey, for a university-industry collaboration project at Izmir University of Economics, and this work was published in ASYU 2020. For each class, there are 1000 augmented images and their pair-wise augmented ground truths.

In iteration Take1, we constructed a CNN model based on the InceptionV3 architecture to predict the seafood type based on the available images.

In this Take2 iteration, we will construct a CNN model based on the DenseNet201 architecture to predict the seafood type based on the available images.

ANALYSIS: In iteration Take1, the InceptionV3 model’s performance achieved an accuracy score of 99.65% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 93.83%.

In this Take2 iteration, the DenseNet201 model’s performance achieved an accuracy score of 99.79% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 97.56%.

CONCLUSION: In this iteration, the DenseNet201-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: A Large-Scale Dataset for Fish Segmentation and Classification

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: Ulucan, Oguzhan and Karakaya, Diclehan and Turkan, Mehmet (2020), “A Large-Scale Dataset for Fish Segmentation and Classification,” 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), IEEE (https://ieeexplore.ieee.org/abstract/document/9259867)

One potential source of performance benchmarks: https://www.kaggle.com/crowww/a-large-scale-fish-dataset

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Large Scale Fish Images dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset contains nine different seafood types collected from a supermarket in Izmir, Turkey, for a university-industry collaboration project at Izmir University of Economics, and this work was published in ASYU 2020. For each class, there are 1000 augmented images and their pair-wise augmented ground truths.

In this Take1 iteration, we will construct a CNN model based on the InceptionV3 architecture to predict the seafood type based on the available images.

ANALYSIS: In this Take1 iteration, the InceptionV3 model’s performance achieved an accuracy score of 99.65% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 93.83%.

CONCLUSION: In this iteration, the InceptionV3-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: A Large-Scale Dataset for Fish Segmentation and Classification

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: Ulucan, Oguzhan and Karakaya, Diclehan and Turkan, Mehmet (2020), “A Large-Scale Dataset for Fish Segmentation and Classification,” 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), IEEE (https://ieeexplore.ieee.org/abstract/document/9259867)

One potential source of performance benchmarks: https://www.kaggle.com/crowww/a-large-scale-fish-dataset

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model compares a simple mean-reversion strategy with or without using RSI as the exit signal for an individual stock. The model will use a trend window size of ten days for long trades only. When the 14-day RSI value reaches 70, the model will exit the long position.

ANALYSIS: In this modeling iteration, we analyzed 14 stocks between January 1, 2016, and July 16, 2021. The models’ performance results appear at the end of the script.

CONCLUSION: For all the stocks during the modeling time frame, the long-only trading strategy with or without RSI as the exit signal did not produce a better return than the buy-and-hold approach, except for NFLX. We should consider modeling these stocks further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The PlantaeK Jammu Kashmir Leaf dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains 2153 healthy and unhealthy plant leaf images for eight different fruits and vegetables. The plants taken for study are the native plants of the Kashmir region of India. Eight different plants, namely Apple, Apricot, Cherry, Cranberry, Grapes, Peach, Pear, and Walnut, are selected for the study based on their commercial and medicinal usage. The leaf is the primary object of reference taken for making the database, as leaves grow much earlier than fruits and the other plant parts.

In iteration Take1, we constructed a CNN model based on the InceptionV3 architecture to predict the leaf’s health state based on the available images.

In iteration Take2, we constructed a CNN model based on the DenseNet architecture to predict the leaf’s health state based on the available images.

In this Take3 iteration, we will construct a CNN model based on the Xception architecture to predict the leaf’s health state based on the available images.

ANALYSIS: In iteration Take1, the InceptionV3 model’s performance achieved an accuracy score of 95.07% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 79.72%.

In iteration Take2, the DenseNet model’s performance achieved an accuracy score of 92.23% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 75.06%.

In this Take3 iteration, the Xception model’s performance achieved an accuracy score of 93.45% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 74.59%.
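The three iterations can be lined up side by side using the accuracy scores reported above. The sketch below is only a bookkeeping aid: the helper name `generalization_gap` is hypothetical, and the gap between training and validation accuracy is used here as a rough indicator of overfitting, not as a measure taken from the original notebooks.

```python
# (training accuracy %, validation accuracy %) as reported for each iteration.
results = {
    "InceptionV3": (95.07, 79.72),
    "DenseNet": (92.23, 75.06),
    "Xception": (93.45, 74.59),
}


def generalization_gap(train_acc, val_acc):
    """Training accuracy minus validation accuracy, in percentage points."""
    return round(train_acc - val_acc, 2)


# Gap per architecture, plus the architecture with the highest validation accuracy.
gaps = {name: generalization_gap(*accs) for name, accs in results.items()}
best_by_validation = max(results, key=lambda name: results[name][1])
```

All three models show a gap of more than 15 percentage points between training and validation accuracy, which suggests that further tuning or regularization may be worth exploring.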

CONCLUSION: In this iteration, the Xception-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: PlantaeK: A leaf database of native plants of Jammu and Kashmir.

Dataset ML Model: Binary image classification with numerical attributes

Dataset Reference: KOUR, VIPPON PREET; Arora, Sakshi (2019), “PlantaeK: A leaf database of native plants of Jammu and Kashmir,” Mendeley Data, V2, doi: 10.17632/t6j2h22jpx.2 (https://data.mendeley.com/datasets/t6j2h22jpx/2)

One potential source of performance benchmarks: https://data.mendeley.com/datasets/t6j2h22jpx/2

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model compares a simple mean-reversion strategy with or without using RSI as the exit signal for an individual stock. The model will use a trend window size of ten days for long trades only. When the 14-day RSI value reaches 70, the model will exit the long position.

ANALYSIS: In this modeling iteration, we analyzed the stock of AAPL (Apple Inc.) between January 1, 2016, and July 19, 2021. The mean-reversion model without using RSI produced a profit of 3.59 dollars per share, while the model with RSI signals returned 45.97. In addition, the buy-and-hold approach yielded a gain of 118.09 dollars per share.

CONCLUSION: For the AAPL stock during the modeling time frame, the long-only trading strategy with or without RSI as the exit signal did not produce a better return than the buy-and-hold approach. We should consider experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The PlantaeK Jammu Kashmir Leaf dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains 2153 healthy and unhealthy plant leaf images for eight different fruits and vegetables. The plants taken for study are the native plants of the Kashmir region of India. Eight different plants, namely Apple, Apricot, Cherry, Cranberry, Grapes, Peach, Pear, and Walnut, are selected for the study based on their commercial and medicinal usage. The leaf is the primary object of reference taken for making the database, as leaves grow much earlier than fruits and the other plant parts.

In iteration Take1, we constructed a CNN model based on the InceptionV3 architecture to predict the leaf’s health state based on the available images.

In this Take2 iteration, we will construct a CNN model based on the DenseNet architecture to predict the leaf’s health state based on the available images.

ANALYSIS: In iteration Take1, the InceptionV3 model’s performance achieved an accuracy score of 95.07% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 79.72%.

In this Take2 iteration, the DenseNet model’s performance achieved an accuracy score of 92.23% after ten epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 75.06%.

CONCLUSION: In this iteration, the DenseNet-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: PlantaeK: A leaf database of native plants of Jammu and Kashmir.

Dataset ML Model: Binary image classification with numerical attributes

Dataset Reference: KOUR, VIPPON PREET; Arora, Sakshi (2019), “PlantaeK: A leaf database of native plants of Jammu and Kashmir,” Mendeley Data, V2, doi: 10.17632/t6j2h22jpx.2 (https://data.mendeley.com/datasets/t6j2h22jpx/2)

One potential source of performance benchmarks: https://data.mendeley.com/datasets/t6j2h22jpx/2

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The PlantaeK Jammu Kashmir Leaf dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains 2153 healthy and unhealthy plant leaf images for eight different fruits and vegetables. The plants taken for study are the native plants of the Kashmir region of India. Eight different plants, namely Apple, Apricot, Cherry, Cranberry, Grapes, Peach, Pear, and Walnut, are selected for the study based on their commercial and medicinal usage. The leaf is the primary object of reference taken for making the database, as leaves grow much earlier than fruits and the other plant parts.

In this Take1 iteration, we will construct a CNN model based on the InceptionV3 architecture to predict the leaf’s health state based on the available images.

ANALYSIS: In this Take1 iteration, the InceptionV3 model’s performance achieved an accuracy score of 95.07% after five epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 79.72%.

CONCLUSION: In this iteration, the InceptionV3-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: PlantaeK: A leaf database of native plants of Jammu and Kashmir.

Dataset ML Model: Binary image classification with numerical attributes

Dataset Reference: KOUR, VIPPON PREET; Arora, Sakshi (2019), “PlantaeK: A leaf database of native plants of Jammu and Kashmir,” Mendeley Data, V2, doi: 10.17632/t6j2h22jpx.2 (https://data.mendeley.com/datasets/t6j2h22jpx/2)

One potential source of performance benchmarks: https://data.mendeley.com/datasets/t6j2h22jpx/2

The HTML formatted report can be found here on GitHub.

INTRODUCTION: This algorithmic trading model compares simple mean-reversion and trend-following strategies for a group of stocks. The model will use a trend window size of ten days for long trades only.
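One way to picture the difference between the two strategies is to compare each closing price with its ten-day simple moving average. The sketch below is an assumption about how such signals could be defined, not the rule used in the original script: the trend-following variant goes long when the price is above the average, and the mean-reversion variant goes long when the price is below it. The function names and the `style` parameter are hypothetical.

```python
def moving_average(closes, window=10):
    """Simple moving average aligned with closes (None until the window fills)."""
    out = [None] * len(closes)
    for i in range(window - 1, len(closes)):
        out[i] = sum(closes[i - window + 1 : i + 1]) / window
    return out


def entry_signals(closes, window=10, style="trend"):
    """Return a list of booleans, True where a long entry is signalled.

    Assumed rules: "trend" enters when price is above its moving average;
    "mean_reversion" enters when price is below it.
    """
    if style not in ("trend", "mean_reversion"):
        raise ValueError("style must be 'trend' or 'mean_reversion'")
    signals = []
    for price, avg in zip(closes, moving_average(closes, window)):
        if avg is None:
            signals.append(False)  # not enough history yet
        elif style == "trend":
            signals.append(price > avg)
        else:
            signals.append(price < avg)
    return signals
```

Under these assumed rules the two variants never signal on the same day, which makes the side-by-side comparison of their returns straightforward.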

ANALYSIS: In this modeling iteration, we analyzed ten stocks between January 1, 2016, and July 9, 2021. The models’ performance results appear at the end of the script.

CONCLUSION: For all the stocks during the modeling time frame, the long-only trading strategy with either mean-reversion or trend-following approach did not produce a better return than the buy-and-hold approach, except for LUV and PFE. We should consider modeling these stocks further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground June 2021 dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the category of an eCommerce product given various attributes about the listing. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: The performance of the cross-validated TensorFlow models achieved an average logarithmic loss benchmark of 1.7669 after running for ten epochs. When we applied the final model to Kaggle’s test dataset, the model achieved a logarithmic loss score of 1.7638.
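For readers unfamiliar with the metric, multi-class logarithmic loss is the average negative log of the probability assigned to the true class. The following is a minimal pure-Python sketch: the function name, the clipping constant `eps`, and the four-class example are illustrative assumptions, not details taken from the competition or the original notebook.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Average negative log-probability of the true class.

    y_true: list of integer class indices.
    probs:  list of per-class probability rows, one row per sample.
    eps:    clipping bound that keeps log() away from zero.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# A model that spreads probability evenly over four classes scores ln(4).
uniform_loss = multiclass_log_loss([0, 1], [[0.25] * 4, [0.25] * 4])
```

Lower values are better, and the uniform-guess score for a given number of classes gives a useful baseline for judging a reported loss.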

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 June Data Set

Dataset ML Model: Multi-Class classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-jun-2021/

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-jun-2021/leaderboard

The HTML formatted report can be found here on GitHub.
