As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.
Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that can be used to support modeling ML problems using Python.
Version 13 of the templates streamlined the data preparation steps and tried to make the workflow more intuitive and logical. The updated workflow includes the following data preparation stages:
- Stage #1: Remove or correct obvious missing values and data errors. Convert data from one type (numerical, nominal, ordinal, etc.) to another as necessary, so that the data represents each feature more succinctly rather than simply accepting the default format.
- Stage #2: Perform one-hot encoding on the categorical attributes.
- Stage #3: Split the data into training and testing sets.
- Stage #4: Scale the numeric attributes to make them easier for the machine learning algorithms to process.
- Stage #5: Balance the target classes if necessary.
- Stage #6: Apply feature selection techniques.
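
As a rough illustration, the six stages above might look something like the following sketch using pandas and scikit-learn. The toy dataset, the column names (`age`, `income`, `city`, `target`), and the specific techniques (dropping rows with missing values, standard scaling, random oversampling with `sklearn.utils.resample`, and `SelectKBest`) are assumptions made up for this example. They are not taken from the templates themselves, which may handle each stage differently.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Hypothetical toy dataset with an imbalanced binary target (1 in 5 rows is class 1).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n).astype(float),
    "income": rng.normal(50000, 15000, size=n),
    "city": rng.choice(["north", "south", "east"], size=n),
    "target": (np.arange(n) % 5 == 0).astype(int),
})
df.loc[::33, "income"] = np.nan  # inject a few missing values

# Stage 1: drop rows with missing values and convert types where it helps.
df = df.dropna(subset=["income"])
df["city"] = df["city"].astype("category")

# Stage 2: one-hot encode the categorical attribute.
df = pd.get_dummies(df, columns=["city"], dtype=int)

# Stage 3: split into training and testing sets, preserving class proportions.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Stage 4: scale the numeric attributes, fitting the scaler on training data only.
numeric_cols = ["age", "income"]
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

# Stage 5: balance the training classes by randomly oversampling the minority class.
train = X_train.assign(target=y_train)
counts = train["target"].value_counts()
majority_label, minority_label = counts.idxmax(), counts.idxmin()
minority_upsampled = resample(
    train[train["target"] == minority_label],
    replace=True,
    n_samples=counts.max(),
    random_state=42,
)
train_balanced = pd.concat(
    [train[train["target"] == majority_label], minority_upsampled]
)
X_train_bal = train_balanced.drop(columns="target")
y_train_bal = train_balanced["target"]

# Stage 6: keep the k highest-scoring features (k=3 is an arbitrary choice here).
selector = SelectKBest(score_func=f_classif, k=3)
X_train_sel = selector.fit_transform(X_train_bal, y_train_bal)
X_test_sel = selector.transform(X_test)

print(X_train_sel.shape, X_test_sel.shape)
```

Note that in this sketch the scaler, the oversampling, and the feature selector are all fitted on the training set only, so that nothing about the test set leaks into the preparation steps.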
You will find the Python templates on the Machine Learning Project Templates page.