Web Scraping of Machine Learning Mastery Blog Entries Using Python and BeautifulSoup

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping code was written in Python 3 and leverages the BeautifulSoup module.

INTRODUCTION: Dr. Jason Brownlee’s Machine Learning Mastery hosts its tutorial lessons at https://machinelearningmastery.com/blog. This exercise gathers the blog entries from those web pages. This iteration of the script automatically traverses the pages, captures every blog entry, and stores the captured information in a JSON output file.
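
The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names (parse_listing, crawl, save_json), the output fields ("title", "url"), and the HTML assumptions (each entry is an <article> element with a linked heading, and the link to the following page carries the CSS class "next") are all hypothetical and would need to be checked against the live site's markup.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_listing(html, base_url):
    """Return (entries, next_url) for one blog listing page."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    # Assumption: each blog entry is an <article> with a linked heading.
    for article in soup.find_all("article"):
        heading = article.find(["h1", "h2", "h3"])
        link = heading.find("a", href=True) if heading else None
        if link is None:
            continue
        entries.append({
            "title": link.get_text(strip=True),
            "url": urljoin(base_url, link["href"]),
        })
    # Assumption: the link to the following page has the class "next".
    next_link = soup.find("a", class_="next", href=True)
    next_url = urljoin(base_url, next_link["href"]) if next_link else None
    return entries, next_url


def crawl(start_url, fetch, max_pages=100):
    """Follow "next" links from start_url and collect every entry.

    fetch is any callable that maps a URL to that page's HTML text.
    """
    entries, url, seen = [], start_url, set()
    # Stop when there is no next page, a page repeats, or the cap is hit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_entries, url = parse_listing(fetch(url), url)
        entries.extend(page_entries)
    return entries


def save_json(entries, path):
    """Write the collected entries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)
```

Passing the page fetcher in as a parameter keeps the parsing logic testable without network access. In a real run, fetch could wrap an HTTP client call that returns the response body as text, ideally with a timeout and a short delay between requests to avoid loading the site.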

Starting URL: https://machinelearningmastery.com/blog

The source code and JSON output are available on GitHub.