Web Scraping of AWS Open Data Registry Using Selenium

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The Python web scraping code leverages the Selenium module.

INTRODUCTION: The Registry of Open Data on AWS makes datasets publicly available through AWS services. When data is shared on AWS, anyone can analyze it and build services on top of it using a broad range of compute and data analytics products. Sharing data in the cloud also lets data users spend more time on data analysis rather than data acquisition. The script automatically traverses the dataset listing, captures the descriptive data for each dataset, and stores it in a CSV output file.
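
The traverse-and-store flow described above might look roughly like the following minimal sketch. It is not the project's actual code. The CSS selectors ("div.dataset", "h3 a", "p"), the column names, and the helper names scrape_listing and write_csv are all assumptions made for illustration; the real page markup should be inspected and the selectors adjusted before relying on it. Selenium is imported inside the scraping function so that the CSV helper can be used and tested without a browser.

```python
import csv

START_URL = "https://registry.opendata.aws/"
# Hypothetical column names for the CSV output; the original script may differ.
FIELDS = ["name", "description", "link"]


def scrape_listing(url=START_URL, item_selector="div.dataset"):
    """Load the listing page and return one dict per dataset entry.

    The selectors used here are assumptions about the page markup,
    not something confirmed by the project description.
    """
    # Imported lazily so the rest of the module works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        rows = []
        for item in driver.find_elements(By.CSS_SELECTOR, item_selector):
            links = item.find_elements(By.CSS_SELECTOR, "h3 a")
            paragraphs = item.find_elements(By.CSS_SELECTOR, "p")
            rows.append({
                "name": links[0].text.strip() if links else "",
                "description": paragraphs[0].text.strip() if paragraphs else "",
                "link": (links[0].get_attribute("href") or "") if links else "",
            })
        return rows
    finally:
        # Always close the browser, even if scraping fails midway.
        driver.quit()


def write_csv(rows, path):
    """Write the scraped rows to a CSV file and return the row count."""
    rows = list(rows)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With Selenium and a Chrome driver available, calling write_csv(scrape_listing(), "datasets.csv") would run the whole flow; the output file name here is only a placeholder.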

Starting URL: https://registry.opendata.aws/

The source code and HTML output can be found here on GitHub.