SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The Python web scraping code leverages the BeautifulSoup module.
INTRODUCTION: Data.gov is a government data repository website managed and hosted by the U.S. General Services Administration. The purpose of this exercise is to practice web scraping by gathering the dataset entries from Data.gov’s web pages. This iteration of the script automatically traverses the web pages to capture all dataset entries and store all captured information in a JSON output file.
Starting URL: https://catalog.data.gov/dataset
The source code and HTML output are available on GitHub.
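For illustration, the traversal described in the introduction might look roughly like the sketch below. This is not the project's actual script. The CSS selector (`h3.dataset-heading a`), the `?page=N` pagination scheme, and the function names `parse_entries`, `crawl`, and `save_json` are all assumptions made for the example and may not match the live site's markup or the original code.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

START_URL = "https://catalog.data.gov/dataset"


def parse_entries(html, base_url=START_URL):
    """Extract the title and link of each dataset entry on one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    # Assumed markup: each entry's title link sits inside <h3 class="dataset-heading">.
    for link in soup.select("h3.dataset-heading a"):
        entries.append({
            "title": link.get_text(strip=True),
            "url": urljoin(base_url, link.get("href", "")),
        })
    return entries


def crawl(fetch, max_pages=3):
    """Walk listing pages until one yields no entries or max_pages is reached.

    `fetch` is any callable that maps a URL to HTML text, which keeps the
    traversal logic independent of the HTTP library used.
    """
    results = []
    for page in range(1, max_pages + 1):
        # Assumed pagination scheme: a `page` query parameter starting at 1.
        html = fetch(f"{START_URL}?page={page}")
        entries = parse_entries(html)
        if not entries:
            break
        results.extend(entries)
    return results


def save_json(entries, path):
    """Write the captured entries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
```

With the `requests` library installed, one possible way to run it is `save_json(crawl(lambda url: requests.get(url, timeout=30).text), "datasets.json")`. Passing the fetcher in as an argument also makes it easy to test the parsing against saved HTML without hitting the site.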