Web Scraping of O’Reilly Software Architecture Conference New York 2019

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The Python web scraping code uses the BeautifulSoup module.

INTRODUCTION: Occasionally we need to download a batch of documents from a single web page without clicking each download link one at a time. This web scraping script automatically traverses the entire web page, collects all links to PDF and PPTX documents, and downloads those documents as part of the scraping process.
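The link-collection and download steps described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names extract_document_links and download_documents, the DOC_EXTENSIONS tuple, and the "downloads" target folder are all assumptions made for this example, and the download step uses the standard library's urllib rather than whatever HTTP client the original code relies on.

```python
import os
import shutil
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed for illustration: the file types the script is interested in.
DOC_EXTENSIONS = (".pdf", ".pptx")


def extract_document_links(html, base_url):
    """Return absolute URLs of all PDF/PPTX links found in the HTML, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL.
        url = urljoin(base_url, anchor["href"])
        # Compare against the URL path only, so query strings do not hide the extension.
        path = urlparse(url).path.lower()
        if path.endswith(DOC_EXTENSIONS) and url not in links:
            links.append(url)
    return links


def download_documents(urls, target_dir="downloads"):
    """Save each URL into target_dir (hypothetical folder name), keeping the file name."""
    os.makedirs(target_dir, exist_ok=True)
    for url in urls:
        filename = os.path.basename(urlparse(url).path)
        with urlopen(url, timeout=30) as response, \
                open(os.path.join(target_dir, filename), "wb") as out_file:
            shutil.copyfileobj(response, out_file)


if __name__ == "__main__":
    start_url = ("https://conferences.oreilly.com/software-architecture/"
                 "sa-ny-2019/public/schedule/proceedings")
    with urlopen(start_url, timeout=30) as page:
        page_html = page.read()
    document_links = extract_document_links(page_html, start_url)
    download_documents(document_links)
```

Keeping the parsing step separate from the download step means the link extraction can be checked against a saved copy of the page before any files are fetched.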

Starting URL: https://conferences.oreilly.com/software-architecture/sa-ny-2019/public/schedule/proceedings

The source code and HTML output are available on GitHub.