Web Scraping of SAS Global Forum 2020 Proceedings Using BeautifulSoup

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The Python web scraping code uses the BeautifulSoup module.

INTRODUCTION: The SAS Global Forum covers the full range of topics in using SAS products and developing SAS solutions. This web scraping script traverses the proceedings web page, collects every link to a PDF or PPTX document, and downloads those documents as part of the scraping process. The Python script was run in the Google Colaboratory environment and can be adapted to run in any other Python environment by removing the Colab-specific configuration.

Starting URL: https://www.sas.com/en_us/events/sas-global-forum/program/proceedings.html
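
The link collection and download steps described in the introduction might look roughly like the sketch below. It is not the project's actual script. The function names (extract_document_links, download_document, scrape_proceedings), the "downloads" folder, and the use of urllib for fetching are all illustrative assumptions. It also assumes the document links appear as ordinary anchor tags whose href ends in .pdf or .pptx in the static HTML, which may not hold if the page builds its links with JavaScript.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup

START_URL = "https://www.sas.com/en_us/events/sas-global-forum/program/proceedings.html"
DOC_EXTENSIONS = (".pdf", ".pptx")  # document types named in the introduction


def extract_document_links(html, base_url=START_URL):
    """Return absolute URLs of PDF/PPTX links found in the HTML, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL.
        url = urljoin(base_url, anchor["href"])
        # Compare only the path so query strings do not hide the extension.
        path = urlparse(url).path.lower()
        if path.endswith(DOC_EXTENSIONS) and url not in links:
            links.append(url)
    return links


def download_document(url, dest_dir="downloads"):
    """Save one document into dest_dir (hypothetical folder) and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    filename = os.path.basename(urlparse(url).path)
    target = os.path.join(dest_dir, filename)
    with urlopen(url) as response, open(target, "wb") as out:
        out.write(response.read())
    return target


def scrape_proceedings(start_url=START_URL):
    """Fetch the starting page, then download every document it links to."""
    with urlopen(start_url) as response:
        page_html = response.read()
    return [download_document(link) for link in extract_document_links(page_html, start_url)]
```

Calling scrape_proceedings() would fetch the starting page and save each linked document locally. Keeping the link extraction separate from the downloading makes it possible to check the parsing logic against a saved copy of the page without any network access.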

The source code and HTML output can be found here on GitHub.