Web Scraping of O’Reilly Strata Data Conference 2019 London Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The O’Reilly Strata Data Conference is an annual meeting that takes a deep dive into emerging techniques, technologies, and best practices for data science. This web scraping script automatically traverses the proceedings web page and collects all links to PDF and PPTX documents. The script also downloads the documents as part of the scraping process.

Starting URL: https://conferences.oreilly.com/strata/strata-eu-2019/public/schedule/proceedings
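
The link-collection and download steps described in the introduction can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names extract_doc_links and download_docs, the "downloads" folder, and the one-second pause are hypothetical, and the sketch assumes the documents are linked directly through anchor tags whose href ends in .pdf or .pptx. It also assumes rvest 1.0.0 or later (for html_elements). A small inline HTML sample is used so that the extraction step can be tried without network access.

```r
library(rvest)

# Hypothetical helper: return the absolute URLs of all PDF/PPTX links on a parsed page.
extract_doc_links <- function(page, base_url) {
  hrefs <- html_attr(html_elements(page, "a"), "href")
  hrefs <- hrefs[!is.na(hrefs)]                      # drop anchors without an href
  # Assumption: documents are identified by file extension, optionally followed by a query string
  is_doc <- grepl("\\.(pdf|pptx)(\\?.*)?$", hrefs, ignore.case = TRUE)
  unique(xml2::url_absolute(hrefs[is_doc], base_url))
}

# Hypothetical helper: download each document into dest_dir, skipping failures.
download_docs <- function(urls, dest_dir = "downloads") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  for (u in urls) {
    dest <- file.path(dest_dir, basename(sub("\\?.*$", "", u)))
    tryCatch(
      download.file(u, dest, mode = "wb", quiet = TRUE),  # "wb" keeps binary files intact
      error = function(e) message("Skipped ", u, ": ", conditionMessage(e))
    )
    Sys.sleep(1)  # pause between requests to be polite to the server
  }
  invisible(urls)
}

# Offline check on a made-up HTML snippet
sample_html <- '<html><body>
  <a href="/files/talk1.pdf">Slides</a>
  <a href="https://example.com/deck.PPTX">Deck</a>
  <a href="/schedule/detail/123">Session page</a>
  <a>No href</a>
</body></html>'
base_url <- "https://conferences.oreilly.com/strata/strata-eu-2019/public/schedule/proceedings"
sample_links <- extract_doc_links(read_html(sample_html), base_url)
print(sample_links)

# Live usage (needs network access; the page may have changed since 2019):
# links <- extract_doc_links(read_html(base_url), base_url)
# download_docs(links)
```

Keeping link extraction separate from downloading makes it possible to inspect the collected links before any files are fetched.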

The source code and HTML output are available on GitHub.