SUMMARY: This project aims to practice web scraping by extracting specific pieces of information from a website. The web scraping Python code leverages the BeautifulSoup module.
INTRODUCTION: Haodoo is a website that hosts classic Chinese literature for its readers’ enjoyment. The name “Haodoo” can be translated into English as “Good Reads.” The site collects hard-to-find Chinese texts and books and makes them available for online reading. The Haodoo collection includes over 3,500 text and audiobook titles.
In the previous Take1 iteration, we scraped the website and obtained all the book titles and their assigned categories. In this Take2 iteration, we will use the information collected in Take1 to obtain the link for each book and each of its file formats.
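As a rough illustration of the Take2 step, the sketch below uses BeautifulSoup to collect links from a page and group them by file format. It is not the project's actual code: the function name `extract_format_links`, the `FORMATS` tuple, and the sample markup are all assumptions made for this example, and it further assumes that download links appear as ordinary `<a href>` anchors whose paths end in a file extension, which may not match how Haodoo's pages are really built.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

STARTING_URL = "https://haodoo.org"

# Hypothetical list of formats; the project's real list is not stated in the text.
FORMATS = ("epub", "pdf", "mobi")


def extract_format_links(html, base_url):
    """Return {format: [absolute URLs]} for anchors whose path ends in a known format."""
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page's base URL.
        url = urljoin(base_url, anchor["href"])
        path = urlparse(url).path.lower()
        for fmt in FORMATS:
            if path.endswith("." + fmt):
                links.setdefault(fmt, []).append(url)
    return links


# Invented markup for illustration only; it does not reflect Haodoo's real pages.
SAMPLE_HTML = """
<div>
  <a href="/books/sample.epub">EPUB</a>
  <a href="/books/sample.pdf">PDF</a>
  <a href="/about.html">About</a>
</div>
"""

links = extract_format_links(SAMPLE_HTML, STARTING_URL)
print(links)
```

In a real run, the HTML string would come from fetching each book page found in Take1, and the matching rule would likely need adjusting to the site's actual markup.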
Starting URL: https://haodoo.org
The source code and HTML output can be found on GitHub.