Web Scraping of Quotes from Famous People using Python Take 1

SUMMARY: The purpose of this project is to practice web scraping by gathering specific pieces of information from a website. The web scraping code was written in Python 3 and leveraged the Scrapy framework maintained by Scrapinghub.

INTRODUCTION: A demo website, created by Scrapinghub, lists quotes from famous people. It has many endpoints showing the quotes in different ways, and each endpoint presents a different scraping challenge for practicing web scraping. For this Take1 iteration, the Python script attempts to follow the page links and scrape the quote information off each page.

Starting URLs: http://quotes.toscrape.com/

import scrapy
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/']
    def parse(self, response):
        self.log('I just visited: ' + response.url)
        for quote in response.css('div.quote'):
            item = {
                'author_name': quote.css('small.author::text').extract_first(),
                'text': quote.css('span.text::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
                'author_url': quote.css('div.quote > span > a::attr(href)').extract_first(),
            yield item
        # follow pagination link
        next_page_url = response.css('li.next > a::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

The source code and JSON output can be found here on GitHub.