03-02- books select elements

01

Find the following elements on the page below so they can be added to the script:
https://books.toscrape.com/

  • title
  • price
  • link - to more info on book

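A quick way to get any attribute in the Scrapy shell is the ::attr(name) pseudo-element. For example, to get the alt text of the first book's cover image:

books = response.css('article.product_pod')
books[0].css('div.image_container a img::attr(alt)').get()

This will return
'A Light in the Attic'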

02

Get the title, price, and link of the first book in the Scrapy shell

fetch('https://books.toscrape.com/')
books = response.css('article.product_pod')
book = books[0]
book.css('h3 a::text').get()

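We can also get the title with attrib. It is a property, not a method: a dict of the HTML attributes of the first matched element. The title attribute holds the full title, while the link text on the page may be shortened.

book.css('h3 a').attrib['title']

The price text and the link to the book's page:

book.css('.product_price .price_color::text').get()
book.css('h3 a').attrib['href']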

03

Add the selectors above to the parse() method inside the BookspiderSpider class

(** You must yield an object for each book in the loop, not just a single book.)

import scrapy


class BookspiderSpider(scrapy.Spider):
    name = "bookspider"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com/"]

    def parse(self, response):
        # Each book on the listing page is an <article class="product_pod">
        books = response.css('article.product_pod')
        for book in books:
            # Yield one dict per book
            yield {
                'name': book.css('h3 a::text').get(),
                'price': book.css('.product_price .price_color::text').get(),
                'url': book.css('h3 a').attrib['href'],
            }
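
The href read from each book is a path relative to the page, so the url field is not a complete address on its own. As a small sketch of how such a value could be turned into an absolute URL, the standard library's urljoin can be used; the relative_href below is an illustrative sample value, not output captured from a crawl.

```python
from urllib.parse import urljoin

# Illustrative values: the page URL from start_urls and a relative href
# of the kind returned by book.css('h3 a').attrib['href'].
page_url = "https://books.toscrape.com/"
relative_href = "catalogue/a-light-in-the-attic_1000/index.html"

# Resolve the relative path against the page it was found on
absolute_url = urljoin(page_url, relative_href)
print(absolute_url)
```

Inside parse(), response.urljoin(href) performs the same kind of join against the URL of the response.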

04

Exit the Scrapy shell, change directory into the bookscraper folder, and run

    scrapy crawl bookspider

You must be in the same directory that contains the scrapy.cfg file

    cd bookscraper

.
├── bookscraper
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-310.pyc
│   │   └── settings.cpython-310.pyc
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-310.pyc
│       │   └── bookspider.cpython-310.pyc
│       └── bookspider.py
└── scrapy.cfg


                

05

When the crawl finishes, you can see the last items scraped from the page in the console output