XPath selector works in XPath Helper console, but

Posted 2019-07-28 21:26

I'm using Scrapy to parse interest rates from the Russian Central Bank website.

I'm also using the XPath Helper extension in Google Chrome to find the XPath selector I need. The selector I use in the XPath Helper console below works exactly as I need.

[Screenshot: XPath Helper console]

The same query for some reason doesn't work in my spider, even though it navigates to the page.

[Screenshot: Spider]

You can see my Spider code below.

import scrapy
import urllib.parse


class RatesSpider(scrapy.Spider):
    name = 'rates'
    allowed_domains = ['cbr.ru']
    start_urls = ['https://www.cbr.ru/hd_base/zcyc_params/zcyc/?DateTo=01.10.2018']

    def parse(self, response):
        rates = response.xpath('/html/body/div/div/div/div/div/table/tbody/tr[2]/td').extract()

        yield {'Rates': rates}

The page doesn't seem to be behind a login, because I can parse other elements on it.

What can I do to make my code work?

1 answer
唯我独甜
#2 · 2019-07-28 22:11

The table in the raw HTML doesn't contain a tbody node. The browser adds it while rendering the page, which is why the selector matches in XPath Helper but not in Scrapy. Just leave it out of the XPath (.../table/tbody/tr/... -> .../table//tr/...):

rates = response.xpath('/html/body/div/div/div/div/div/table//tr[2]/td').extract()

or, simplified (note that this matches every td on the page, not only those in the second row):

rates = response.xpath('//td').extract()
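
To see the tbody effect without hitting the site, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors, which is an assumption made for illustration; ElementTree supports only a subset of XPath. The markup and values below are made up, not taken from cbr.ru.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the raw HTML the server sends:
# the <tr> rows sit directly under <table>, with no <tbody> in between.
RAW_HTML = """
<html><body>
  <table>
    <tr><th>Term</th><th>Rate</th></tr>
    <tr><td>0.25</td><td>7.10</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(RAW_HTML)  # root is the <html> element


def cell_texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [cell.text for cell in root.findall(path)]


# A path copied from the browser's rendered DOM expects a <tbody> step.
with_tbody = cell_texts("./body/table/tbody/tr[2]/td")

# The same path with the tbody step replaced by a descendant search.
without_tbody = cell_texts("./body/table//tr[2]/td")

print(with_tbody)     # []
print(without_tbody)  # ['0.25', '7.10']
```

In the spider itself the equivalent change is the response.xpath(...) line shown above. Running scrapy shell against the URL and trying both selectors there is a quick way to confirm what the raw response actually contains.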