XPath selector works in XPath Helper console, but

Posted 2019-07-28 21:26

I'm using Scrapy to parse interest rates from the Russian Central Bank website.

I'm also using the XPath Helper extension in Google Chrome to find the XPath selector I need. The selector I use in the XPath Helper console below works exactly as I need.

The same query for some reason doesn't work in my spider, even though it navigates to the page.

You can see my Spider code below.

import scrapy
import urllib.parse


class RatesSpider(scrapy.Spider):
    name = 'rates'
    allowed_domains = ['cbr.ru']
    start_urls = ['https://www.cbr.ru/hd_base/zcyc_params/zcyc/?DateTo=01.10.2018']

    def parse(self, response):
        rates = response.xpath('/html/body/div/div/div/div/div/table/tbody/tr[2]/td').extract()

        yield {'Rates': rates}

The page doesn't seem to be blocked behind a login, because I can parse other elements on it.

What can I do to make my code work?

Tags: xpath web-scraping scrapy scrapy-spider

1 Answer

唯我独甜

#2 · 2019-07-28 22:11

The table in the page source doesn't contain that tbody node. The browser adds it while rendering the page, which is why XPath Helper sees it and Scrapy doesn't. Just leave it out of the XPath (.../table/tbody/tr/... -> .../table//tr/...):

rates = response.xpath('/html/body/div/div/div/div/div/table//tr[2]/td').extract()

or, simplified (note that this matches every td on the page, not only those in the second row):

rates = response.xpath('//td').extract()
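
To see why dropping tbody matters, here is a minimal sketch that runs the two path shapes against markup with no tbody. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector, so it runs without Scrapy installed. The RAW_HTML snippet, its cell values, and the cell_texts helper are made up for illustration and are not taken from cbr.ru.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the raw markup a server might send: <tr> sits
# directly under <table>, with no <tbody> in between. Values are made up.
RAW_HTML = """
<html>
  <body>
    <table>
      <tr><th>Term</th><th>Rate</th></tr>
      <tr><td>0.25</td><td>7.10</td></tr>
      <tr><td>0.50</td><td>7.25</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(RAW_HTML)  # root is the <html> element


def cell_texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [cell.text for cell in root.findall(path)]


# Path shape copied from the browser, whose DOM has a <tbody>: no match here.
with_tbody = cell_texts("./body/table/tbody/tr[2]/td")

# Same path without tbody; '//' matches <tr> at any depth below <table>,
# so it works whether or not a <tbody> is present.
without_tbody = cell_texts("./body/table//tr[2]/td")

print(with_tbody)     # []
print(without_tbody)  # ['0.25', '7.10']
```

In a spider, the same comparison can be made by running both expressions through response.xpath and checking which one returns an empty list.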
