Empty List From Scrapy When Using Xpath to Extract

Really need the help from this community.

My question: when I run this code in Python

response.xpath("//div[contains(@class,'check-prices-widget-not-sponsored')]/a/div[contains(@class,'check-prices-widget-not-sponsored-link')]").extract()

to extract the vendor name in the scrapy shell, the output is an empty list. I don't know why this happens; my guess is that the page content is loaded dynamically?

The URL I am scraping is https://cruiseline.com/cruise/7-night-bahamas-florida-new-york-roundtrip-32860, and what I need is the vendor name and price for each vendor. The attached picture is a screenshot of the browser's inspector.

However, similar code does work for extracting prices from the following page ('https://cruiseline.com/destination/caribbean/cruise/best?sort=rank,ship_status&&direction=desc&page=1&per_page=10&sailing_counts=0'):

Prices = response.xpath(
        "//div[contains(@class,'featured-cruise-price-inner-price')]/span/descendant::text()").extract()

Really appreciate the help!

Tags: python xpath web-scraping scrapy

1 Answer

地球回转人心会变

#2 · 2019-07-24 17:25

I tried this URL in the scrapy shell: https://cruiseline.com/cruise/7-night-bahamas-florida-new-york-roundtrip-32860, and I also got nothing with

response.xpath("//div[contains(@class,'check-prices-widget-not-sponsored')]/a/div[contains(@class,'check-prices-widget-not-sponsored-link')]").extract()

Then I used the view(response) command to see what the spider actually receives, and found that the page is dynamic: the information is added by JavaScript after the page loads. To scrape it, you need to execute the JS code that renders it.

Here are the screenshots:

As you can see, the info you need doesn't show up. The other page, https://cruiseline.com/destination/caribbean/cruise/best?sort=rank,ship_status&&direction=desc&page=1&per_page=10&sailing_counts=0, is static, which is why you can scrape what you need from it.
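Besides view(response), you can check the same thing in code: if no tag in the downloaded HTML carries the class your XPath looks for, the element is being added later by JavaScript. Below is a minimal sketch using only the standard library. The two HTML strings are made-up stand-ins (not the real markup of the site), and ClassFinder and tags_with_class are names I invented for this illustration.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the names of tags whose class attribute contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        css_class = dict(attrs).get("class") or ""
        if self.needle in css_class:
            self.matches.append(tag)


def tags_with_class(html, needle):
    """Return the tag names in `html` whose class contains `needle`."""
    finder = ClassFinder(needle)
    finder.feed(html)
    return finder.matches


# Hypothetical stand-in for response.text: only an empty container is
# present, and a script would fill it in inside a real browser.
static_html = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical markup as it might look after the scripts have run.
rendered_html = (
    '<div class="check-prices-widget-not-sponsored"><a>'
    '<div class="check-prices-widget-not-sponsored-link">Vendor</div>'
    '</a></div>'
)

needle = "check-prices-widget-not-sponsored"
print(tags_with_class(static_html, needle))    # no matching tags
print(tags_with_class(rendered_html, needle))  # both divs match
```

In the scrapy shell you would pass response.text instead of the sample strings. An empty result there, while the browser's inspector shows the element, points to JavaScript rendering.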

Here are two ways to scrape a dynamic website (there are more, of course):

1. Splash (official docs): in your spider, yield your URL with SplashRequest instead of scrapy.Request.

2. Selenium + PhantomJS (official docs)
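For option 1, SplashRequest only works once the scrapy-splash middlewares are switched on in the project's settings.py. The following is a sketch of those settings as I remember them from the scrapy-splash README, so check them against the version you install. The SPLASH_URL value is an assumption: it presumes a Splash instance running locally on port 8050.

```python
# settings.py (fragment), a sketch based on the scrapy-splash README

# Assumption: Splash is running locally, e.g. in a Docker container.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash and keep cookies in sync.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments for every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With these in place, the spider imports SplashRequest from scrapy_splash and yields something like SplashRequest(url, self.parse, args={'wait': 2}) where it would otherwise yield scrapy.Request(url, self.parse). The 2-second wait is a guess and may need tuning for this site. The original XPath can then be run on the rendered response inside parse.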

Good luck with your scraping! :)
