Scraping ajax page with Scrapy?

Posted 2019-06-07 04:42

I'm using Scrapy to scrape data from this page:

https://www.bricoetloisirs.ch/magasins/gardena

The product list is loaded dynamically. I found the URL that the page calls to fetch the products:

https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272

But when I request it with Scrapy, it returns an empty page:

<span class="pageSizeInformation" id="page0" data-page="0" data-pagesize="12">Page: 0 / Size: 12</span>

Here is my code:

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
        'https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
    ]

    def parse(self, response):
        # Dump the raw body to inspect what the server returned
        print(response.body)

3 Answers
淡お忘
#2 · 2019-06-07 05:16

I solved this by starting from the normal listing page and requesting each AJAX page from there:

# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
        'https://www.bricoetloisirs.ch/magasins/gardena'
    ]

    def parse(self, response):
        # Load the listing page first, then request pages 1..49 relative to it
        for page in range(1, 50):
            url = response.url + '/.do?page=%s&_=1473841539272' % page
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        # Dump the raw body of each product page
        print(response.body)
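
The loop above always requests a fixed range of pages. One possible refinement, sketched here with only the standard library, is to read the `pageSizeInformation` marker shown in the question from each response so the callback can decide whether to continue. `PageInfoParser` and `read_page_info` are hypothetical names, and what a given value means (for example, whether page 0 signals an empty result) is an assumption to verify against the site.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects data-page / data-pagesize from the pageSizeInformation span."""

    def __init__(self):
        super().__init__()
        self.page = None
        self.page_size = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # The class name comes from the snippet in the question.
        if tag == "span" and attr_map.get("class") == "pageSizeInformation":
            self.page = int(attr_map["data-page"])
            self.page_size = int(attr_map["data-pagesize"])


def read_page_info(html):
    """Return (page, page_size), or (None, None) if the marker is absent."""
    parser = PageInfoParser()
    parser.feed(html)
    return parser.page, parser.page_size
```

Inside `parse_page`, this could be called as `read_page_info(response.text)`.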
ら.Afraid
#3 · 2019-06-07 05:19

I believe you need to send an additional request just like a browser does. Try to modify your code as follows:

# -*- coding: utf-8 -*-
import scrapy

from scrapy.http import Request
from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"

    start_urls = [
        'https://www.bricoetloisirs.ch/coop/ajax/nextPage/'
    ]

    def parse(self, response):
        request_body = '(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
        yield Request(url=response.url, body=request_body, callback=self.parse_page)

    def parse_page(self, response):
        print(response.body)
戒情不戒烟
#4 · 2019-06-07 05:27

As far as I know, websites use JavaScript to make Ajax calls.
Scrapy does not execute the page's JavaScript, so content loaded that way never appears in the response.

You will need to take a look at Selenium for scraping these kinds of pages.

Or find out which Ajax calls are being made and send them yourself.
The question "Can scrapy be used to scrape dynamic content from websites that are using AJAX?" may help you as well.
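
As a rough illustration of sending the Ajax calls yourself, the sketch below rebuilds the endpoint from the question for an arbitrary page number. `build_page_url` and `AJAX_HEADERS` are hypothetical names; treating `_` as a cache-busting millisecond timestamp and sending the `X-Requested-With` header are assumptions based on how browsers commonly issue such calls, not something the site is known to require.

```python
import time
from urllib.parse import urlencode

# Endpoint copied from the question; the parenthesised segment is part of the path.
AJAX_BASE = (
    "https://www.bricoetloisirs.ch/coop/ajax/nextPage/"
    "(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2"
    "&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do"
)

# Assumption: mimics the header browsers usually attach to Ajax requests.
AJAX_HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def build_page_url(page, timestamp_ms=None):
    """Return the Ajax URL for one page of products."""
    if timestamp_ms is None:
        # Assumption: "_" is a cache-busting timestamp in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    return AJAX_BASE + "?" + urlencode({"page": page, "_": timestamp_ms})


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_page_url(page))
```

In a spider, each URL could then be passed to `scrapy.Request(url, headers=AJAX_HEADERS, callback=...)`.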
