How to use scrapy to crawl multiple pages? (two le

On my site I created two simple pages. Here is their HTML:

test1.html :

<head>
<title>test1</title>
</head>
<body>
<a href="test2.html" onclick="javascript:return xt_click(this, 'C', '1', 'Product', 'N');" indepth="true">
<span>cool</span></a>
</body></html>

test2.html :

<head>
<title>test2</title>
</head>
<body></body></html>

I want to scrape the text in the title tag of both pages, i.e. "test1" and "test2". I am new to Scrapy, though, and so far I only manage to scrape the first page. My Scrapy script:

from scrapy.spider import Spider
from scrapy.selector import Selector

from testscrapy1.items import Website

class DmozSpider(Spider):
    name = "bill"
    allowed_domains = ["http://exemple.com"]
    start_urls = [
        "http://www.exemple.com/test1.html"
    ]

    def parse(self, response):

        sel = Selector(response)
        sites = sel.xpath('//head')
        items = []

        for site in sites:
            item = Website()

            item['title'] = site.xpath('//title/text()').extract()

            items.append(item)

        return items

How do I get past the onclick? And how can I scrape the text of the title tag on the second page? Thank you in advance, STEF

Tags: scrapy

1 answer

甜甜的少女心

#2 · 2019-09-02 02:18

To send several requests and parse each response in its own method, you need two things: 1) yield instead of return, and 2) a callback on each request.

Example:

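import scrapy

# Both methods belong inside your Spider class.
def parse(self, response):
    for site in response.xpath('//head'):
        item = Website()
        # './/title' is relative to the current <head>; '//title' would search the whole page.
        item['title'] = site.xpath('.//title/text()').extract()
        yield item
    # Schedule a second request; its response is handed to other_function.
    yield scrapy.Request(url="http://www.domain.com", callback=self.other_function)

def other_function(self, response):
    for other_thing in response.xpath('//this_xpath'):
        item = Website()
        item['title'] = other_thing.xpath('//this/and/that').extract()
        yield item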

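Scrapy does not execute JavaScript, but you can work out what the JavaScript does and reproduce it in your spider: http://doc.scrapy.org/en/latest/topics/firebug.html. In your case the link in test1.html also carries a plain href="test2.html", so you can request that URL directly and ignore the onclick handler.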
