On my site I created two simple pages: Here are their first html script:
test1.html :
<head>
<title>test1</title>
</head>
<body>
<a href="test2.html" onclick="javascript:return xt_click(this, "C", "1", "Product", "N");" indepth="true">
<span>cool</span></a>
</body></html>
test2.html :
<head>
<title>test2</title>
</head>
<body></body></html>
I want scraping text in the title tag of the two pages.here is "test1" and "test2". but I am a novice with scrapy I only happens scraping only the first page. my scrapy script:
from scrapy.spider import Spider
from scrapy.selector import Selector
from testscrapy1.items import Website
class DmozSpider(Spider):
name = "bill"
allowed_domains = ["http://exemple.com"]
start_urls = [
"http://www.exemple.com/test1.html"
]
def parse(self, response):
sel = Selector(response)
sites = sel.xpath('//head')
items = []
for site in sites:
item = Website()
item['title'] = site.xpath('//title/text()').extract()
items.append(item)
return items
How to pass the onclik? and how to successfully scraping the text of the title tag of the second page? Thank you in advance STEF
To use multiple functions in your code, send multiple requests and parse them, you're going to need: 1) yield instead of return, 2) callback.
Example:
You cannot parse javascript with scrapy, but you can understand what the javascript does and do the same: http://doc.scrapy.org/en/latest/topics/firebug.html