Scrapy xpath returns an empty list although tag an

In my parse function, here is the code I have written:

hs = Selector(response)
links = hs.xpath(".//*[@id='requisitionListInterface.listRequisition']")
items = []
for x in links:
    item = CrawlsiteItem()
    item["title"] = x.xpath('.//*[contains(@title, "View this job           description")]/text()').extract()
    items.append(item)
return items

and title comes back as an empty list.

I first select the element with that id into links, and then, within each of those elements, I want to get the text of every element whose title contains "View this job description".

Please help me fix the error in the code.

Tags: xpath scrapy

1 Answer

劳资没心，怎么记你

Reply #2 · 2019-09-11 07:29

If you request the URL you provided with curl, for example curl "https://cognizant.taleo.net/careersection/indapac_itbpo_ext_career/moresearch.ftl?lang=en", you get back a page that is quite different from the one you see in your browser. In that response, your XPath matches the following <a> element, which has no text node for text() to select:

<a id="requisitionListInterface.reqTitleLinkAction" 
    title="View this job description"
    href="#"
    onclick="javascript:setEvent(event);requisition_openRequisitionDescription('requisitionListInterface','actOpenRequisitionDescription',_ftl_api.lstVal('requisitionListInterface', 'requisitionListInterface.listRequisition', 'requisitionListInterface.ID5645', this),_ftl_api.intVal('requisitionListInterface', 'requisitionListInterface.ID5649', this));return ftlUtil_followLink(this);">
</a>

This is because the site loads the displayed information with an XHR request (you can see it in Chrome's developer tools, for example) and then updates the page dynamically with the returned data.
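The empty result can be reproduced offline. The sketch below is only an illustration: it uses Python's standard-library xml.etree.ElementTree instead of Scrapy's Selector so that it runs without extra dependencies, and RAW is a simplified, hand-written copy of the anchor (the onclick attribute is omitted and the wrapping div is an assumption about the surrounding markup).

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the raw (non-JavaScript) response.
# Assumption: the anchor sits inside the element with the listRequisition id.
RAW = (
    '<div id="requisitionListInterface.listRequisition">'
    '<a id="requisitionListInterface.reqTitleLinkAction" '
    'title="View this job description" href="#"></a>'
    '</div>'
)

root = ET.fromstring(RAW)

# The element is found by its title attribute...
anchor = root.find(".//a[@title='View this job description']")

# ...but it has no text node, so there is nothing for text() to return.
link_text = anchor.text            # None
link_title = anchor.get("title")   # the attribute value is still available

print(repr(link_text), repr(link_title))
```

In Scrapy itself the attribute could be read with something like x.xpath('.//a/@title').extract(), but that only returns the fixed title string; the job data is simply not in this response.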

To extract the information you want, find this XHR request (it is not hard, because it is the only one) and call it from your scraper. From the resulting dataset you can then extract the required data: you just have to write a parser that goes through this pipe-separated format, splits it into job postings, and extracts the fields you need, such as position, id, date and location.
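A parser of that kind might look roughly like the sketch below. Everything about the layout is an assumption: the real response has to be inspected first, and the field order in FIELDS, the fixed record size, and the SAMPLE string are all made up for illustration. parse_postings is a hypothetical helper, not part of Scrapy.

```python
from typing import Dict, List

# Hypothetical field order; verify it against the actual XHR response.
FIELDS = ["position", "id", "date", "location"]


def parse_postings(payload: str, fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Split a pipe-separated payload into one dict per job posting."""
    values = [v.strip() for v in payload.split("|")]
    step = len(fields)
    postings = []
    # Walk the flat list in fixed-size chunks; a trailing partial chunk is ignored.
    for start in range(0, len(values) - step + 1, step):
        chunk = values[start:start + step]
        postings.append(dict(zip(fields, chunk)))
    return postings


# Invented placeholder data, not taken from the site.
SAMPLE = "Example Position A|ID-1|2019-01-01|City A|Example Position B|ID-2|2019-01-02|City B"

postings = parse_postings(SAMPLE)
for posting in postings:
    print(posting["position"], posting["location"])
```

In a spider, the payload would be the body of the response to the XHR request, and each dict could be copied into a CrawlsiteItem before it is returned.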
