In my parse function, here is the code I have written:
hs = Selector(response)
links = hs.xpath(".//*[@id='requisitionListInterface.listRequisition']")
items = []
for x in links:
item = CrawlsiteItem()
item["title"] = x.xpath('.//*[contains(@title, "View this job description")]/text()').extract()
items.append(item)
return items
and title returns an empty list.
I am capturing an xpath with an id tag in the links and then with in the links tag, I want to get list of all the values withthe title that has view this job description.
Please help me fix the error in the code.
If you cURL the request of the URL you provided with
curl "https://cognizant.taleo.net/careersection/indapac_itbpo_ext_career/moresearch.ftl?lang=en"
you get back a site way different from the one you see in your browser. Your search results in the following<a>
element which does not have anytext()
attribute to select:This is because the site loads the site loads the information displayed with an XHR request (you can look up this in Chrome for example) and then the site is updated dynamically with the returned information.
For the information you want to extract you should find this XHR request (it is not hard because this is the only one) and call it from your scraper. Then from the resulting dataset you can extract the required data -- you just have to create a parsing algorithm which goes through this pipe separated format and splits it up into job postings and then extracts the information you need like position, id, date and location.