This should be easy but I'm stuck.
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text Next ></a>
</div>
I'm trying to use Scrapy (BaseSpider) to select a link based on its link text using:
nextPage = HtmlXPathSelector(response).select("//div[@class='paginationControl']/a/@href").re("(.+)*?Next")
For example, I want to select the next page link based on the fact that its text is "Link Text Next". Any ideas?
Your XPath is selecting the href, not the text inside the a tag. It doesn't look from your example like the href has "next" in it, so you can't find it with a regex.
Use a[contains(text(),'Link Text Next')] to match on the link text instead.
Reference: Documentation on the XPath contains function
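As a rough illustration of the contains() predicate, here is a minimal sketch that runs outside a spider. It assumes lxml is installed and uses it in place of Scrapy's selector; the helper name next_page_hrefs is hypothetical, and the markup is the snippet from the question.

```python
from lxml import html

# Pagination markup copied from the question.
PAGE = """
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text Next ></a>
</div>
"""

# Filter the <a> elements by their text first, then step to @href.
NEXT_XPATH = (
    "//div[@class='paginationControl']"
    "/a[contains(text(), 'Link Text Next')]/@href"
)


def next_page_hrefs(markup):
    """Return the href values of links whose text contains 'Link Text Next'."""
    tree = html.document_fromstring(markup)
    return tree.xpath(NEXT_XPATH)


print(next_page_hrefs(PAGE))
```

The same XPath string should work unchanged in the question's HtmlXPathSelector(response).select(...) call, since the predicate is plain XPath 1.0.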
PS. Your text "Link Text Next" has a space at the end. To avoid having to include that space in the code, I think using contains is a bit more general while still being specific enough.
You can use the following XPath expression:
This selects the href attribute of the link with text "Link Text Next".
See XPath string functions if you need more control.
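For the "more control" case, one possible sketch combines two other XPath 1.0 string functions, normalize-space() and starts-with(). It assumes lxml is installed, and the markup and URLs are hypothetical, padded with extra whitespace to show the effect.

```python
from lxml import html

# Hypothetical markup: same shape as the question's, but with extra
# whitespace around the link text to show what normalize-space() does.
PAGE = """
<div class="paginationControl">
<a href="/list.html?page=1">Link Text 1</a> |
<a href="/list.html?page=2">
    Link Text Next >
</a>
</div>
"""

tree = html.document_fromstring(PAGE)

# normalize-space(.) trims both ends and collapses inner whitespace runs;
# starts-with() then anchors the match at the beginning of the link text.
hrefs = tree.xpath(
    "//div[@class='paginationControl']"
    "/a[starts-with(normalize-space(.), 'Link Text Next')]/@href"
)
print(hrefs)  # ['/list.html?page=2']
```

Compared with contains(), this variant does not match a link that merely mentions the phrase somewhere in the middle of its text.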