Scrapy getting href out of div

Posted 2019-06-07 02:49

I started using Scrapy for a small project and I can't extract the link. Instead of the URL, I get only "[]" each time the class is found. Am I missing something obvious?

sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print entry.xpath('href').extract()

Sample from the website:

<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>

1 answer
Ridiculous、
#2 · 2019-06-07 03:25

Your XPath query is wrong.

for entry in sel.xpath("//div[@class='recipe-description']"):

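In this line you are iterating over the divs themselves, and they have no href attribute. Note also that the relative query 'href' looks for child elements named href; an attribute is selected with '@href'.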

To fix it, select the anchor elements inside the div:

for entry in sel.xpath("//div[@class='recipe-description']/a"):
    # '@href' selects the attribute; a bare 'href' would look for a child <href> element
    print entry.xpath('@href').extract()
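
To see why the div-level query comes back empty, here is a minimal sketch that runs without Scrapy. It is an assumption-laden illustration, not Scrapy's API: it uses Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath and cannot select attributes with '@href', so the attribute is read with .get() instead. The sample markup is the one from the question; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question (well-formed, so an XML parser accepts it).
SAMPLE = """
<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>
""".strip()

root = ET.fromstring(SAMPLE)  # root is the <div> element itself

# A bare name step means "child ELEMENTS called href"; the div has none.
child_elements_named_href = root.findall("href")

# The href attribute lives on the <a> child, so step down to it first.
hrefs = [a.get("href") for a in root.findall("a")]

print(child_elements_named_href)  # → []
print(hrefs)                      # → ['http://www.url.com/']
```

The same distinction applies in Scrapy: step down to the a element, then ask for the attribute with '@href'.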

The best solution is to extract the href attribute directly in the for loop:

for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print href

For simplicity, you can also use CSS selectors:

for href in sel.css("div.recipe-description a::attr(href)").extract():
    print href