Scrapy getting href out of div

Posted 2019-06-07 02:49

I started using Scrapy for a small project and I can't extract the link. Instead of the URL, I get only "[]" each time the class is found. Am I missing something obvious?

sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print(entry.xpath('href').extract())

Sample from the website:

<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>

Tags: python web-scraping scrapy

1 answer

Ridiculous、

#2 · 2019-06-07 03:25

Your XPath query is wrong.

for entry in sel.xpath("//div[@class='recipe-description']"):

In this line you are iterating over the divs, which have no href attribute.

To fix it, select the anchor elements inside the div and read the attribute with @href (a bare 'href' looks for a child element named href, not an attribute):

for entry in sel.xpath("//div[@class='recipe-description']/a"):
    print(entry.xpath('@href').extract())

The best solution is to select the href attribute directly in the loop's XPath:

for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print(href)

For simplicity, you can also use CSS selectors:

for href in sel.css("div.recipe-description a::attr(href)").extract():
    print(href)
