XPath to Parse “SRC” from IMG tag?

Posted 2019-01-22 02:58

So far I can grab the full element from an HTML page with this XPath:

//img[@class='photo-large']

For example, it returns this:

<img src="http://example.com/img.jpg" class='photo-large' />

But I only need the src URL (http://example.com/img.jpg). Any help?

Tags: html parsing xpath screen-scraping

3 answers

三岁会撩人

#2 · 2019-01-22 03:07

Using Hpricot this works:

doc.at('//img[@class="photo-large"]')['src']

In case you have more than one image, the following gives an array:

doc.search('//img[@class="photo-large"]').map { |e| e['src'] }

However, Nokogiri is many times faster and “can be used as a drop-in replacement” for Hpricot.
Here is the Nokogiri version, where an XPath that selects the attribute directly works:

doc.at('//img[@class="photo-large"]/@src').to_s

or for many images:

doc.search('//img[@class="photo-large"]/@src').map(&:to_s)

走好不送

#3 · 2019-01-22 03:23

//img/@src

You can use just this if all you want is the image link. Note that it matches the src of every img element on the page, not only one class.

Example:

<img alt="" class="avatar width-full rounded-2" height="230" src="https://avatars3.githubusercontent.com/...;s=460" width="230">
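To see what the broader expression selects compared with the filtered one from the question, here is a minimal sketch. It uses REXML (bundled with Ruby) instead of the libraries named in the other answers, purely so it runs without extra gems. The helper name attribute_values and the sample markup are made up for illustration, and the markup is assumed to be well-formed.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the string value of every attribute node
# matched by xpath in the well-formed markup xml.
def attribute_values(xml, xpath)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, xpath).map(&:value)
end

# Made-up sample page with two images of different classes.
page = <<~XML
  <div>
    <img src="http://example.com/img.jpg" class="photo-large" />
    <img src="http://example.com/thumb.jpg" class="photo-small" />
  </div>
XML

# Every image's src on the page.
puts attribute_values(page, '//img/@src')

# Only the src of images with class "photo-large".
puts attribute_values(page, "//img[@class='photo-large']/@src")
```

If the page has more than one image, keeping the class predicate is probably what you want.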

成全新的幸福

#4 · 2019-01-22 03:27

You are so close to answering this yourself that I am somewhat reluctant to answer it for you. However, the following XPath should provide what you want (provided the source is XHTML, of course).

//img[@class='photo-large']/@src
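The XHTML caveat matters with strict XML parsers. As a rough sketch, assuming Ruby's bundled REXML as the parser (not something the question specifies), the expression works on a self-closed img tag, while the same tag left unclosed fails to parse at all. The helper name photo_large_src is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the src of the first matching image,
# or nil when nothing matches or the markup is not well-formed XML.
def photo_large_src(markup)
  doc = REXML::Document.new(markup)
  attr = REXML::XPath.first(doc, "//img[@class='photo-large']/@src")
  attr && attr.value
rescue REXML::ParseException
  nil
end

# Self-closed img: well-formed, so the XPath can be evaluated.
xhtml = %q(<p><img src="http://example.com/img.jpg" class='photo-large' /></p>)

# Unclosed img: fine as HTML, but a strict XML parser rejects it.
html = %q(<p><img src="http://example.com/img.jpg" class='photo-large'></p>)

puts photo_large_src(xhtml).inspect
puts photo_large_src(html).inspect
```

For real-world HTML that is not well-formed, an HTML-aware parser such as the Nokogiri one mentioned in another answer is the more forgiving choice.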

For further tips, check out W3Schools. They have excellent tutorials on such things and a great reference too.
