I have a document which looks like this:
<div id="block">
<a href="http://google.com">link</a>
</div>
I can't get Nokogiri to give me the value of the href attribute. I'd like to store the address in a Ruby variable as a string.
Having struggled with this question in various forms, I decided to write myself a tutorial disguised as an answer. It may be helpful to others.
Starting with this snippet:
extracting all the links
We can use xpath or css to find all the <a> elements and then keep only the ones that have an href attribute.
But there's a better way: in the above cases, the .compact is necessary because the searches return the "just a bookmark" element as well. We can use a more refined search to find just the elements that contain an href attribute.
finding a specific link
To find a link within the <div id="block2">, restrict the search to that element.
If you know you're searching for just one link, you can use at_xpath or at_css instead.
find a link from associated text
What if you know the text associated with a link and want to find its url? A little xpath-fu (or css-fu) comes in handy:
find text from a link
And what if you want to find the text associated with a particular link? Not a problem:
useful references
In addition to the extensive Nokogiri documentation, I came across some useful links while writing this up:
Or if you wanna be more specific about the div:
Here, document is the parsed Nokogiri HTML document.
The variable href is assigned the value of the "href" attribute of the <a> element inside the element with id 'block'. The expression doc.css('#block a') returns a single-item NodeSet (an array-like collection) containing the matching <a> element. [0] targets that single element, which can be indexed like a hash of its attribute names and values. ["href"] looks up the "href" key and returns the value, which is a string containing the url.
Here is my try for the above sample of HTML code: