I want to extract all the HTML5 data attributes from a tag, just like this jQuery plugin.
For example, given:
<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
I want to get a hash like:
{ 'data-age' => '50', 'data-location' => 'London' }
I was originally hoping to use a wildcard as part of my CSS selector, e.g.
Nokogiri(html).css('span[@data-*]').size
but it seems that isn't supported.
You can do this with a bit of XPath:
This gets all the attributes of span elements that start with 'data-'. (You might want to do this in two steps: first get all the elements you're interested in, then extract the data attributes from each in turn.)
Continuing the example (using the span in your question):
produces:
The Node#css docs mention a way to attach a custom pseudo-selector. This might look like the following for selecting nodes with attributes starting with 'data-':
Option 1: Grab all data elements
If all you need is to list all the page's data elements, here's a one-liner:
Output:
Option 2: Group results by tag
If you want to group your results by tag (perhaps you need to do additional processing on each tag), you can do the following:
Then tags is an array containing key-value hash pairs, grouped by tag.
Option 3: Behavior like the jQuery datasets plugin
If you'd prefer the plugin-like approach, the following will give you a dataset method on every Nokogiri node.
Then you can find the dataset for a single element:
Or get the dataset for a group of elements:
Example:
The following is the behavior of the dataset method above. Given the following lines in the HTML:
The output would be:
Try looping through element.attributes while ignoring any attribute that does not start with data-.