I want to extract all the HTML5 data attributes from a tag, just like this jQuery plugin.

For example, given:

<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>

I want to get a hash like:

{ 'data-age' => '50', 'data-location' => 'London' }

I was originally hoping to use a wildcard as part of my CSS selector, e.g.

Nokogiri(html).css('span[@data-*]').size

but it seems that isn't supported.

Tags: ruby xml html5 nokogiri

4 answers

Reply #2 · 2019-06-23 15:34

You can do this with a bit of xpath:

doc = Nokogiri.HTML(html)
data_attrs = doc.xpath "//span/@*[starts-with(name(), 'data-')]"

This gets all the attributes of span elements whose names start with 'data-'. (You might want to do this in two steps: first get all the elements you're interested in, then extract the data attributes from each in turn.)

Continuing the example (using the span in your question):

hash = data_attrs.each_with_object({}) do |n, hsh|
  hsh[n.name] = n.value
end

puts hash

produces:

{"data-age"=>"50", "data-location"=>"London"}

再贱就再见

Reply #3 · 2019-06-23 15:36

The Node#css docs mention a way to attach a custom pseudo-selector. This might look like the following for selecting nodes that have at least one attribute whose name starts with 'data-':

Nokogiri(html).css('span:regex_attrs("^data-.*")', Class.new {
  def regex_attrs node_set, regex
    node_set.find_all { |node| node.attributes.keys.any? {|k| k =~ /#{regex}/ } }
  end
}.new)

欢心

Reply #4 · 2019-06-23 15:39

Option 1: Grab all data elements

If all you need is a single hash of the data attributes found on the page's span elements, here's a one-liner:

Hash[doc.xpath("//span/@*[starts-with(name(), 'data-')]").map{|e| [e.name,e.value]}]

Output:

{"data-age"=>"50", "data-location"=>"London"}

Option 2: Group results by tag

If you want to group your results by tag (perhaps you need to do additional processing on each tag), you can do the following:

tags = []
datasets = "@*[starts-with(name(), 'data-')]"

# If you want any element, replace "span" with "*"
doc.xpath("//span[#{datasets}]").each do |tag|
  tags << Hash[tag.xpath(datasets).map{|a| [a.name,a.value]}]
end

Then tags is an array of hashes, one per matching tag, each mapping that tag's data attribute names to their values.

Option 3: Behavior like the jQuery datasets plugin

If you'd prefer the plugin-like approach, the following will give you a dataset method on every Nokogiri node.

module Nokogiri
  module XML
    class Node
      def dataset
        Hash[self.xpath("@*[starts-with(name(), 'data-')]").map{|a| [a.name,a.value]}]
      end
    end
  end
end

Then you can find the dataset for a single element:

doc.at_css("span").dataset

Or get the dataset for a group of elements:

doc.css("span").map(&:dataset)

Example:

The following is the behavior of the dataset method above. Given the following lines in the HTML:

<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
<span data-age="40" data-location="Oxford" class="highlight">Jim Foggs</span>

The output would be:

[
 {"data-age"=>"50", "data-location"=>"London"},
 {"data-age"=>"40", "data-location"=>"Oxford"}
]

爷的心禁止访问

Reply #5 · 2019-06-23 15:48

Try looping through element.attributes, ignoring any attribute whose name does not start with data-.

