How do I parse XML using Nokogiri and split a node

I'm using Nokogiri to parse XML.

doc = Nokogiri::XML("http://www.enhancetv.com.au/tvguide/rss/melbournerss.php")

I wasn't sure how to actually retrieve node values correctly.

I'm after the title, link, and description nodes in particular that sit under the item parent nodes.

<item>
    <title>Toasted TV - TEN - 07:00:00 - 21/12/2011</title>
    <link>http://www.enhancetv.com.au/tvguide/</link>
    <description>Join the team for the latest in gaming, sport, gadgets, pop culture, movies, music and other seriously fun stuff! Featuring a variety of your favourite cartoons.</description>
</item>

What I'd like to do is title.split("-") in such a way that I can convert the date and time strings into a valid DateTime object to use later on down the track.

Tags: ruby-on-rails ruby xml xml-parsing nokogiri

3 Answers

趁早两清

Reply #2 · 2019-06-04 00:52

For the example title string you mentioned:

require 'date'
DateTime.parse(s.split(" - ")[-2..-1].join(" "))

This gets you a DateTime object: Wed, 21 Dec 2011 07:00:00 +0000

Keep an eye on the title variations you might need to deal with, though, and adjust the split to suit them.
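One way to handle such variations is sketched below. It assumes every title ends in " - CHANNEL - HH:MM:SS - DD/MM/YYYY", as the example in the question does; parse_title is a hypothetical helper name, not something from the feed or from Nokogiri. It uses an explicit strptime format instead of letting DateTime.parse guess, and returns nil for titles that don't fit.

```ruby
require 'date'

# Hypothetical helper: splits a feed title of the assumed form
# "<programme> - <channel> - HH:MM:SS - DD/MM/YYYY" into its parts.
def parse_title(title)
  # Split on " - " (with spaces) so hyphens inside a programme name survive.
  parts = title.split(" - ")
  return nil if parts.size < 4

  time = parts[-2]
  date = parts[-1]
  {
    name:      parts[0..-4].join(" - "),
    channel:   parts[-3],
    starts_at: DateTime.strptime("#{date} #{time}", "%d/%m/%Y %H:%M:%S")
  }
rescue ArgumentError
  # strptime raises an ArgumentError (or a subclass) when the text doesn't match.
  nil
end

info = parse_title("Toasted TV - TEN - 07:00:00 - 21/12/2011")
puts info[:starts_at].iso8601   # 2011-12-21T07:00:00+00:00
```

Taking the date and time from the end of the array means a programme name that itself contains " - " still parses.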

Update: I didn't notice you also wanted more info on how to parse the document. Here's how:

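require 'nokogiri'
require 'open-uri'

# URI.open (from open-uri) fetches the feed; Nokogiri::XML parses the returned IO.
doc = Nokogiri::XML(URI.open("http://www.enhancetv.com.au/tvguide/rss/melbournerss.php"))
# Collect [title, link, description] for every <item>.
data = doc.xpath("//item").map do |item|
  [
    item.search("title").first.content,
    item.search("link").first.content,
    item.search("description").first.content
  ]
end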

This loads the title, link and description of every item into the data array. Nokogiri::XML parses whatever string (or IO object) it is given as XML content; it does not fetch URLs itself, so you need to open the URL first and feed the result to it.


爷的心禁止访问

Reply #3 · 2019-06-04 01:10

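require 'date'
require 'net/http'
require 'nokogiri'

# Build a DateTime from the last two hyphen-separated fields of the title (time, then date).
def parse_time(text)
  items = text.split("-")
  DateTime.strptime("#{items[-2].strip} #{items[-1].strip}", "%H:%M:%S %d/%m/%Y")
end

content = Net::HTTP.get(URI.parse("http://www.enhancetv.com.au/tvguide/rss/melbournerss.php"))
# noblanks drops whitespace-only text nodes, so each item's children are just its elements.
doc = Nokogiri::XML(content) { |config| config.noblanks }

# One hash per <item>, keyed by child element name, plus a parsed "created_at".
doc.search("//item").map do |item|
  item.children.inject({}) do |hash, child|
    if child.name == "title"
      # or another name
      hash["created_at"] = parse_time(child.text)
    end

    hash[child.name] = child.text
    hash
  end
end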


兄弟一词,经得起流年.

Reply #4 · 2019-06-04 01:11

Since this is an RSS feed, you may want to consider an RSS parser:

require 'simple-rss'
require 'open-uri'

feed = 'http://www.enhancetv.com.au/tvguide/rss/melbournerss.php'
# URI.open (from open-uri) returns an IO-like object that SimpleRSS.parse reads from.
rss = SimpleRSS.parse(URI.open(feed))

rss.items.each do |item|
  puts item.title, item.link, item.description
end

