I tried using ActiveResource to parse a web service that was more like an HTML document, and I kept getting a 404 error.
Do I need to use an XML parser for this task instead of ActiveResource?
My guess is that ActiveResource is only useful if you are consuming data from another Rails app and the XML data maps easily onto a Rails model. If the web service returns more free-form XML, such as an HTML document or an RSS feed, you would want to use a parser like Hpricot or Nokogiri instead. Is this correct?
How do you know when to use an XML parser and when to use ActiveResource?
Update: ActiveResource is also not an XML parser. It is a REST consumer that lets you interact with a remote resource much as you would with an ActiveRecord model. It does use an XML parser under the hood (I assume through ActiveSupport's XmlMini, which I show below).
ActiveResource has strict requirements about the structure of the XML content and works best when interacting with the REST API of another Rails application. It is not intended for generic screen scraping of an HTML page; for that, use Nokogiri directly.
ActiveSupport isn't an XML parser; it is a miscellaneous collection of useful Ruby methods and classes. However, it does offer a wrapper around several different XML parsers, giving you a consistent interface.
You can see which XML parser is being used and switch to a different one. Try this in script/console:
```ruby
ActiveSupport::XmlMini.backend # => ActiveSupport::XmlMini_REXML
ActiveSupport::XmlMini.backend = 'Nokogiri' # requires the nokogiri gem
ActiveSupport::XmlMini.backend # => ActiveSupport::XmlMini_Nokogiri
# it will now use Nokogiri
```
However, that will still use Nokogiri's XML parser, which expects strict, well-formed markup. Most HTML pages do not meet this requirement, so it is better to use Nokogiri's HTML parser directly instead of going through ActiveSupport:
```ruby
doc = Nokogiri::HTML(...)
```
I wrote XmlMini because I wanted to answer that same question. XmlMini doesn't really do much, and that lets it stay focused. But if you have a problem that YAML or JSON couldn't handle, XmlMini isn't going to do the job either.
For example, if you've got any need to validate the structure of the XML you're dealing with, XmlMini isn't the tool. Validating by hand is awful.
Similarly, if you're dealing with data that reuses standard element and attribute semantics from somewhere else, like embedded snippets of UBL, OpenDoc, or Atom, you really should use tools with proper namespace support.
ryanb mentions Nokogiri, and I can't think of anything more wonderful for these things. It has all the power of libxml2, with more elegance than almost any other library in Ruby. And I don't just mean among XML parsers; it's up there with _why's best projects.
But there are some things that even Nokogiri isn't designed for. If you really, absolutely, positively need to kill every angle bracket in the room at breakneck speed, you've got to bust out SAX. But if you need speed that badly, don't do it in Ruby: do it in expat or libxml2 with pure C. Or don't do it at all.