From the documentation of XML::Simple:
The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.
The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.
Can someone clarify for me what the key reasons for this are?
The real problem is that what XML::Simple primarily tries to do is take XML and represent it as a Perl data structure. As you'll no doubt be aware from perldata, the two key data structures you have available are the hash and the array. And XML doesn't really do either. Its elements can share names with their siblings, carry attributes, hold text content and contain child elements, and their order within the file matters.
And these things don't map directly onto the available Perl data structures. At a simplistic level, a nested hash of hashes might fit, but it can't cope with elements with duplicated names. Nor can you easily differentiate between attributes and child nodes.
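To make the duplicate-name problem concrete, here is a plain-Perl illustration. No XML module is involved, and this is not XML::Simple's actual code. It naively stores the children of a hypothetical <parent><child>a</child><child>b</child></parent> as hash keys:

```perl
use strict;
use warnings;

# Children of a hypothetical <parent><child>a</child><child>b</child></parent>,
# in document order, as [ element name, text ] pairs.
my @children = ( [ child => 'a' ], [ child => 'b' ] );

# Naive mapping: one hash key per element name.
my %parent;
for my $pair (@children) {
    my ( $name, $text ) = @$pair;
    $parent{$name} = $text;    # a second <child> silently overwrites the first
}

printf "%d key(s), child = %s\n", scalar( keys %parent ), $parent{child};
```

The first child is simply gone. That is why XML::Simple has to switch to arrays when a name repeats.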
So XML::Simple tries to guess based on the XML content, takes 'hints' from the various option settings, and then, when you try to output the content, (tries to) apply the same process in reverse. As a result, for anything other than the simplest XML, it becomes unwieldy at best, or loses data at worst.
Consider:
This, when parsed through XML::Simple, gives you:
Note that under parent you now have just anonymous hashes, but under another_node you have an array of anonymous hashes.
So in order to access the content of child:
Note how you've got a 'child' node with a 'content' node beneath it. That isn't there because of an element called content; it's just the element's text.
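For reference, here is a hand-written mock of the structure that XML::Simple's default options should produce for a document of this kind. The XML in the comment is a hypothetical stand-in. The structure is typed out by hand rather than produced by XMLin, so it runs without the module. It shows the two different access paths:

```perl
use strict;
use warnings;

# Hypothetical sample document:
#   <xml_doc>
#     <parent>
#       <child att="some_att">content</child>
#     </parent>
#     <another_node>
#       <another_child different_att="different_value">more content</another_child>
#       <another_child some_att="a value" />
#     </another_node>
#   </xml_doc>
#
# Hand-written mock of what XMLin() with default options is expected to
# return for it (root element dropped, since KeepRoot is off by default).
my $xml = {
    parent       => { child => { att => 'some_att', content => 'content' } },
    another_node => {
        another_child => [
            { different_att => 'different_value', content => 'more content' },
            { some_att      => 'a value' },
        ],
    },
};

# A single <child> comes back as a hash reference...
my $first = $xml->{parent}{child}{content};

# ...but repeated <another_child> elements come back as an array reference,
# so the access path depends on how many elements there happened to be.
my $second = $xml->{another_node}{another_child}[0]{content};

printf "%s / %s\n", ref $xml->{parent}{child}, ref $xml->{another_node}{another_child};
print "$first, $second\n";
```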
But to access the content beneath the first another_child element:
Note how, because there are multiple <another_child> elements, the XML has been parsed into an array, where it wasn't with a single one. (If you did have an element called content beneath it, you'd end up with something else again.) You can change this by using ForceArray. But then you end up with a hash of arrays of hashes of arrays of hashes of arrays, although it is at least consistent in its handling of child elements.
Edit: following discussion, note that this is a bad default rather than a flaw in XML::Simple. You should set:
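As a sketch, those settings can be kept in one option hash. The exact combination is an assumption, borrowed from the options quoted later on this page. 'input.xml' is a hypothetical file name. The XMLin call is commented out so the snippet runs without XML::Simple installed:

```perl
use strict;
use warnings;

# Options that make XML::Simple's output shape more predictable.
my %xml_simple_opts = (
    ForceArray   => 1,     # nested elements always become array refs, even if they occur once
    KeyAttr      => [],    # don't fold arrays into hashes keyed on name/key/id attributes
    ForceContent => 1,     # text is always stored under a 'content' key, even without attributes
);

# Hypothetical usage ('input.xml' is a made-up file name):
# use XML::Simple qw(XMLin);
# my $xml = XMLin( 'input.xml', %xml_simple_opts );

print join( ', ', sort keys %xml_simple_opts ), "\n";
```

KeepRoot => 1, also quoted later on this page, additionally keeps the root element's name as the top-level key.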
If you apply this to the XML as above, you get instead:
This will give you consistency, because single-node elements will no longer be handled differently from multi-node ones.
But you still:
E.g.:
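A hand-written mock shows both points. It assumes ForceArray => 1, KeyAttr => [] and ForceContent => 1, and uses a small hypothetical document. It is typed out rather than produced by XMLin, so treat the exact shape as an assumption:

```perl
use strict;
use warnings;

# Hypothetical sample document:
#   <xml_doc><parent><child att="some_att">content</child></parent></xml_doc>
#
# Hand-written mock of the expected XMLin() result with
# ForceArray => 1, KeyAttr => [], ForceContent => 1 (root element dropped).
my $xml = {
    parent => [ { child => [ { att => 'some_att', content => 'content' } ] } ],
};

# Consistent, but five dereferences to reach one piece of text.
my $text = $xml->{parent}[0]{child}[0]{content};

# The attribute 'att' and the text under 'content' are ordinary sibling
# hash keys, so nothing in the structure records which one was an attribute.
my @keys = sort keys %{ $xml->{parent}[0]{child}[0] };

print "$text (keys: @keys)\n";
```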
You still have content and child hash elements treated as if they were attributes, and because hashes are unordered, you simply cannot reconstruct the input. So basically, you have to parse it, then run it through Dumper to figure out where you need to look.
But with an XPath query, you get at that node with:
What you don't get in XML::Simple that you do in XML::Twig (and I presume XML::LibXML, but I know it less well):
XPath support. XPath is an XML way of expressing a path to a node, so you can 'find' a node in the above with get_xpath('//child'). You can even use attributes in the XPath, like get_xpath('//another_child[@different_att]'), which will select exactly the one you wanted. (You can iterate over matches too.)
cut and paste to move elements around.
parsefile_inplace to allow you to modify XML with an in-place edit.
pretty_print options to format XML.
twig_handlers and purge, which allow you to process really big XML without having to load it all into memory.
simplify, if you really must make it backwards compatible with XML::Simple.
It's also widely available: easy to download from CPAN, and distributed as an installable package on many operating systems. (Sadly, it's not a default install. Yet.)
See: XML::Twig quick reference
For the sake of comparison:
Vs.
XML::Simple is the most complex XML parser available
The main problem with XML::Simple is that the resulting structure is extremely hard to navigate correctly.
$ele->{ele_name} can return any of the following (even for elements that follow the same spec):
This means that you have to perform all kinds of checks to see what you actually got. But the sheer complexity of this encourages developers to make very bad assumptions instead.
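Here is a minimal sketch of the kind of check this forces on you. It uses a hypothetical helper, as_element_list, which is not part of XML::Simple. The helper normalises the common shapes into a list of hash references. It deliberately ignores KeyAttr folding, which would need its own handling:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple): turn whatever
# $ele->{ele_name} holds into a list of hash references.
# Handles: absent (undef), plain string, single hash ref, and array refs
# of strings and/or hashes. KeyAttr-folded hashes are NOT unfolded.
sub as_element_list {
    my ($value) = @_;
    return () unless defined $value;                           # element absent
    my @items = ref $value eq 'ARRAY' ? @$value : ($value);    # one vs. many
    return map { ref $_ eq 'HASH' ? $_ : +{ content => $_ } } @items;
}

# Shapes the same kind of element might arrive in.
my %samples = (
    missing => undef,
    string  => 'text',
    hash    => { id => 1, content => 'text' },
    array   => [ 'text', { id => 2, content => 'more' } ],
);

for my $shape ( sort keys %samples ) {
    my @list = as_element_list( $samples{$shape} );
    printf "%-7s -> %d element(s)\n", $shape, scalar @list;
}
```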
The options for making a more regular tree fall short
You can use the following options to create a more regular tree:
But even with these options, many checks are still needed to extract information from a tree. For example, getting the /root/eles/ele nodes from a document is a common operation that should be trivial to perform, but the following is required when using XML::Simple:
In another parser, one would use the following:
XML::Simple imposes numerous limitations, and it lacks common features
It's completely useless for producing XML. Even with ForceArray => 1, ForceContent => 1, KeyAttr => [], KeepRoot => 1, there are far too many details that can't be controlled.
It doesn't preserve the relative order of children with different names.
It has limited (with XML::SAX backend) or no (with XML::Parser backend) support for namespaces and namespace prefixes.
It can't handle elements with both text and elements as children (which means it can't handle XHTML, among others).
Some backends (e.g. XML::Parser) are unable to handle encodings not based on ASCII (e.g. UTF-16le).
An element can't have a child element and an attribute with the same name.
It can't create XML documents with comments.
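The lost ordering can be illustrated with a toy grouping function. It is hypothetical and not XML::Simple's implementation. It stores children as a hash of arrays keyed by element name, the way ForceArray-style output does. Two documents that differ only in child order end up with identical structures:

```perl
use strict;
use warnings;
use Data::Dumper;

# Toy stand-in (hypothetical) for a hash-of-arrays representation:
# takes [ name => text ] pairs in document order and groups them by name.
sub group_children {
    my %grouped;
    push @{ $grouped{ $_->[0] } }, $_->[1] for @_;
    return \%grouped;
}

# <r><a>1</a><b>2</b><a>3</a></r>  versus  <r><a>1</a><a>3</a><b>2</b></r>
my $doc1 = group_children( [ a => 1 ], [ b => 2 ], [ a => 3 ] );
my $doc2 = group_children( [ a => 1 ], [ a => 3 ], [ b => 2 ] );

# Both collapse to { a => [1, 3], b => [2] }; where <b> sat relative to
# the <a> elements is no longer recorded anywhere.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 0;
print Dumper($doc1) eq Dumper($doc2) ? "same structure\n" : "different\n";
```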
Ignoring the major issues previously mentioned, XML::Simple could still be usable with these limitations. But why go to the trouble of checking if XML::Simple can handle your document format and risk having to switch to another parser later? You could simply use a better parser for all your documents from the start.
Not only do some other parsers not subject you to these limitations, they provide loads of other useful features in addition. The following are a few features they might have that XML::Simple doesn't:
Speed. XML::Simple is extremely slow, especially if you use a backend other than XML::Parser. I'm talking orders of magnitude slower than other parsers.
XPath selectors or similar.
Support for extremely large documents.
Support for pretty printing.
Is XML::Simple ever useful?
The only format for which XML::Simple is simplest is one where no element is optional. I've had experience with countless XML formats, and I've never encountered such a format.
This fragility and complexity alone are reasons enough to warrant staying away from XML::Simple, but there are others.
Alternatives
I use XML::LibXML. It's an extremely fast, full-featured parser. If I ever needed to handle documents that didn't fit into memory, I'd use XML::LibXML::Reader (and its copyCurrentNode(1)) or XML::Twig (using twig_roots).
I disagree with the docs
I'll dissent and say that XML::Simple is just that: simple. And it's always been easy and enjoyable for me to use. Test it with the input you're receiving. So long as the input does not change, you're good. The same people who complain about using XML::Simple complain about using JSON::Syck to serialize Moose. The docs are wrong because they prioritize correctness over efficiency. If you only care about the following, you're good:
If you're making an abstract parser that isn't defined by an application but by a spec, I'd use something else. I worked at a company once where we had to accept 300 different XML schemas, none of which had a spec. XML::Simple did the job easily. The other options would have required us to actually hire someone to get the job done. Everyone thinks XML is sent in a rigid, all-encompassing, spec'ed format, such that if you write one parser you're good. If that's the case, don't use XML::Simple. XML, before JSON, was just a "dump this and walk" format from one language to another. People actually used things like XML::Dumper. No one actually knew what was outputted. For dealing with that scenario, XML::Simple is great! Sane people still dump to JSON without a spec to accomplish the same thing. It's just how the world works.
Want to read the data in, and not worry about the format? Want to traverse Perl structures and not XML possibilities? Go XML::Simple.
By extension...
Likewise, for most applications JSON::Syck is sufficient to dump this and walk. Though if you're sending to lots of people, I'd highly suggest not being a douche nozzle and making a spec which you export to. But you know what? Sometimes you're going to get a call from someone you don't want to talk to, who wants data you don't normally export. And you're going to pipe it through JSON::Syck's voodoo and let them worry about it. If they want XML? Charge them $500 more and fire up ye olde XML::Dumper.
Take away
It may be less than perfect, but XML::Simple is damn efficient. Every hour saved in this arena is one you can potentially spend in a more useful arena. That's a real-world consideration.
The other answers
Look, XPath has some upsides. Every answer here boils down to preferring XPath over Perl. That's fine. If you would rather use a standardized XML domain-specific language to access your XML, have at it!
Perl doesn't provide for an easy mechanism to access deeply nested optional structures.
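The two contexts are presumably the ForceArray shape and the plain-hash shape of the same bar/foo data; that is an assumption. Below is a minimal plain-Perl sketch of reading foo from either shape, and from a document where bar is missing. It uses a hypothetical get_foo helper:

```perl
use strict;
use warnings;

# The same <bar><foo>1</foo></bar> data in ForceArray-style and plain-hash
# shapes, plus a document where the optional <bar> is absent.
my $forced   = { bar => [ { foo => 1 } ] };
my $unforced = { bar => { foo => 1 } };
my $missing  = {};

# Hypothetical helper: read foo from any of these shapes without dying
# and without autovivifying $xml->{bar}.
sub get_foo {
    my ($xml) = @_;
    my $bar = $xml->{bar};
    return undef unless defined $bar;           # optional element not present
    $bar = $bar->[0] if ref $bar eq 'ARRAY';    # ForceArray shape: take the first
    return ref $bar eq 'HASH' ? $bar->{foo} : undef;
}

for my $doc ( $forced, $unforced, $missing ) {
    my $foo = get_foo($doc);
    print( defined $foo ? "foo = $foo\n" : "no foo\n" );
}
```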
Getting the value of foo here in these two contexts can be tricky. XML::Simple knows this, and that's why you can force the former. However, even with ForceArray, if the element isn't there you'll throw an error. Now, if bar is optional, you're left accessing it as $xml->{bar}[0]{foo}, and @{$xml->{bar}}[0] will throw an error. Anyway, that's just Perl. This has nothing to do with XML::Simple, IMHO. And I admitted that XML::Simple is not good for building to spec. Show me data, and I can access it with XML::Simple.