I'm attempting to select a node using an XPath query, and I don't understand why XML::LibXML doesn't find the node when it has an xmlns attribute. Here's a script to demonstrate the issue:
#!/usr/bin/perl
use XML::LibXML; # 1.70 on libxml2 from libxml2-dev 2.6.16-7sarge1 (don't ask)
use XML::XPath; # 1.13
use strict;
use warnings;
use v5.8.4; # don't ask
my ($xpath, $libxml, $use_namespace) = @ARGV;
my $xml = sprintf(<<'END_XML', ($use_namespace ? 'xmlns="http://www.w3.org/2000/xmlns/"' : q{}));
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
<MyContainer %s>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
</RootElement>
END_XML
my $xml_parser
= $libxml ? XML::LibXML->load_xml(string => $xml, keep_blanks => 1)
: XML::XPath->new(xml => $xml);
my $nodecount = 0;
foreach my $node ($xml_parser->findnodes($xpath)) {
    $nodecount++;
    print "--NODE $nodecount--\n"; # would use say on newer Perl
    print $node->toString($libxml && 1), "\n";
}
unless ($nodecount) {
    print "NO NODES FOUND\n";
}
This script lets you choose between the XML::LibXML parser and the XML::XPath parser. Depending on the arguments passed, it also either adds an xmlns attribute to the MyContainer element or leaves it off. The first argument is the XPath expression, a true second argument selects XML::LibXML, and a true third argument adds the xmlns attribute.
The XPath expression I'm using is "RootElement/MyContainer". When I run the query with the XML::LibXML parser and no namespace, it finds the node with no problem:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml
--NODE 1--
<MyContainer>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
However, when I run it with the namespace in place it finds no nodes:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml use_namespace
NO NODES FOUND
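One way to see what XML::LibXML is doing is to ask it which namespace it assigned to MyContainer. The sketch below is an illustration, not part of the original script, and it makes one loud assumption: it uses a made-up placeholder URI (http://example.com/hypothetical-ns) instead of http://www.w3.org/2000/xmlns/, because the Namespaces in XML recommendation reserves that URI for the xmlns prefix itself and forbids declaring it as a default namespace, so different libxml2 releases may not handle the original document the same way. It reaches MyContainer with a wildcard step, prints its local name and namespace URI, then counts the matches for the unprefixed path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical placeholder namespace URI (not the one from the question).
my $ns_uri = 'http://example.com/hypothetical-ns';

my $xml = <<"END_XML";
<RootElement>
<MyContainer xmlns="$ns_uri">
<MyField><Name>ID</Name><Value>12345</Value></MyField>
</MyContainer>
</RootElement>
END_XML

my $doc = XML::LibXML->new->parse_string($xml);

# '*' matches an element child whatever its namespace, so this reaches
# MyContainer even though the named path below does not.
my ($container) = $doc->findnodes('/RootElement/*');
print 'local name:    ', $container->localname,    "\n";
print 'namespace URI: ', $container->namespaceURI, "\n";

# In XPath 1.0 an unprefixed name test matches only elements that are in
# no namespace, so this count is 0 once the default namespace is declared.
my $plain = $doc->findnodes('/RootElement/MyContainer')->size;
print "matches for /RootElement/MyContainer: $plain\n";
```

If the namespace URI printed is non-empty, the element is no longer the no-namespace MyContainer that the unprefixed step asks for.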
Contrast this with the output when using the XML::XPath parser, which finds the node in both cases:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 # no namespace
--NODE 1--
<MyContainer>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 1 # with namespace
--NODE 1--
<MyContainer xmlns="http://www.w3.org/2000/xmlns/">
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
Which of these parser implementations is doing it "right"? Why does XML::LibXML treat it differently when I use a namespace? What can I do to retrieve the node when the namespace is in place?
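Two common XPath 1.0 ways to reach the node with the namespace in place are sketched below, again using the hypothetical placeholder URI rather than the reserved one. The first binds a prefix of our own choosing (x, an arbitrary assumption that need not appear in the document) to the namespace URI through XML::LibXML::XPathContext's registerNs, then uses that prefix on every step that lives in the namespace; because descendants inherit a default namespace, MyField and Name need the prefix too. The second sidesteps namespaces by testing local-name() instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical placeholder namespace URI (not the one from the question).
my $ns_uri = 'http://example.com/hypothetical-ns';

my $xml = <<"END_XML";
<RootElement>
<MyContainer xmlns="$ns_uri">
<MyField><Name>ID</Name><Value>12345</Value></MyField>
<MyField><Name>Name</Name><Value>Ben</Value></MyField>
</MyContainer>
</RootElement>
END_XML

my $doc = XML::LibXML->new->parse_string($xml);

# Approach 1: bind our own prefix 'x' (an arbitrary choice) to the
# namespace URI. Descendants inherit the default namespace, so every
# step below RootElement needs the prefix.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => $ns_uri);
my @containers = $xpc->findnodes('/RootElement/x:MyContainer');
my @names = map { $_->textContent }
    $xpc->findnodes('/RootElement/x:MyContainer/x:MyField/x:Name');

# Approach 2: ignore namespaces by comparing local-name() only.
my @by_local = $doc->findnodes(q{/RootElement/*[local-name()='MyContainer']});

print 'prefixed matches:   ', scalar(@containers), "\n";
print 'names:              ', join(',', @names), "\n";
print 'local-name matches: ', scalar(@by_local), "\n";
```

The prefixed form stays namespace-aware, so it will not pick up a MyContainer from some other namespace; the local-name() form is shorter but matches MyContainer in any namespace.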