I'm attempting to select a node using an XPath query and I don't understand why XML::LibXML doesn't find the node when it has an xmlns atribute. Here's a script to demonstrate the issue:
#!/usr/bin/perl
use XML::LibXML; # 1.70 on libxml2 from libxml2-dev 2.6.16-7sarge1 (don't ask)
use XML::XPath; # 1.13
use strict;
use warnings;
use v5.8.4; # don't ask
my ($xpath, $libxml, $use_namespace) = @ARGV;
my $xml = sprintf(<<'END_XML', ($use_namespace ? 'xmlns="http://www.w3.org/2000/xmlns/"' : q{}));
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
<MyContainer %s>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
</RootElement>
END_XML
my $xml_parser
= $libxml ? XML::LibXML->load_xml(string => $xml, keep_blanks => 1)
: XML::XPath->new(xml => $xml);
my $nodecount = 0;
foreach my $node ($xml_parser->findnodes($xpath)) {
$nodecount ++;
print "--NODE $nodecount--\n"; #would use say on newer perl
print $node->toString($libxml && 1), "\n";
}
unless ($nodecount) {
print "NO NODES FOUND\n";
}
This script allows you to chose between the XML::LibXML parser and the XML::XPath parser. It also allows you to define an xmlns attribute on the MyContainer element or leave it off depending on the arguments passed.
The xpath expression I'm using is "RootElement/MyContainer". When I run the query using the XML::LibXML parser without the namespace it finds the node with no problem:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml
--NODE 1--
<MyContainer>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
However, when I run it with the namespace in place it finds no nodes:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml use_namespace
NO NODES FOUND
Contrast this with the output when using the XMLL::XPath parser:
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 # no namespace
--NODE 1--
<MyContainer>
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 1 # with namespace
--NODE 1--
<MyContainer xmlns="http://www.w3.org/2000/xmlns/">
<MyField>
<Name>ID</Name>
<Value>12345</Value>
</MyField>
<MyField>
<Name>Name</Name>
<Value>Ben</Value>
</MyField>
</MyContainer>
Which of these parser implementations is doing it "right"? Why does XML::LibXML treat it differently when I use a namespace? What can I do to retrieve the node when the namespace is in place?
Using XML::LibXML 1.69.
Maybe this a XML::LibXML 1.69 thing but the strange part is that I can use the normal XPath and findnodes() and the code below prints the nodes.
But if I change the namespace to something other than "http://www.w3.org/2000/xmlns/", then using XML::LibXML::XPathContext is required to get the same nodes to print.
This is a FAQ. XPath considers any unprefixed name in an expression to belong to "no namespace".
Then, the expression:
selects all
MyContainer
elements that belong to "no namespace" and are children of allRootElement
elements that belong to "no namespace" and are children of the context (current node). However, there are no elements at all in the whole document that belong to "no namespace" -- all elements belong to the default namespace.This explains the result you are getting. XML::LibXML is right.
The common solution is that the API of the hosting language allows a specific prefix to be bound to the namespace by "registering" a namespace. Then one can use an expression like:
where
x
is the prefix with which the namespace has been registered.In the very rare occasions where the hosting language doesn't offer registering namespaces, use the following expression:
@Dmitre is right. You need to take a look at XML::LibXML::XPathContext which will allow you to declare the namespace and then you can use namespace aware XPath statements. I gave an example of using this some time ago on stackoverflow - have a look at Why should I use XPathContext with Perl's XML::LibXML