XML::LibXML's notion of a text node's parent

Posted 2019-05-11 17:55

Question:

Something seems odd here.

In the example below, I'm accessing text nodes via an XPath query ( //book/isbn/text() ). The text() is necessary to coerce XML::LibXML into allowing me to use the XML::LibXML::Text methods.

To get to the parent node though, I have to invoke the parentNode method twice to get the true parent node (<book> in this case):

use strict;
use warnings;
use XML::LibXML;

my $xml = XML::LibXML->new->parse_string( << 'MAIN' );
  <library>
    <book>
      <title>Perl Best Practices</title>
      <author>Damian Conway</author>
      <isbn>0596001738</isbn>
      <pages>542</pages>
      <image src="http://www.oreilly.com/catalog/covers/perlbp.s.gif"
             width="145" height="190" />
    </book>
    <book>
      <title>Perl Cookbook, Second Edition</title>
      <author>Tom Christiansen</author>
      <author>Nathan Torkington</author>
      <isbn>0596003137</isbn>
      <pages>964</pages>
      <image src="http://www.oreilly.com/catalog/covers/perlckbk2.s.gif"
             width="145" height="190" />
    </book>
  </library>
MAIN

foreach my $isbn ( $xml->findnodes( '//book/isbn/text()' ) ) {

    # Do something with $isbn->setData()

    my $book = $isbn->parentNode->parentNode;  # My daddy's daddy is my daddy?
    print $book->toString;
}

Output

<book>
      <title>Perl Best Practices</title>
      <author>Damian Conway</author>
      <isbn>0596001738</isbn>
      <pages>542</pages>
      <image src="http://www.oreilly.com/catalog/covers/perlbp.s.gif" width="145" height="190"/>
    </book><book>
      <title>Perl Cookbook, Second Edition</title>
      <author>Tom Christiansen</author>
      <author>Nathan Torkington</author>
      <isbn>0596003137</isbn>
      <pages>964</pages>
      <image src="http://www.oreilly.com/catalog/covers/perlckbk2.s.gif" width="145" height="190"/>
    </book>

So:

  • is my understanding of XML nodes incorrect in assuming that //isbn and //isbn/text() are the same node, or
  • is this a bug in XML::LibXML's parentNode method?

Answer 1:

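Each element in an XML document is a node. If an element contains text (e.g. <isbn>019328373476</isbn>), that text is itself a child node of the element, of type text rather than element. So //isbn selects the <isbn> element, while //isbn/text() selects the text node inside it. The text node's parent is <isbn>, and <isbn>'s parent is <book>, which is why two parentNode calls are needed.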

It is not a bug in XML::LibXML's parentNode method.
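
To see the relationship directly, here is a minimal sketch using a trimmed-down, one-book document (the shortened XML and the variable names are only for illustration, not from the question). It selects both the <isbn> element and its text node, then walks up from each:

```perl
use strict;
use warnings;
use XML::LibXML;

# A cut-down document: one book, one isbn (illustrative only).
my $doc = XML::LibXML->load_xml(
    string => '<library><book><isbn>0596001738</isbn></book></library>'
);

# text() selects the text node inside <isbn>, not the <isbn> element.
my ($text) = $doc->findnodes('//book/isbn/text()');
my ($isbn) = $doc->findnodes('//book/isbn');

# DOM node types: 1 is an element node, 3 is a text node.
printf "isbn node type: %d\n", $isbn->nodeType;
printf "text node type: %d\n", $text->nodeType;

# One step up from the text node is <isbn>; two steps up is <book>.
print "text's parent:      ", $text->parentNode->nodeName, "\n";
print "text's grandparent: ", $text->parentNode->parentNode->nodeName, "\n";

# Starting from the element, a single parentNode call reaches <book>,
# and the text node is still available as the element's first child.
print "isbn's parent:      ", $isbn->parentNode->nodeName, "\n";
$isbn->firstChild->setData('0000000000');
print "new isbn text:      ", $isbn->textContent, "\n";
```

If the double parentNode call feels awkward, one option is to query //book/isbn instead and reach the text through firstChild, as in the last lines above. That only works cleanly when <isbn> holds a single text node, which is the case in the question's data.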