Perl, LibXML and Schemas

Posted 2019-04-14 11:25

Question:

I have an example Perl script in which I am trying to load an XML file, validate it against a schema, and then interrogate various nodes.

#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

my $filename = 'source.xml';
my $xml_schema = XML::LibXML::Schema->new(location=>'library.xsd');
my $parser = XML::LibXML->new ();
my $doc = $parser->parse_file ($filename);

eval {
    $xml_schema->validate ($doc);
};

if ($@) {
    # validate() dies on failure, so $@ holds the validation error
    print "File failed validation: $@";
}

eval {
    print "Here\n";
    foreach my $book ($doc->findnodes('/library/book')) {
        my $title = $book->findnodes('./title');
        print $title->to_literal(), "\n";

    }
};

if ($@) {
    print "Problem parsing data : $@\n";
}

Unfortunately, although the XML file validates fine, the script does not find any $book elements and therefore prints nothing after "Here".

If I remove the schema reference from the XML file and the validation from the Perl script, then it works fine.

I am using the default namespace. If I change the XML to use a prefixed namespace instead (xmlns:lib="http://libs.domain.com"), prefix every element in the XML file with lib:, and change the XPath expressions to include the namespace prefix (/lib:library/lib:book), then it works fine again.
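The observation above can be checked directly: a default namespace declaration still places every unprefixed element in that namespace; only the prefix differs. The following is a minimal sketch, assuming two trimmed-down inline documents (illustrative stand-ins, not the original source.xml) that print each root element's name, prefix and namespace URI:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Two trimmed-down (hypothetical) versions of the library document:
# one using a default namespace, one binding the same URI to "lib".
my %docs = (
    default  => '<library xmlns="http://lib.domain.com"><book/></library>',
    prefixed => '<lib:library xmlns:lib="http://lib.domain.com"><lib:book/></lib:library>',
);

my %ns_of;    # document label => namespace URI of its root element
for my $label (sort keys %docs) {
    my $root = XML::LibXML->load_xml(string => $docs{$label})->documentElement;
    $ns_of{$label} = $root->namespaceURI;

    # nodeName is the qualified name; localname drops any prefix.
    printf "%-8s name=%-12s local=%-7s prefix=%-6s ns=%s\n",
        $label, $root->nodeName, $root->localname,
        $root->prefix // '(none)', $root->namespaceURI;
}
```

Both roots report the same namespace URI, which is why the prefixed XPath works in the second case: the elements were in that namespace all along.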

Why, and what am I missing?

XML:

<?xml version="1.0" encoding="utf-8"?>
<library xmlns="http://lib.domain.com" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://lib.domain.com .\library.xsd">
    <book>
        <title>Perl Best Practices</title>
        <author>Damian Conway</author>
        <isbn>0596001738</isbn>
        <pages>542</pages>
        <image src="http://www.oreilly.com/catalog/covers/perlbp.s.gif" width="145" height="190"/>
    </book>
    <book>
        <title>Perl Cookbook, Second Edition</title>
        <author>Tom Christiansen</author>
        <author>Nathan Torkington</author>
        <isbn>0596003137</isbn>
        <pages>964</pages>
        <image src="http://www.oreilly.com/catalog/covers/perlckbk2.s.gif" width="145" height="190"/>
    </book>
    <book>
        <title>Guitar for Dummies</title>
        <author>Mark Phillips</author>
        <author>John Chappell</author>
        <isbn>076455106X</isbn>
        <pages>392</pages>
        <image src="http://media.wiley.com/product_data/coverImage/6X/07645510/076455106X.jpg" width="100" height="125"/>
    </book>
</library>

XSD:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns="http://lib.domain.com" xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" targetNamespace="http://lib.domain.com">
    <xs:attributeGroup name="imagegroup">
        <xs:attribute name="src" type="xs:string"/>
        <xs:attribute name="width" type="xs:integer"/>
        <xs:attribute name="height" type="xs:integer"/>
    </xs:attributeGroup>
    <xs:element name="library">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="book">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="title" type="xs:string"/>
                            <xs:element maxOccurs="unbounded" name="author" type="xs:string"/>
                            <xs:element name="isbn" type="xs:string"/>
                            <xs:element name="pages" type="xs:integer"/>
                            <xs:element name="image">
                                <xs:complexType>
                                    <xs:attributeGroup ref="imagegroup"/>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

Answer 1:

From the XML::LibXML docs:

A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace. ...(and later)... ...The recommended way is to use the XML::LibXML::XPathContext module
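The rule quoted above can be demonstrated with two small inline documents (hypothetical, trimmed versions of the library file). The sketch counts matches for the same unprefixed path with and without a default namespace. It also shows one namespace-agnostic workaround, comparing local-name() only; that technique is swapped in here for illustration and is not what the XML::LibXML docs recommend:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Same structure twice: once in no namespace, once in a default namespace.
my $no_ns   = XML::LibXML->load_xml(string =>
    '<library><book/><book/></library>');
my $with_ns = XML::LibXML->load_xml(string =>
    '<library xmlns="http://lib.domain.com"><book/><book/></library>');

# In scalar context findnodes returns an XML::LibXML::NodeList.
# An unprefixed name test only matches elements in no namespace.
my $plain_no_ns   = $no_ns->findnodes('/library/book')->size;
my $plain_with_ns = $with_ns->findnodes('/library/book')->size;

# Workaround that ignores namespaces entirely: match on local names only.
my $by_local_name = $with_ns->findnodes(
    '/*[local-name()="library"]/*[local-name()="book"]')->size;

print "no namespace,      /library/book: $plain_no_ns\n";
print "default namespace, /library/book: $plain_with_ns\n";
print "default namespace, local-name():  $by_local_name\n";
```

The local-name() form works, but it also matches elements with the same name from any other namespace, so registering a prefix is usually the safer choice.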

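So, from XPath's point of view, there is no "default" namespace: any element in a non-null namespace has to be matched through a namespace prefix in the XPath expression. The XML::LibXML::XPathContext module lets you register a prefix of your choosing for any namespace URI and then use that prefix in your XPath expressions, even though the XML document itself uses no prefix.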
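Applied to the question's loop, this might look like the sketch below. It assumes a trimmed inline copy of source.xml so that it runs on its own; the prefix name "lib" is arbitrary and only has to agree between registerNs and the XPath expressions:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed-down copy of source.xml; in the real script, keep using
# $parser->parse_file($filename) and the validation step.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<library xmlns="http://lib.domain.com">
  <book><title>Perl Best Practices</title></book>
  <book><title>Perl Cookbook, Second Edition</title></book>
  <book><title>Guitar for Dummies</title></book>
</library>
XML

# Bind our own prefix to the document's namespace URI.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(lib => 'http://lib.domain.com');

my @titles;
foreach my $book ($xpc->findnodes('/lib:library/lib:book')) {
    # Relative queries need the prefix too; $book is the context node.
    push @titles, $xpc->findvalue('lib:title', $book);
}
print "$_\n" for @titles;
```

Note that the original `$book->findnodes('./title')` would fail for the same reason, so every step of the path, including relative ones, goes through the registered prefix.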