Xerces-C validate xml with hardcoded xsd

Posted 2019-02-28 07:26

Question:

I'm writing a library which takes XML files and parses them. To prevent users from feeding invalid XML into my application, I'm using Xerces to validate the XML files against an XSD.

However, I have only managed to validate against XSD files on disk. In theory, a user could just open such a file and mess around with it. That's why I would like my XSD to be hardcoded in my library.

Unfortunately, I haven't found a way to do this with Xerces-C++ yet.

This is how it works right now:

bool XmlParser::validateXml(std::string a_XsdFilename)
{
    xercesc::XercesDOMParser  domParser;
    if (domParser.loadGrammar(a_XsdFilename.c_str(), xercesc::Grammar::SchemaGrammarType) == NULL)
    {
        throw Exceptions::Parser::XmlSchemaNotReadableException();
    }

    XercesParserErrorHandler parserErrorHandler;

    domParser.setErrorHandler(&parserErrorHandler);
    domParser.setValidationScheme(xercesc::XercesDOMParser::Val_Always);
    domParser.setDoNamespaces(true);
    domParser.setDoSchema(true);
    domParser.setValidationSchemaFullChecking(true);

    domParser.parse(m_Filename.c_str());

    return (domParser.getErrorCount() == 0);

}

std::string m_Filename is a member variable holding the path of the XML file I validate.

std::string a_XsdFilename is the path to the XSD I validate against.

XercesParserErrorHandler inherits from xercesc::ErrorHandler and does error handling.

How can I replace std::string a_XsdFilename with something like std::string a_XsdText, where a_XsdText contains the schema definition itself instead of the path to a file containing it?

Answer 1:

I'll describe three ways to supply the XSD to your program:

  • by loading the XSD from a file path (this is what your example program does right now)
  • by loading the XSD from a string (this is what you ask for)
  • by loading the XSD from a precompiled binary

Loading the XSD from a file path

Boris Kolpackov suggests in a blog post that applications should provide the XSD schema files by themselves rather than looking up the schema files through the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes found in the XML file.

The blog post links to load-grammar-dom, an example program (placed in the public domain) that uses the xercesc::DOMLSParser::loadGrammar function:

user@linux:~$ load-grammar-dom
usage: load-grammar-dom [test.xsd ... ] [test.xml ...]
user@linux:~$ 

Loading the XSD from a string

If you would like to pass the XSD file contents as a string, you would need to use another overload of xercesc::DOMLSParser::loadGrammar where you pass

const DOMLSInput *source

instead of

const char *const systemId

The DOMLSInput could be created with the help of xercesc::MemBufInputSource and xercesc::Wrapper4InputSource like this

xercesc::Wrapper4InputSource source(
    new xercesc::MemBufInputSource(
        (const XMLByte *) (a_XsdText.c_str()),
        a_XsdText.size(),
        "A name"));

(Adapted somewhat from https://stackoverflow.com/a/15829424/757777 but untested)

Loading the XSD from a precompiled binary

The embedded example that ships with CodeSynthesis XSD (and is placed in the public domain) demonstrates how to use

xercesc::BinInputStream and xercesc::XMLGrammarPool::deserializeGrammars

to load a precompiled XSD schema.

See also README.

The example contains the program xsdbin that compiles XSD schema files into a binary file.

user@linux:~$ xsdbin --help
Usage: xsdbin [options] <files>
Options:
  --help                 Print usage information and exit.
  --verbose              Print progress information.
  --output-dir <dir>     Write generated files to <dir>.
  --hxx-suffix <sfx>     Header file suffix instead of '-schema.hxx'.
  --cxx-suffix <sfx>     Source file suffix instead of '-schema.cxx'.
  --array-name <name>    Binary data array name.
  --disable-multi-import Disable multiple import support.
user@linux:~$

In the makefile the XSD schema file is precompiled by xsdbin and the result ends up inside the example executable.