Can you preserve leading and trailing whitespace i

Posted 2020-04-02 09:48

Question:

How does one tell the XML parser to honor leading and trailing whitespace?

Dim xml: Set xml = CreateObject("MSXML2.DOMDocument")
xml.async = False
xml.loadxml "<xml>1 2</xml>"
wscript.echo len(xml.documentelement.text)

The above prints 3.

Dim xml: Set xml = CreateObject("MSXML2.DOMDocument")
xml.async = False
xml.loadxml "<xml> 2</xml>"
wscript.echo len(xml.documentelement.text)

The above prints 1, but I'd like it to print 2.
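Both results are consistent with the `text` property trimming leading and trailing whitespace while leaving interior whitespace alone: `"1 2"` keeps its inner space (length 3), but `" 2"` loses its leading space (length 1). The C sketch below models that hypothesis only; it is not MSXML's implementation, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* XML whitespace per the S production: space, tab, CR, LF. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Raw length: every character of the text node counts. */
size_t raw_length(const char *s)
{
    return strlen(s);
}

/* Hypothetical model of the observed behavior: drop leading and
   trailing whitespace, keep interior whitespace, return the length. */
size_t trimmed_length(const char *s)
{
    size_t start = 0;
    size_t end = strlen(s);

    while (start < end && is_xml_space(s[start]))
        start++;
    while (end > start && is_xml_space(s[end - 1]))
        end--;
    return end - start;
}
```

Under this model `trimmed_length("1 2")` is 3 and `trimmed_length(" 2")` is 1, matching the two outputs above, while `raw_length(" 2")` is 2.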

Is there something special I can put in the xml document itself to tell the parser to keep leading and trailing whitespace in the document?

CLARIFICATION 1: Is there an attribute that can be specified ONCE at the beginning of the document to apply to all elements?

CLARIFICATION 2: Because the content may contain Unicode data but the XML file needs to be plain ASCII, everything is entity-encoded, which unfortunately means CDATA sections are not an option.
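A minimal C sketch of the kind of encoding this clarification describes, assuming the text is already available as Unicode code points (the function name and interface are hypothetical): markup characters and anything outside printable ASCII are written as numeric character references, so the output is pure ASCII. It also shows why CDATA does not combine with this scheme: character references are not recognized inside a CDATA section, so `<![CDATA[&#233;]]>` yields the six literal characters `&#233;` rather than `é`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write Unicode code points as plain-ASCII XML
 * character data. '<', '&', '>' and every code point outside printable
 * ASCII (0x20..0x7E) become numeric character references such as &#233;.
 * Assumes the input contains only characters legal in XML 1.0.
 * Returns the number of bytes written (not counting the NUL), or 0 if
 * `out` is too small. */
size_t encode_ascii_text(const uint32_t *cps, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        char tmp[16];   /* "&#4294967295;" is the longest possible form */
        int w;

        if (c == '<' || c == '&' || c == '>' || c < 0x20 || c > 0x7E)
            w = snprintf(tmp, sizeof tmp, "&#%lu;", (unsigned long) c);
        else
            w = snprintf(tmp, sizeof tmp, "%c", (int) c);

        /* Leave room for the terminating NUL */
        if (w < 0 || len + (size_t) w + 1 > cap)
            return 0;
        memcpy(out + len, tmp, (size_t) w);
        len += (size_t) w;
    }
    out[len] = '\0';
    return len;
}
```

For the code points of `" é<a"` the output is `" &#233;&#60;a"`; note that a leading space is still written as a literal space.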

Answer 1:

As I commented, all answers recommending xml:space="preserve" are wrong.

The xml:space attribute can only be used to control the treatment of whitespace-only nodes, that is, text nodes composed entirely of whitespace characters.
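To make "whitespace-only" concrete: XML 1.0 defines whitespace as the `S` production (space, tab, carriage return, line feed), and a whitespace-only text node consists entirely of those characters. The small C predicate below, written for illustration (the name is hypothetical), shows that `" "` qualifies while `" 2"` does not, which is the basis for this answer's argument that `<xml> 2</xml>` is outside the scope of `xml:space`.

```c
#include <stdbool.h>

/* True if c matches XML 1.0's S production: #x20 | #x9 | #xD | #xA. */
static bool is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* True if the text consists only of XML whitespace
   (an empty string also counts as whitespace-only here). */
bool is_whitespace_only(const char *text)
{
    for (; *text != '\0'; text++) {
        if (!is_xml_space(*text))
            return false;
    }
    return true;
}
```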

This is not at all the case with the current problem.

In fact, the code provided below correctly obtains a length of 2 for the text node contained in:

<xml> 2</xml>

Here is the VB code that correctly gets the length of the text node (do not forget to add a reference to "Microsoft XML, v 3.0"):

Dim xml As MSXML2.DOMDocument

Private Sub Form_Load()
    Dim n As Long
    Set xml = CreateObject("MSXML2.DOMDocument")
    xml.async = False
    xml.loadXML "<xml> 2</xml>"
    ' nodeValue returns the text node's content, including the leading space
    n = Len(xml.documentElement.selectSingleNode("text()").nodeValue)
    Debug.Print n   ' writes n to the Immediate window
End Sub

If you put a breakpoint on the line:

Debug.Print n

you'll see that when the debugger breaks there, the value of n is 2, as required.

Therefore, this code is the solution that was being sought.



Answer 2:

You could try putting it into a CDATA block:

<xml><![CDATA[ 2]]></xml>
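The CDATA approach can be generalized with a small helper. A CDATA section cannot contain the sequence `]]>`, so the C sketch below (the function name is hypothetical) splits any occurrence across two sections, ending the first after `]]` and opening a new one before `>`. Everything else, including leading and trailing spaces, is copied verbatim.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wrap text in a single logical CDATA block. Any "]]>"
 * in the text is split as "]]]]><![CDATA[>" so the section is not closed
 * early. Everything else, including leading and trailing whitespace, is
 * copied verbatim. Returns a malloc'd string the caller must free, or NULL. */
char *wrap_cdata(const char *text)
{
    static const char cdata_open[] = "<![CDATA[";
    static const char cdata_end[] = "]]>";
    static const char cdata_split[] = "]]]]><![CDATA[>";
    const size_t open_len = sizeof cdata_open - 1;   /* 9 */
    const size_t end_len = sizeof cdata_end - 1;     /* 3 */
    const size_t split_len = sizeof cdata_split - 1; /* 15 */
    size_t count = 0;
    const char *p;

    /* Count non-overlapping "]]>" occurrences to size the buffer */
    for (p = text; (p = strstr(p, cdata_end)) != NULL; p += end_len)
        count++;

    char *out = malloc(open_len + strlen(text) + count * (split_len - end_len)
                       + end_len + 1);
    if (out == NULL)
        return NULL;

    char *w = out;
    memcpy(w, cdata_open, open_len);
    w += open_len;
    for (p = text; *p != '\0'; ) {
        if (strncmp(p, cdata_end, end_len) == 0) {
            memcpy(w, cdata_split, split_len);
            w += split_len;
            p += end_len;
        } else {
            *w++ = *p++;
        }
    }
    memcpy(w, cdata_end, end_len + 1);   /* closing "]]>" plus NUL */
    return out;
}
```

`wrap_cdata(" 2")` returns `<![CDATA[ 2]]>`, the same markup as in this answer; the caller frees the result.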


Answer 3:

As Dimitre Novatchev mentioned, an XML parser does not discard whitespace at will: the whitespace is part of the node's value. Since I do not speak Visual Basic, here is a C program using libxml2 that prints the length of each text node under the root element. There is absolutely no need to set xml:space.

% ./whitespace "<foo> </foo>"
Length of " " is 1

% ./whitespace "<foo> 2</foo>"
Length of " 2" is 2

% ./whitespace "<foo>1 2</foo>" 
Length of "1 2" is 3

Here is the program:

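/* Build (assumes the libxml2 development files are installed):
 *   cc -o whitespace whitespace.c $(xml2-config --cflags --libs)
 */
#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int
main(int argc, char **argv)
{
    const char     *xml;
    xmlDoc         *doc;
    xmlNode        *root, *node;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s XML-string\n", argv[0]);
        return 1;
    }
    xml = argv[1];
    /* Options 0: XML_PARSE_NOBLANKS is not set, so whitespace is kept */
    doc = xmlReadMemory(xml, (int) strlen(xml), "my data", NULL, 0);
    if (doc == NULL) {
        fprintf(stderr, "Could not parse \"%s\"\n", xml);
        return 1;
    }
    root = xmlDocGetRootElement(doc);
    /* Walk the root element's children and report each text node */
    for (node = root->children; node; node = node->next) {
        if (node->type == XML_TEXT_NODE) {
            fprintf(stdout, "Length of \"%s\" is %zu\n", (char *) node->content,
                    strlen((char *) node->content));
        }
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}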