How does one tell the XML parser to honor leading and trailing whitespace?
Dim xml: Set xml = CreateObject("MSXML2.DOMDocument")
xml.async = False
xml.loadxml "<xml>1 2</xml>"
wscript.echo len(xml.documentelement.text)
Above prints out 3.
Dim xml: Set xml = CreateObject("MSXML2.DOMDocument")
xml.async = False
xml.loadxml "<xml> 2</xml>"
wscript.echo len(xml.documentelement.text)
Above prints out 1. (I'd like it to print 2).
Is there something special I can put in the xml document itself to tell the parser to keep leading and trailing whitespace in the document?
CLARIFICATION 1: Is there an attribute that can be specified ONCE at the beginning of the document to apply to all elements?
CLARIFICATION 2: Because the contents of the entities may have Unicode data, but the XML file needs to be plain ASCII, all entities are encoded, meaning CDATA sections are unfortunately not available.
As I commented, all answers recommending the use of xml:space="preserve" are wrong.
The xml:space attribute can only be used to control the treatment of whitespace-only nodes, that is, text nodes composed entirely of whitespace characters.
That is not the case in the current problem.
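To make the distinction concrete, here is a minimal C sketch of what "whitespace-only" means. The helper name is_whitespace_only is hypothetical and not part of any XML library; it assumes only the XML definition of whitespace (space, tab, carriage return, line feed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any XML library.
 * Returns 1 if the text consists solely of XML whitespace characters
 * (space, tab, carriage return, line feed), otherwise 0.
 * An empty string is treated as whitespace-only here (an assumption). */
int is_whitespace_only(const char *text)
{
    assert(text != NULL);
    for (; *text != '\0'; text++) {
        if (*text != ' ' && *text != '\t' && *text != '\r' && *text != '\n')
            return 0;
    }
    return 1;
}
```

Under this definition the content " " is whitespace-only, while " 2" and "1 2" are not, so the text node in the question falls outside what xml:space is meant to govern.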
In fact, the code provided below correctly obtains a length of 2 for the text node contained in:
<xml> 2</xml>
Here is the VB code that correctly gets the length of the text node (do not forget to add a reference to "Microsoft XML, v 3.0"):
Dim xml As MSXML2.DOMDocument
Private Sub Form_Load()
    Set xml = CreateObject("MSXML2.DOMDocument")
    xml.async = False
    xml.loadXML "<xml> 2</xml>"
    Dim n
    ' n is the length of the raw text node value, leading whitespace included
    n = Len(xml.documentElement.selectSingleNode("text()").nodeValue)
    Debug.Print n
End Sub
If you put a breakpoint on the line:
Debug.Print n
you'll see that when the debugger breaks there, the value of n is 2, as required.
Therefore, this code is the solution that was being sought.
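The two results reported in this thread (1 from .text, 2 from nodeValue) are consistent with .text dropping leading and trailing whitespace while nodeValue returns the raw characters. The plain C sketch below models that assumption only; raw_length and trimmed_length are hypothetical names, and this is not MSXML's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XML whitespace: space, tab, carriage return, line feed. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Length of the untouched node value (models nodeValue). */
size_t raw_length(const char *value)
{
    assert(value != NULL);
    return strlen(value);
}

/* Length after dropping leading and trailing whitespace
 * (assumed model of what .text reports). */
size_t trimmed_length(const char *value)
{
    size_t start = 0, end;

    assert(value != NULL);
    end = strlen(value);
    while (start < end && is_xml_space(value[start]))
        start++;
    while (end > start && is_xml_space(value[end - 1]))
        end--;
    return end - start;
}
```

For the content " 2", raw_length gives 2 and trimmed_length gives 1, matching the two numbers seen above. For "1 2" both give 3, because interior whitespace is not touched.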
You could try putting it into a CDATA section:
<xml><![CDATA[ 2]]></xml>
As mentioned by Dimitre Novatchev, an XML parser does not delete
whitespace at will. The whitespace is part of the node's
value. Since I do not speak Visual Basic, here is a C program using
libxml that prints the length of each text node directly under the
root element. There is absolutely no need to set xml:space.
% ./whitespace "<foo> </foo>"
Length of " " is 1
% ./whitespace "<foo> 2</foo>"
Length of " 2" is 2
% ./whitespace "<foo>1 2</foo>"
Length of "1 2" is 3
Here is the program:
#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
int
main(int argc, char **argv)
{
    char *xml;
    xmlDoc *doc;
    xmlNode *root, *node;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s XML-string\n", argv[0]);
        return 1;
    }
    xml = argv[1];
    doc = xmlReadMemory(xml, (int) strlen(xml), "my data", NULL, 0);
    if (doc == NULL) {
        fprintf(stderr, "Could not parse the XML string\n");
        return 1;
    }
    root = xmlDocGetRootElement(doc);
    /* Walk the root's children and report every text node */
    for (node = root ? root->children : NULL; node; node = node->next) {
        if (node->type == XML_TEXT_NODE) {
            /* strlen returns size_t, so print it with %zu */
            fprintf(stdout, "Length of \"%s\" is %zu\n", (char *) node->content,
                    strlen((char *) node->content));
        }
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}