I'd like to remove all NULL characters from the string. I know that the right regex match should be \x00 and I've tried the following XQuery:
replace($message, '\x00', '')
It results in the error:
exerr:ERROR Conversion from XPath2 to Java regular expression syntax failed: Error at character 1 in regular expression \x00: invalid escape sequence
Is there any quick solution or workaround for this issue? I use eXist-db 2.2.
Basically, the answer is that there cannot be any NUL (x00) characters in the string. XML, and therefore the XDM data model, does not allow them. So if they appear in your input, you're already outside the scope of the standards.
The short version: you can't, at least not within the boundaries of the XQuery and XML specifications. There may be an eXist-db-proprietary method I am not aware of (something like natively interfacing with the Java regular expression functions from XQuery, which seems to be possible in eXist-db), but I would not consider this a "quick solution or workaround".
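If the strings can be cleaned before they ever reach XQuery, the stripping can happen on the Java side, where \x00 is a valid regex escape. The following is only a minimal sketch under that assumption: NulStripper and stripNul are hypothetical names for illustration, not part of eXist-db, and how you hook this into your own ingestion code is up to you.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an eXist-db API): cleans raw input before it is stored or queried.
final class NulStripper {
    // Java's regex dialect accepts the hexadecimal escape for NUL that the XPath dialect rejects.
    private static final Pattern NUL = Pattern.compile("\\x00");

    // Returns the input with every NUL character removed.
    static String stripNul(String raw) {
        return NUL.matcher(raw).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "abc\u0000def";
        System.out.println(stripNul(dirty)); // prints "abcdef"
    }
}
```

The point of doing this outside XQuery is that the string handed to the query is then already a legal XDM string, so no processor-specific behaviour is needed.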
Looking through the XPath and XQuery Functions and Operators 3.0 specification, which also contains the definition of regular expressions for XQuery 3.0, there is no specified way of escaping characters by their Unicode code point. The \x00 syntax is specific to some regular expression implementations, which regular-expressions.info confirms.
Considering this, there might be two options:
1. Using XML entities to denote the null byte. This is also not possible, as the XML specification does not allow control characters by definition. From Extensible Markup Language (XML) 1.0 (Fifth Edition):
With the additional restriction of allowed characters in the same specification:
XML 1.1 extends this definition to control characters, allowing all of them except the null byte:
Finally, XQuery relies on the same specification regarding allowed characters:
2. Directly including the null byte in the XQuery document. Apart from practical issues (null bytes in files often cause unexpected problems of various kinds), the same character restrictions as defined above apply (well-formed XML documents must consist only of characters as defined above):
There is an extended discussion of this in Why are “control” characters illegal in XML 1.0?
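To make the character restriction concrete, here is a small sketch that checks code points against the Char production of XML 1.0 (Fifth Edition), section 2.2, and drops everything outside it, NUL included. The class and method names (XmlCharFilter, isXml10Char, stripIllegal) are made up for this example, and it assumes you sanitize the input in Java before handing it to the database.

```java
// Hypothetical helper for illustration only; not part of any XML or eXist-db library.
final class XmlCharFilter {
    // Char production from XML 1.0 (Fifth Edition), section 2.2:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Keeps only code points that XML 1.0 allows; NUL and other
    // forbidden control characters are silently dropped.
    static String stripIllegal(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        raw.codePoints()
           .filter(XmlCharFilter::isXml10Char)
           .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Tab is allowed, NUL and U+0001 are not.
        System.out.println(stripIllegal("a\u0000b\u0001c\td").equals("abc\td")); // prints "true"
    }
}
```

Filtering by code point rather than by char keeps characters outside the Basic Multilingual Plane intact while still rejecting unpaired surrogates.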