I'm attempting to decode text which is prefixing certain 'special characters' with \x. I've worked out the following mappings by hand:
\x28 (
\x29 )
\x3a :
e.g. 12\x3a39\x3a03 AM
Does anyone recognise what this encoding is?
It's ASCII. All occurrences of the four characters \xST are converted to 1 character, whose ASCII code is ST (in hexadecimal), where S and T are any of 0123456789abcdefABCDEF.

I'm guessing that what you are dealing with is a Unicode string that was encoded differently from what the output stream it was sent to expects, i.e. a UTF-16 string output to a Latin-1 device. In that situation, certain characters are output as escape values to avoid sending control characters or wrong characters to the output device. This happens in Python at least.
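The \xST rule described above (four characters in, one character out) can be sketched in a few lines of Python. This is only a minimal illustration, assuming every escape in the input is exactly \x followed by two hex digits; the name decode_hex_escapes is made up for this example and is not from any library.

```python
import re

# Matches a backslash, 'x', and exactly two hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text):
    """Replace each backslash-x-ST sequence with the character whose code is 0xST."""
    # Hypothetical helper for illustration; assumes two-digit escapes only.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Raw string, so the backslashes reach the function unchanged.
sample = r"12\x3a39\x3a03 AM"
print(decode_hex_escapes(sample))  # 12:39:03 AM
```

Because the pattern takes exactly two digits, the 39 and 03 that follow each \x3a are left alone, which is the reading the question's hand-built mappings suggest.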
The '\xAB' notation is used in C, C++, Perl, and other languages taking a cue from C, as a way of expressing hexadecimal character codes in the middle of a string.

The notation '\007' means use octal for the character code, when there are digits after the backslash.

In C99 and later, you can also use \uabcd and \U00abcdef to encode Unicode characters in hexadecimal (with 4 and 8 hex digits required; the first two hex digits in \U must be 0 to be valid, and often the third digit will be 0 too; 1 is the only other valid value).

Note that in C, octal escapes are limited to a maximum of 3 digits but hexadecimal escapes are not limited to 2 or 3 digits; the hexadecimal escape ends at the first character that's not a hexadecimal digit. In the question, the sequence is "12\x3a39\x3a03". Read as a C string literal, that is a string containing 4 characters: 1, 2, \x3a39 and \x3a03. The actual value used for the 4-digit hex characters is implementation-defined. To achieve the desired result (using \x3A to represent a colon :), the code would have to use string concatenation so that each hexadecimal escape ends at a literal boundary, for example "12\x3a" "39\x3a" "03". This now contains 8 characters: 1, 2, :, 3, 9, :, 0, 3.
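The difference between the 4-character and 8-character readings can be modelled with a short Python sketch. It is not a C compiler: it only imitates the one rule discussed above (a \x escape consumes every hex digit that follows it) and ignores all other escapes. The helper name c_hex_units is hypothetical.

```python
import re

# Model of the C rule: "\x" followed by as many hex digits as are present.
_C_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]+)")


def c_hex_units(literal_body):
    """Return the code values found in one string literal body (model only)."""
    units = []
    pos = 0
    while pos < len(literal_body):
        m = _C_HEX_ESCAPE.match(literal_body, pos)
        if m:
            # Greedy: all consecutive hex digits belong to this escape.
            units.append(int(m.group(1), 16))
            pos = m.end()
        else:
            units.append(ord(literal_body[pos]))
            pos += 1
    return units


# One literal: each escape swallows the two digits that follow it.
single = c_hex_units(r"12\x3a39\x3a03")

# Adjacent literals: escapes are resolved per literal, then the pieces are joined.
joined = [u for part in (r"12\x3a", r"39\x3a", "03") for u in c_hex_units(part)]

print(len(single), [hex(u) for u in single])
print(len(joined), "".join(chr(u) for u in joined))
```

In this model the single literal yields 4 values, two of them too large for an 8-bit char, while the split literals yield the 8 characters of 12:39:03.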