Is there any conceivable reason why I would see different results using Unicode string literals versus the actual hex value for the UChar?
UnicodeString s1(0x0040); // @ sign
UnicodeString s2("\u0040");
s1 isn't equivalent to s2. Why?
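As far as I know, how a \u escape is encoded inside a narrow string literal is implementation-defined: the compiler converts it to its execution character set, and UnicodeString's char* constructor then converts those bytes using ICU's default codepage. If the two disagree, the results differ, so it's hard to say exactly why they are not equivalent without knowing the details of your particular compiler. Either way, it's simply not a safe way of doing things.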
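To see what your own compiler does with such a literal, something like the following may help. It is only a sketch, assuming a C++11 compiler; it uses the standard library instead of ICU so that it stays self-contained, and the helper names (hex_bytes, narrow_e_acute, utf16_e_acute, print_encodings) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hex dump of the bytes the compiler actually stored for a narrow string.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Narrow literal: the bytes depend on the execution character set
// (for example "C3 A9" if it is UTF-8, "E9" if it is ISO-8859-1).
std::string narrow_e_acute() { return "\u00E9"; }

// UTF-16 literal (C++11): a single code unit with the value 0x00E9.
std::u16string utf16_e_acute() { return u"\u00E9"; }

// Prints both encodings; the assert records the one compiler-independent fact.
void print_encodings() {
    std::printf("narrow: %s\n", hex_bytes(narrow_e_acute()).c_str());
    assert(utf16_e_acute() == std::u16string(1, char16_t(0x00E9)));
    std::printf("UTF-16: %04X\n", static_cast<unsigned>(utf16_e_acute()[0]));
}
```

Calling print_encodings() shows the narrow bytes your build produces. If they are not what ICU's default converter expects, a UnicodeString built from the same literal will not contain the character you intended.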
UnicodeString has a constructor taking a UChar and one for UChar32. I'd be explicit when using them:
UnicodeString s(static_cast<UChar>(0x0040));
UnicodeString also provides an unescape() method that's fairly handy:
UnicodeString s = UNICODE_STRING_SIMPLE("\\u4ECA\\u65E5\\u306F").unescape(); // 今日は
I couldn't reproduce this on ICU 4.8.1.1:
#include <stdio.h>
#include "unicode/unistr.h"

int main(int argc, const char *argv[]) {
    UnicodeString s1(0x0040); // @ sign
    UnicodeString s2("\u0040");
    printf("s1==s2: %s\n", (s1==s2)?"T":"F");
    // printf("s1.equals s2: %d\n", s1.equals(s2));
    printf("s1.length: %d s2.length: %d\n", s1.length(), s2.length());
    printf("s1.charAt(0)=U+%04X s2.charAt(0)=U+%04X\n", s1.charAt(0), s2.charAt(0));
    return 0;
}
=>
s1==s2: T
s1.length: 1 s2.length: 1
s1.charAt(0)=U+0040 s2.charAt(0)=U+0040
Tested with gcc 4.4.5 on RHEL 6.1 x86_64.
For anyone else who finds this, here's what I found in ICU's documentation [1]:
The compiler's and the runtime character set's codepage encodings are not specified by the C/C++ language standards and are usually not a Unicode encoding form. They typically depend on the settings of the individual system, process, or thread. Therefore, it is not possible to instantiate a Unicode character or string variable directly with C/C++ character or string literals. The only safe way is to use numeric values. It is not an issue for User Interface (UI) strings that are translated.
[1] http://userguide.icu-project.org/strings
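As a rough illustration of the "numeric values only" advice quoted above, here is a sketch that builds UTF-16 text purely from code point numbers, so no character or string literal is involved. It assumes a C++11 compiler and uses std::u16string rather than ICU's UnicodeString to stay self-contained; append_code_point and utf16_from_code_points are hypothetical helper names, not ICU functions.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>

// Append one code point, given as a number, as UTF-16 code units.
void append_code_point(std::u16string& out, std::uint32_t cp) {
    const bool is_surrogate = cp >= 0xD800 && cp <= 0xDFFF;
    if (cp > 0x10FFFF || is_surrogate) {
        // Not a valid Unicode scalar value: substitute U+FFFD.
        out += static_cast<char16_t>(0xFFFD);
    } else if (cp < 0x10000) {
        // BMP code point: one code unit with the same value.
        out += static_cast<char16_t>(cp);
    } else {
        // Supplementary code point: split into a surrogate pair.
        assert(cp >= 0x10000 && cp <= 0x10FFFF);
        const std::uint32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    }
}

// Build a UTF-16 string from a list of numeric code points.
std::u16string utf16_from_code_points(std::initializer_list<std::uint32_t> cps) {
    std::u16string out;
    for (std::uint32_t cp : cps) append_code_point(out, cp);
    return out;
}
```

For example, utf16_from_code_points({0x4ECA, 0x65E5, 0x306F}) yields the three code units for 今日は regardless of the compiler's source or execution character set.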
The double quotes in your \u constant are the problem. This evaluated properly:
wchar_t m1( 0x0040 );
wchar_t m2( '\u0040' );
bool equal = ( m1 == m2 );
equal was true.