I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is in UTF-8 format and translates to the Greek capital letter 'Α'. I'm trying to put that character into an NSString variable, but I fail miserably.
I've tried two different ways, both unsuccessful:
unichar greekAlpha = 0xce91; //could have written greekAlpha = 'Α' instead.
NSString *theString = [NSString stringWithFormat:@"Greek Alpha: %C", greekAlpha];
No good. I get some weird Chinese characters. As a side note, this works perfectly with English characters.
Then I also tried this:
NSString *byteString = [[NSString alloc] initWithBytes:&greekAlpha
length:sizeof(unichar)
encoding:NSUTF8StringEncoding];
But this doesn't work either. I'm obviously doing something terribly wrong, but I don't know what. Can someone help me, please? Thanks!
The code above is the moral equivalent of

unichar foo = 'abc';

The problem is that 'Α' doesn't map to a single byte in the "execution character set" (I'm assuming UTF-8), which is "implementation-defined" per C99 §6.4.4.4 ¶10.

One way is to make 'ab' equal to 'a'<<8|'b'. Some Mac/iOS system headers rely on this for things like OSType/FourCharCode/FourCC; the only example in iOS that comes to mind is the CoreVideo pixel formats. This is, however, unportable.

If you really want a unichar literal, you can try L'Α' (technically it's a wchar_t literal, but on OS X and iOS, wchar_t is typically UTF-32, so it'll work for things inside the BMP). However, it's far simpler to just use @"Α" (which works as long as you set the source character encoding correctly) or @"\u0391" (which has worked since at least the iOS 3 SDK).

The above answer is great, but it doesn't account for characters whose UTF-8 encoding is longer than two bytes, e.g. the ellipsis symbol, whose UTF-8 bytes are 0xE2, 0x80, 0xA6. Here's a tweak to the code:
Note the different string initialisation method which doesn't require a length parameter.
Since 0xce91 is in the UTF-8 format and %C expects UTF-16, a simple solution like the one above won't work. For stringWithFormat:@"%C" to work, you need to input 0x391, which is the UTF-16 code unit for the character.

In order to create a string from the UTF-8 encoded unichar, you need to first split the value into its octets and then use initWithBytes:length:encoding:.

And now you can incorporate that NSString into another in any way you like. Do note, however, that it is now legal to type a Greek alpha directly into an NSString literal.
Here is an algorithm for UTF-8 encoding on a single character: