NSString unicode encoding problem

Posted 2019-03-04 14:32

Question:

I'm having trouble converting a string into something readable. I'm using

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

but I can't convert \U7ab6\U51b1 into '

It shows as 窶冱, which is not what I want; it should show as an '. Can anyone help me?

Answer 1:

it is shown as a ’

That's character U+2019 RIGHT SINGLE QUOTATION MARK.

What has happened is you've had the character sequence ’s submitted to you, in the UTF-8 encoding, which comes out as bytes:

’          s
E2 80 99   73

That byte sequence has then, incorrectly, been interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):

E2 80    99 73
窶        冱
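
The two readings of the same four bytes can be reproduced in Foundation. This is only a minimal sketch: the helper name `DecodeBytes` is hypothetical, and it assumes that `NSShiftJISStringEncoding` behaves like code page 932 for these particular bytes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): decode raw bytes using the
// given encoding. Returns nil if the bytes are not valid in that encoding.
static NSString *DecodeBytes(const unsigned char *bytes, NSUInteger length,
                             NSStringEncoding encoding) {
    NSData *data = [NSData dataWithBytes:bytes length:length];
    return [[NSString alloc] initWithData:data encoding:encoding];
}
```

Given `const unsigned char utf8[] = {0xE2, 0x80, 0x99, 0x73};`, calling `DecodeBytes(utf8, sizeof utf8, NSUTF8StringEncoding)` gives ’s, while passing `NSShiftJISStringEncoding` instead should give the 窶冱 from the question.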

So in this one particular case, you could recover the ’s string by first encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.
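
That round trip might look like the sketch below. The function name `RepairCP932Mojibake` is made up for illustration, and `NSShiftJISStringEncoding` is assumed to stand in for cp932. It is a workaround for this one case, not a general fix.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: undo the "UTF-8 bytes decoded as cp932" mistake.
// Returns nil if the string cannot be encoded as Shift-JIS, or if the
// resulting bytes are not valid UTF-8.
static NSString *RepairCP932Mojibake(NSString *garbled) {
    // Step 1: turn the wrong characters back into the original bytes.
    NSData *bytes = [garbled dataUsingEncoding:NSShiftJISStringEncoding];
    if (bytes == nil) {
        return nil;
    }
    // Step 2: decode those bytes with the encoding they were really in.
    return [[NSString alloc] initWithData:bytes
                                 encoding:NSUTF8StringEncoding];
}
```

For the string in the question, `RepairCP932Mojibake(@"窶冱")` should give back ’s.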

However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.

You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
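
Where the bytes enter the system is not shown in the question, so the following is only a sketch of the idea under one assumption: that the text arrives from a file. The helper name `ReadUTF8File` is hypothetical. The point is to name the encoding explicitly instead of letting anything guess or default to Shift-JIS.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: read a text file and decode it strictly as UTF-8.
// Returns nil (and fills in *error, if provided) when the file cannot be
// read or its contents are not valid UTF-8.
static NSString *ReadUTF8File(NSString *path, NSError **error) {
    return [NSString stringWithContentsOfFile:path
                                     encoding:NSUTF8StringEncoding
                                        error:error];
}
```

The same principle applies to bytes from a network response or a scanner library: pass `NSUTF8StringEncoding` at the point where the raw bytes first become an `NSString`.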