Some sites that I am fetching data from are returning UTF-8 strings with the Unicode characters escaped, e.g.: \u5404\u500b\u90fd
Is there a built-in Cocoa function that might assist with this, or will I have to write my own decoding algorithm?
There is no built-in function to do C unescaping.
You can cheat a little with NSPropertyListSerialization, since an "old text style" plist supports C escaping via \Uxxxx (uppercase U, so each \u has to be rewritten first), but mind that this isn't very efficient. It's far better to write your own parser. (By the way, are you decoding JSON strings? If so, you could use one of the existing JSON parsers.)
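If you do write your own parser, a minimal sketch in C (which also compiles as Objective-C) might look like the following. The function name unescape_u and its buffer-based interface are hypothetical; the sketch assumes the input is UTF-8 with JSON/Java-style \uXXXX escapes, where characters outside the Basic Multilingual Plane arrive as surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse exactly four hex digits at p. Returns -1 on the first non-hex
   character, so it never reads past a terminating '\0'. */
static long hex4(const char *p) {
    long v = 0;
    for (int i = 0; i < 4; i++) {
        int c = (unsigned char)p[i];
        v <<= 4;
        if (c >= '0' && c <= '9') v |= c - '0';
        else if (c >= 'a' && c <= 'f') v |= c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v |= c - 'A' + 10;
        else return -1;
    }
    return v;
}

/* Write code point cp as UTF-8 into dst (room for 4 bytes); return length. */
static size_t put_utf8(uint32_t cp, char *dst) {
    if (cp < 0x80) { dst[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        dst[0] = (char)(0xC0 | (cp >> 6));
        dst[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dst[0] = (char)(0xE0 | (cp >> 12));
        dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    dst[0] = (char)(0xF0 | (cp >> 18));
    dst[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode \uXXXX escapes in the UTF-8 string `in`
   into `out`. Surrogate pairs are combined; lone surrogates become U+FFFD.
   Anything that is not a well-formed \uXXXX escape is copied through.
   Returns 0 on success, or -1 if out_size is too small (out is then
   truncated but still NUL-terminated). */
int unescape_u(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL && out_size > 0);
    size_t n = 0;
    while (*in != '\0') {
        char tmp[4];
        size_t len;
        long cp;
        if (in[0] == '\\' && in[1] == 'u' && (cp = hex4(in + 2)) >= 0) {
            in += 6;
            /* High surrogate followed by a low surrogate: combine the pair. */
            if (cp >= 0xD800 && cp <= 0xDBFF && in[0] == '\\' && in[1] == 'u') {
                long lo = hex4(in + 2);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    in += 6;
                }
            }
            if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD; /* unpaired surrogate */
            len = put_utf8((uint32_t)cp, tmp);
        } else {
            tmp[0] = *in++;
            len = 1;
        }
        if (n + len >= out_size) { out[n] = '\0'; return -1; } /* keep room for '\0' */
        memcpy(out + n, tmp, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Calling unescape_u("\\u5404\\u500b\\u90fd", buf, sizeof buf) fills buf with the UTF-8 bytes for 各個都. Copying malformed escapes through unchanged is a design choice for lenient server data; a stricter decoder might report an error instead.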
Simple code, from https://stackoverflow.com/a/7861345.
It's correct that Cocoa does not offer a solution, yet Core Foundation does: CFStringTransform.
CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it's a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic like transliterations between Greek and Latin (or almost any known script), but it can also be used for mundane tasks like unescaping strings from a crappy server.
As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, such as case mappings, normalizations, and Unicode character name conversion. You can even design your own transformations. I have no idea why Apple does not make it available from Cocoa.
Edit 2015:
OS X 10.11 and iOS 9 add -[NSString stringByApplyingTransform:reverse:] to Foundation, so the same transform can now be applied directly to an NSString without dropping down to Core Foundation.
Thanks @nschmidt for the heads up.
Here's what I ended up writing. Hopefully this will help some people along.