I am new to Objective-C and am trying to convert a malformed UTF-8 encoded NSString to a well-formed one using the example from Apple's docs.
NSString *theString = @"LÃ¼gen"; // should be "ü"
NSData *asciiData = [theString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *asciiString = [[NSString alloc] initWithData:asciiData encoding:NSASCIIStringEncoding];
NSLog(@"Original: %@ (length %lu)", theString, (unsigned long)[theString length]);
NSLog(@"Converted: %@ (length %lu)", asciiString, (unsigned long)[asciiString length]);
Result:
Original: LÃ¼gen (length 6)
Converted: LA1/4gen (length 8)
This does nothing:
NSString* str = [NSString stringWithUTF8String:
[theString cStringUsingEncoding:NSASCIIStringEncoding]];
This crashes my app:
NSString* str = [NSString stringWithUTF8String:
[theString cStringUsingEncoding:NSUTF8StringEncoding]];
Does anyone have an idea what I am doing wrong?
"Malformed UTF-8 sequence" means a sequence of bytes which are invalid in UTF-8. Your problem is unexpected results after parsing a string with a different encoding than the one used by the original author of the string.
The hexadecimal data C3 BC, parsed with UTF-8 encoding, is the character ü. Instead you used Latin-1 encoding, which results in Ã¼. Then you created an NSString from the Latin-1 parsed string, which means you converted the Latin-1 string to a UTF-16 string (which is the native format of NSString).
Representing given data in different encodings shows up as different characters, but doesn't change the data. Converting to a different encoding does change the data in an attempt to reproduce the same characters. Example: the characters Ã¼ are C3 83 C2 BC in UTF-8, but C3 BC in Latin-1. So I converted to the same characters in Latin-1 to get the original data, and then I parsed that data as UTF-8.
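The round trip described above (encode the garbled string as Latin-1 to recover the original bytes, then decode those bytes as UTF-8) could be sketched roughly as follows. The helper name RepairLatin1Mojibake is made up for illustration and is not from Apple's docs. The sketch assumes ARC and assumes the text really was UTF-8 misread as Latin-1; it returns nil when that assumption does not hold.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption, not an Apple API):
// undoes UTF-8 bytes that were mistakenly decoded as Latin-1.
// Assumes ARC. Returns nil if the string cannot be repaired this way.
static NSString *RepairLatin1Mojibake(NSString *garbled) {
    // Step 1: encode back to Latin-1 to recover the original bytes.
    // This yields nil if the string contains characters outside Latin-1.
    NSData *originalBytes = [garbled dataUsingEncoding:NSISOLatin1StringEncoding];
    if (originalBytes == nil) {
        return nil;
    }
    // Step 2: decode the recovered bytes as UTF-8.
    // This yields nil if the bytes are not valid UTF-8.
    return [[NSString alloc] initWithData:originalBytes
                                 encoding:NSUTF8StringEncoding];
}
```

Calling RepairLatin1Mojibake(@"L\u00C3\u00BCgen") (the escapes spell out Ã and ¼, so the result does not depend on the source file's own encoding) should give back "Lügen" with length 5. The nil checks matter because a string that was never garbled, or was garbled with a different encoding, may fail at either step.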