How do I properly encode Unicode characters in my

Posted 2020-03-30 01:56

Problem Statement

I create a number of strings, concatenate them together into CSV format, and then email the string as an attachment.

When these strings contain only ASCII characters, the CSV file is built and emailed properly. When I include non-ASCII characters, the result string becomes malformed and the CSV file is not created properly. (The email view shows an attachment, but it is not sent.)

For instance, this works:

uncle bill's house of pancakes

But this doesn't (note the curly apostrophe):

uncle bill’s house of pancakes

Question

How do I create and encode the final string so that all valid Unicode characters are preserved and the resulting string is well formed?

Notes

  • The strings are created via a UITextField and then are written to and then read from a Core Data store.

  • This suggests that the problem lies in the initial creation and encoding of the string: "NSString unicode encoding problem"

  • I don't want to have to do this: "remove non ASCII characters from NSString in objective-c"

  • The strings are written and read to/from the data store fine. The strings display properly (individually) in the app's table views. The problem only manifests when concatenating the strings together for the email attachment.

String Processing Code

I concatenate my strings together like this:

[reportString appendFormat:@"%@,", category];
[reportString appendFormat:@"%@,", client];
[reportString appendFormat:@"%@\n", detail];
etc.

Replacing curly quotes with boring quotes makes it work, but I don't want to do it this way:

- (NSMutableString *)cleanString:(NSString *)activity {
    NSString *temp1 = [activity stringByReplacingOccurrencesOfString:@"’" withString:@"'"];
    NSString *temp2 = [temp1 stringByReplacingOccurrencesOfString:@"‘" withString:@"'"];
    NSString *temp3 = [temp2 stringByReplacingOccurrencesOfString:@"”" withString:@"\""];
    NSString *temp4 = [temp3 stringByReplacingOccurrencesOfString:@"“" withString:@"\""];
    return [NSMutableString stringWithString:temp4];
}

Edit: The email is sent:

    NSString *attachment = [self formatReportCSV];
    [picker addAttachmentData:[attachment dataUsingEncoding:NSStringEncodingConversionAllowLossy] mimeType:nil fileName:@"MyCSVFile.csv"];

where formatReportCSV is the method that concatenates the fields and returns the CSV string.

1 answer
疯言疯语
#2 · 2020-03-30 02:04

This looks like a string encoding problem. Without seeing your Core Data model, I'd assume it comes down to what the code below reproduces.

NSString *string1 = @"Uncle bill’s house of pancakes.";
NSString *string2 = @" Appended with some garbage's stuff.";
NSMutableString *mutableString = [NSMutableString stringWithString: string1];
[mutableString appendString: string2];
NSLog(@"We got: %@", mutableString);
// We got: Uncle bill’s house of pancakes. Appended with some garbage's stuff.

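// NSStringEncodingConversionAllowLossy is a conversion option, not an encoding; its raw value (1) equals NSASCIIStringEncoding.
NSData *storedVersion = [mutableString dataUsingEncoding: NSStringEncodingConversionAllowLossy];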
NSString *restoredString = [[NSString alloc] initWithData: storedVersion encoding: NSStringEncodingConversionAllowLossy];
NSLog(@"Restored string with NSStringEncodingConversionAllowLossy: %@", restoredString);
// Restored string with NSStringEncodingConversionAllowLossy: 

storedVersion = [mutableString dataUsingEncoding: NSUTF8StringEncoding];
restoredString = [[NSString alloc] initWithData: storedVersion encoding: NSUTF8StringEncoding];
NSLog(@"Restored string with UTF8: %@", restoredString);
// Restored string with UTF8: Uncle bill’s house of pancakes. Appended with some garbage's stuff.

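Note how the first conversion fails on the non-ASCII character. NSStringEncodingConversionAllowLossy is not a string encoding at all: it is a conversion option whose raw value (1) happens to equal NSASCIIStringEncoding, so passing it to dataUsingEncoding: requests a strict ASCII conversion, which returns nil when the string contains a character such as ’. A nil attachment is the likely reason the email shows an attachment that is never sent. (ASCII can be made to work by calling dataUsingEncoding:allowLossyConversion: with YES as the second argument, but the curly apostrophe is then altered or removed rather than preserved.)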
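The behaviours described above can be compared side by side. The following is a minimal sketch that assumes only Foundation; the helper name EncodedLength is hypothetical and does not appear in the original answer.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the number of
// bytes produced by the conversion, or -1 when the conversion fails and
// dataUsingEncoding:allowLossyConversion: returns nil.
static NSInteger EncodedLength(NSString *string, NSStringEncoding encoding, BOOL lossy) {
    NSData *data = [string dataUsingEncoding:encoding allowLossyConversion:lossy];
    return data ? (NSInteger)data.length : -1;
}
```

With this helper, EncodedLength(@"bill’s", NSASCIIStringEncoding, NO) returns -1 because the strict conversion yields nil, while the same string measured with NSUTF8StringEncoding gives 8 bytes (the curly apostrophe takes three). Passing YES for the last argument makes the ASCII conversion succeed, at the cost of losing the original character.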

This code should fix the issue:

NSString *attachment = [self formatReportCSV];
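// mimeType is documented as a parameter that must not be nil; text/csv is the registered type for CSV.
[picker addAttachmentData:[attachment dataUsingEncoding: NSUTF8StringEncoding] mimeType:@"text/csv" fileName:@"MyCSVFile.csv"];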

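Note: UTF-8 can represent every Unicode character, including Japanese text, so switching encodings is not needed for coverage. One of the UTF-16 encodings is only worth considering if the program that will open the CSV file expects it.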
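Whether a given piece of text survives UTF-8 encoding can be checked with a round trip through NSData. The sketch below assumes only Foundation; RoundTripsThroughUTF8 is a hypothetical helper name, not something from the original answer.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: encodes the string as UTF-8, decodes the bytes again,
// and reports whether the decoded string equals the original.
static BOOL RoundTripsThroughUTF8(NSString *string) {
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return NO; // conversion failed, nothing to decode
    }
    NSString *restored = [[NSString alloc] initWithData:data
                                               encoding:NSUTF8StringEncoding];
    return [restored isEqualToString:string];
}
```

For example, RoundTripsThroughUTF8(@"日本語のテキスト") returns YES, as does the curly-apostrophe string from the question.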
