I'm running into a bit of a weird issue. Whenever I create a new text file in my iOS application, I set its encoding to NSUTF8StringEncoding. If I edit the file, enter any characters with diacritics, and save the changes, the diacritics render properly in some applications, such as BBEdit, TextMate, cat, and vi, but not in others, such as TextEdit, Quick Look, and Pages.
I'm using the following code to save the contents of a UITextView to a plain text file.
NSError *error = nil;
NSString *dataString = self.textView.text;
// NSUTF8StringEncoding is the NSStringEncoding constant for UTF-8.
BOOL savedChanges = [dataString writeToFile:fullPath atomically:YES encoding:NSUTF8StringEncoding error:&error];
if (!savedChanges)
{
    // Pop up an alert saying something went wrong.
}
The Unix file command reports that the saved file is indeed "UTF-8 Unicode text, with no line terminators".
What's even weirder is that if I save the file again without changing the contents of the text, the file then renders properly in Quick Look and TextEdit on my Mac.
Any help would be appreciated.
This is a guess, but could it have something to do with the lack of a byte order mark? For UTF-8, the BOM is EF BB BF in hex, and it should be the very first thing in the file.
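One way to test that guess is to look at the first three bytes of the saved file. Here is a minimal C sketch (callable from Objective-C) of the check; the function name has_utf8_bom is made up for illustration and is not part of any Apple API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the buffer starts with the
   UTF-8 byte order mark EF BB BF. */
bool has_utf8_bom(const unsigned char *bytes, size_t length)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    return length >= sizeof bom && memcmp(bytes, bom, sizeof bom) == 0;
}
```

You would pass it the first few bytes read from the file, for example the bytes and length of an NSData loaded from the path.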
If you save a text file without a BOM, and the com.apple.TextEncoding xattr is not set, any software that opens it has to guess at the correct character encoding. Some apps guess UTF-8, some guess Mac OS Roman, and others guess something else.
You can replicate this behavior by saving a file as UTF-8 with no BOM and then running xattr -d com.apple.TextEncoding filename.txt in Terminal.
To set the xattr, you would call setxattr(). There doesn't seem to be a documented way to set it via a Cocoa API. You could also prefix your data with the UTF-8 BOM.
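The BOM option could look roughly like the following in plain C. This is only a sketch: write_utf8_with_bom is a made-up name, and unlike writeToFile:atomically: it writes the file in place rather than atomically.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the UTF-8 BOM (EF BB BF) followed by the
   given UTF-8 text. Returns 0 on success, -1 on failure. */
int write_utf8_with_bom(const char *path, const char *utf8_text)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    FILE *file = fopen(path, "wb");
    if (file == NULL) {
        return -1;
    }
    size_t text_length = strlen(utf8_text);
    int ok = fwrite(bom, 1, sizeof bom, file) == sizeof bom
          && fwrite(utf8_text, 1, text_length, file) == text_length;
    /* A failed close can mean buffered data was never written. */
    if (fclose(file) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

From Objective-C you would pass something like [fullPath fileSystemRepresentation] and [dataString UTF8String]. Keep in mind that some tools treat the BOM as part of the content, which is one reason to prefer the xattr.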
There's the question of what character encoding should be assumed when the BOM and xattr are missing. Is it a bug if it defaults to Mac OS Roman? Should UTF-8 be the default?
Another alternative, if you want to avoid the BOM, is to set the com.apple.TextEncoding extended attribute to UTF-8;134217984 on the file. See here for more details.
I don't know how you would do that from code, but xattr -w com.apple.TextEncoding 'UTF-8;134217984' filename.txt will do it at the command line, so you can confirm that it fixes the issue for you.
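For what it's worth, doing the same thing from code would presumably go through setxattr(). The C sketch below is an untested guess at that: both function names are made up, the six-argument setxattr() form is the macOS/iOS one from <sys/xattr.h>, and on other platforms the function just fails. The number in the value is kCFStringEncodingUTF8 (0x08000100) written in decimal.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#ifdef __APPLE__
#include <sys/xattr.h>
#endif

/* Hypothetical helper: formats "<IANA name>;<CFStringEncoding value>".
   Returns 0 on success, -1 if the buffer is too small. */
int format_text_encoding_value(char *buffer, size_t size,
                               const char *iana_name, uint32_t cf_encoding)
{
    int written = snprintf(buffer, size, "%s;%lu",
                           iana_name, (unsigned long)cf_encoding);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}

/* Hypothetical helper: marks the file at path as UTF-8 via the
   com.apple.TextEncoding extended attribute. Returns 0 on success. */
int set_utf8_text_encoding_attribute(const char *path)
{
    char value[64];
    /* 0x08000100 is kCFStringEncodingUTF8, i.e. 134217984 in decimal. */
    if (format_text_encoding_value(value, sizeof value, "UTF-8", 0x08000100u) != 0) {
        return -1;
    }
#ifdef __APPLE__
    /* Apple's setxattr: position 0, options 0 (create or replace).
       The value is stored without a trailing NUL, as xattr -w does. */
    return setxattr(path, "com.apple.TextEncoding", value, strlen(value), 0, 0);
#else
    (void)path;
    errno = ENOTSUP; /* The attribute only has meaning on Apple platforms. */
    return -1;
#endif
}
```

You would call it right after writing the file, then check the result with xattr -p com.apple.TextEncoding filename.txt in Terminal.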
I experienced the same issue just yesterday! My problem was that files created by NSString's writeToFile:atomically:encoding:error: were being read and interpreted by Quick Look and TextEdit perfectly, but files created by writing UTF-8 data straight to a file with NSFileHandle did not work in Quick Look and TextEdit (the automatic encoding detection got it wrong).
Peter Hosey pointed me to this question, and after some testing it turns out that the NSString method does not use the BOM; instead, it just sets the extended attribute com.apple.TextEncoding. With some code I found online, I created an NSString category that lets you set this value easily by passing a file path and an NSStringEncoding value.
NSString (FileTextEncodingAttribute) on gist.github
An example would be:
[NSString setTextEncodingAttribute:NSUTF8StringEncoding atPath:@"myfile.txt"];
I've tested this and it works perfectly :) Who knew writing a plain text file would take so much research!!