How can I convert Plain Text (.txt) files to a string if the encoding type is unknown?
I'm working on a feature that lets users import .txt files into my app. The file could have been created in any number of apps, using any of several encodings that are valid for a plain text file. My understanding is that this could include ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, or even EBCDIC(?!).
Things had been going well using the following:
NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&errorReading];
Then a user supplied a file that came out empty when imported. Stepping through in the Xcode debugger, I see Cocoa error 261 (NSFileReadInapplicableStringEncodingError) with NSStringEncoding=4 (NSUTF8StringEncoding).
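As a sketch of how to recognize this particular failure in code: 261 is NSFileReadInapplicableStringEncodingError, and the NSStringEncoding=4 shown in the debugger is most likely the error's userInfo entry under NSStringEncodingErrorKey, i.e. the encoding that was requested (4 is NSUTF8StringEncoding). The helper names IsWrongEncodingError and RequestedEncoding are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when a read failed because the bytes are not
// valid in the requested encoding (Cocoa error 261).
static BOOL IsWrongEncodingError(NSError *error) {
    return error != nil
        && [error.domain isEqualToString:NSCocoaErrorDomain]
        && error.code == NSFileReadInapplicableStringEncodingError;
}

// Hypothetical helper: the encoding that was requested, read from
// NSStringEncodingErrorKey (4 == NSUTF8StringEncoding); 0 if absent.
static NSStringEncoding RequestedEncoding(NSError *error) {
    NSNumber *value = error.userInfo[NSStringEncodingErrorKey];
    return value != nil ? (NSStringEncoding)value.unsignedIntegerValue : 0;
}
```

A caller could check IsWrongEncodingError(errorReading) after a nil result and only then try other encodings, instead of treating every read failure (missing file, permissions) the same way.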
What I know:
- The user supplied file was created with an app called knowtes
- The file opens with TextEdit, TextWrangler, etc. on Mac OS X
- The file contains "special characters" such as umlauts (rant: why doesn't the "u" on umlaut have an umlaut?!)
- Finder Info displays:
Kind: text
- Running "file -I" on it in Terminal outputs:
text/plain; charset=utf-16le
I'm guessing the file's UTF-16LE encoding is the key, since my code expects UTF-8 (NSUTF8StringEncoding). I tried ASCII as a lowest common denominator: it didn't fail, but it substituted characters that weren't in the original file.
NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:&errorReading];
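Since file -I reports utf-16le, a minimal sketch of decoding the bytes with the matching little-endian UTF-16 encoding instead of ASCII follows. It assumes the file really is UTF-16LE. Because it is not certain whether Foundation keeps or drops a leading byte order mark when an explicit-endian encoding is requested, the hypothetical helper StringFromUTF16LEData strips a leading U+FEFF if one survives.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode bytes assumed to be UTF-16LE (what `file -I`
// reported). A leading U+FEFF (byte order mark) is removed if the decoder kept it.
static NSString *StringFromUTF16LEData(NSData *data) {
    NSString *s = [[NSString alloc] initWithData:data
                                        encoding:NSUTF16LittleEndianStringEncoding];
    if (s.length > 0 && [s characterAtIndex:0] == 0xFEFF) {
        s = [s substringFromIndex:1];
    }
    return s;
}
```

The file-based equivalent is stringWithContentsOfFile:path encoding:NSUTF16LittleEndianStringEncoding error:&errorReading, without the BOM cleanup.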
Next I tried loading the file into NSData first, hoping that would remove the need to know the encoding. It did not work:
NSData *txtFileData = [NSData dataWithContentsOfFile:path];
NSString *txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF8StringEncoding];
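NSData holds only raw bytes, so initWithData:encoding: still needs the right encoding; loading the data first does not avoid that choice. Looking at the first few bytes shows why UTF-8 fails here: a UTF-16LE file usually starts with the byte order mark FF FE, and 0xFF never occurs in valid UTF-8. The HexPrefix helper is a hypothetical diagnostic, not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical diagnostic: format the first `count` bytes of `data` as
// uppercase hex separated by spaces, e.g. @"FF FE 48 00".
static NSString *HexPrefix(NSData *data, NSUInteger count) {
    const unsigned char *bytes = data.bytes;
    NSUInteger n = MIN(count, data.length);
    NSMutableArray *parts = [NSMutableArray arrayWithCapacity:n];
    for (NSUInteger i = 0; i < n; i++) {
        [parts addObject:[NSString stringWithFormat:@"%02X", bytes[i]]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Logging HexPrefix(txtFileData, 4) for the user's file would most likely show FF FE followed by the first character in little-endian order.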
This leads me to a few questions:
- Is there no universal way to convert plain text file contents to a string regardless of encoding (i.e., a lowest common denominator)? I believe that used to be the purpose of initWithContentsOfFile:, which unfortunately is now deprecated. NSASCIIStringEncoding didn't work.
- Is there anything about converting a UTF-16 encoded file to a string that I would need to handle differently than if it were UTF-8?
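On the first question: a plain text file does not record its encoding, so there is no lossless universal conversion, and any approach is a guess. Since OS X 10.10 and iOS 8, Foundation exposes its own guesser, +stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:. On the second question, the main difference is byte order: UTF-16 and UTF-32 come in little- and big-endian forms, usually signaled by a BOM at the start of the file. A sketch, assuming those OS versions; the helper name GuessedString and the suggested-encodings list are illustrative choices.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: ask Foundation to guess the encoding. Lossy
// conversion is disallowed, so a non-zero result means the bytes decoded
// cleanly in the guessed encoding. Returns nil when no guess was made.
static NSString *GuessedString(NSData *data, NSStringEncoding *outEncoding) {
    NSString *converted = nil;
    NSDictionary *options = @{
        NSStringEncodingDetectionAllowLossyKey: @NO,
        // Assumption: these are the encodings users are most likely to supply.
        NSStringEncodingDetectionSuggestedEncodingsKey: @[
            @(NSUTF8StringEncoding),
            @(NSUTF16LittleEndianStringEncoding)
        ]
    };
    NSStringEncoding enc = [NSString stringEncodingForData:data
                                           encodingOptions:options
                                           convertedString:&converted
                                       usedLossyConversion:NULL];
    if (outEncoding != NULL) *outEncoding = enc;
    return enc != 0 ? converted : nil;
}
```

This is still a heuristic: short or unusual files can be misdetected, so the returned encoding is worth keeping alongside the imported text.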
Assuming the file really is UTF-16LE, why doesn't the following suggestion work either?
NSString *txtFileAsString = nil;
if (path != nil) {
    NSData *txtFileData = [NSData dataWithContentsOfFile:path];
    NSString *txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSASCIIStringEncoding];
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF8StringEncoding]; }
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16StringEncoding]; }
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16LittleEndianStringEncoding]; }
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16BigEndianStringEncoding]; }
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32StringEncoding]; }
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32LittleEndianStringEncoding]; }
    if (!txtFileAsString) { txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32BigEndianStringEncoding]; }
}
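The chain above has two likely problems. First, the line inside the if-block declares a second NSString *txtFileAsString, which shadows the outer variable, so the outer one is still nil after the block no matter what succeeds. Second, ASCII is tried first, and as observed earlier it returns a string with substituted characters instead of failing, so nothing after it runs. A further pitfall is that BOM-less UTF-16 accepts almost any even-length input. A sketch of a corrected chain, assuming Windows-1252 and Mac OS Roman are reasonable last resorts for unlabeled 8-bit files; DecodeTextData is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one result variable (no shadowing), strict decoders
// first, lenient 8-bit decoders last, and ASCII dropped entirely.
static NSString *DecodeTextData(NSData *data) {
    if (data == nil) return nil;
    const unsigned char *b = data.bytes;
    NSUInteger len = data.length;
    NSString *text = nil;

    // UTF-32/UTF-16 are only used when a byte order mark is present.
    // FF FE 00 00 (UTF-32LE) must be checked before FF FE (UTF-16LE).
    NSStringEncoding bomEncoding = 0;
    NSUInteger bomLength = 0;
    if (len >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0 && b[3] == 0) {
        bomEncoding = NSUTF32LittleEndianStringEncoding; bomLength = 4;
    } else if (len >= 4 && b[0] == 0 && b[1] == 0 && b[2] == 0xFE && b[3] == 0xFF) {
        bomEncoding = NSUTF32BigEndianStringEncoding; bomLength = 4;
    } else if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
        bomEncoding = NSUTF16LittleEndianStringEncoding; bomLength = 2;
    } else if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
        bomEncoding = NSUTF16BigEndianStringEncoding; bomLength = 2;
    }
    if (bomEncoding != 0) {
        // Decode everything after the BOM with the explicit-endian encoding.
        NSData *body = [data subdataWithRange:NSMakeRange(bomLength, len - bomLength)];
        text = [[NSString alloc] initWithData:body encoding:bomEncoding];
    }
    if (text == nil) text = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    // Assumption: unlabeled 8-bit files most likely come from Windows or classic Mac apps.
    if (text == nil) text = [[NSString alloc] initWithData:data encoding:NSWindowsCP1252StringEncoding];
    if (text == nil) text = [[NSString alloc] initWithData:data encoding:NSMacOSRomanStringEncoding];
    return text;
}
```

UTF-8 comes before the 8-bit encodings because it rejects invalid byte sequences, whereas the 8-bit encodings accept nearly anything.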
Sometimes stringWithContentsOfFile:usedEncoding:error: can do the job, especially if the file has a Byte Order Mark (BOM). Note that this variant, with usedEncoding:, should not be confused with the similarly named method that just has an encoding: parameter.
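A sketch of that approach: try stringWithContentsOfFile:usedEncoding:error: first, which reports the encoding it chose, and fall back to explicit encodings only if it returns nil. The helper ReadTextFile and its fallback list (UTF-8, then Windows-1252, then Mac OS Roman) are assumptions for illustration; pick fallbacks that match where your users' files come from.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: let Foundation detect the encoding first (most
// likely to succeed when the file starts with a BOM), then fall back.
// *outEncoding receives the encoding that produced the string.
static NSString *ReadTextFile(NSString *path, NSStringEncoding *outEncoding, NSError **outError) {
    NSStringEncoding used = 0;
    NSString *text = [NSString stringWithContentsOfFile:path usedEncoding:&used error:outError];
    if (text == nil) {
        // Assumed fallback order: strict UTF-8 first, lenient 8-bit encodings last.
        NSStringEncoding fallbacks[] = { NSUTF8StringEncoding,
                                         NSWindowsCP1252StringEncoding,
                                         NSMacOSRomanStringEncoding };
        for (size_t i = 0; text == nil && i < sizeof fallbacks / sizeof fallbacks[0]; i++) {
            text = [NSString stringWithContentsOfFile:path encoding:fallbacks[i] error:outError];
            if (text != nil) used = fallbacks[i];
        }
    }
    if (outEncoding != NULL) *outEncoding = used;
    return text;
}
```

Because Windows-1252 and Mac OS Roman accept almost any byte sequence, a result from those fallbacks is a best guess; storing or showing the returned encoding lets users spot a wrong one.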