Convert TXT File of Unknown Encoding to String

Posted 2020-04-08 03:12

How can I convert Plain Text (.txt) files to a string if the encoding type is unknown?

I'm working on a feature that lets users import txt files into my app. The file could have been created in any number of apps, using any encoding that is valid for a plain text file. My understanding is that this could include ASCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, or even EBCDIC?!

Things had been going well using the following:

NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&errorReading];

Then a user supplied a file that resulted in empty content when imported. Stepping through the import in the Xcode debugger, I see Cocoa error 261 with NSStringEncoding=4.

What I know:

  • The user-supplied file was created with an app called knowtes
  • The file opens with TextEdit, TextWrangler, etc. on Mac OS X
  • The file contains "special characters" such as umlauts (rant: why doesn't the "u" on umlaut have an umlaut?!)
  • Finder Info displays:

Kind: text

text/plain; charset=utf-16le

I am guessing that the UTF-16LE encoding of the file is the key, since my code expects a UTF-8 file. I attempted to use ASCII as a lowest common denominator. It didn't crash, but it fudged in some characters that weren't present in the original file.

NSString *txtFileAsString = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:&errorReading];

So I attempted to convert the file to NSData first, hoping it might negate the need to recognize the encoding. It did not work.

    NSData *txtFileData = [NSData dataWithContentsOfFile:path];
    NSString *txtFileAsString = [[NSString alloc]initWithData:txtFileData encoding:NSUTF8StringEncoding];

This leads me to a few questions:

  1. Is there no universal way to convert plain text file contents to a string regardless of encoding (i.e. a lowest common denominator)? I believe that used to be the purpose of initWithContentsOfFile:, which unfortunately is now deprecated. NSASCIIStringEncoding didn't work.
  2. Is there anything about converting a UTF-16 encoded file to a string that I would need to handle differently than if it were UTF-8?
  3. Assuming the file is in fact UTF-16LE, why does the following suggestion not work either?

    NSString *txtFileAsString = nil;
    if (path != nil) {
        NSData *txtFileData = [NSData dataWithContentsOfFile:path];
        NSString *txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSASCIIStringEncoding];
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF8StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16LittleEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF16BigEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32StringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32LittleEndianStringEncoding];
        }
        if (!txtFileAsString) {
            txtFileAsString = [[NSString alloc] initWithData:txtFileData encoding:NSUTF32BigEndianStringEncoding];
        }
    }
    

1 answer
疯言疯语
#2 · 2020-04-08 03:43

Sometimes stringWithContentsOfFile:usedEncoding:error: can do the job, especially if the file starts with a byte order mark (BOM):

NSError *error = nil;
NSStringEncoding encoding;
NSString *string = [NSString stringWithContentsOfFile:path usedEncoding:&encoding error:&error];
if (!string) {
    NSLog(@"Could not read the file or detect its encoding: %@", error);
}

Note that this variant, which takes a usedEncoding: out-parameter, should not be confused with the similarly named method that takes a plain encoding: parameter.
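If detection fails, one fallback is to load the bytes as NSData and choose the encoding yourself. The following is only a rough sketch under some assumptions: DecodeTextData is a hypothetical helper (not a Foundation API), the code is compiled with ARC, UTF-32 is not handled, and ISO Latin 1 as the last resort is my own guess at a reasonable default. The order is the important part: check for a UTF-16 BOM first, then try strict UTF-8, and leave any permissive single-byte encoding for last, because it accepts nearly any byte sequence and would mask the real encoding if tried first.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of Foundation: decode text of unknown encoding.
// Returns nil only if every attempt fails; *usedEncoding is 0 in that case.
static NSString *DecodeTextData(NSData *data, NSStringEncoding *usedEncoding) {
    const unsigned char *bytes = data.bytes;
    NSUInteger length = data.length;
    NSString *result = nil;
    NSStringEncoding encoding = 0;

    // A UTF-16 BOM is FF FE (little endian) or FE FF (big endian).
    // NSUTF16StringEncoding uses the BOM to pick the byte order.
    // Limitation: FF FE 00 00 (the UTF-32LE BOM) is not distinguished here.
    BOOL hasUTF16BOM = length >= 2 &&
        ((bytes[0] == 0xFF && bytes[1] == 0xFE) ||
         (bytes[0] == 0xFE && bytes[1] == 0xFF));
    if (hasUTF16BOM) {
        encoding = NSUTF16StringEncoding;
        result = [[NSString alloc] initWithData:data encoding:encoding];
    }
    if (!result) {
        // UTF-8 decoding is strict: invalid byte sequences yield nil.
        encoding = NSUTF8StringEncoding;
        result = [[NSString alloc] initWithData:data encoding:encoding];
    }
    if (!result) {
        // Assumed last resort: ISO Latin 1 maps every byte to a character,
        // so this succeeds but may display the wrong characters.
        encoding = NSISOLatin1StringEncoding;
        result = [[NSString alloc] initWithData:data encoding:encoding];
    }
    if (usedEncoding) {
        *usedEncoding = result ? encoding : 0;
    }
    return result;
}
```

A call would look something like `NSString *text = DecodeTextData([NSData dataWithContentsOfFile:path], &encoding);`. UTF-16 text without a BOM is not caught by this sketch: its bytes can pass as valid UTF-8 with embedded NUL characters. On iOS 8 / OS X 10.10 and later, +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:] may be worth trying for such heuristic cases.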
