-->

Is there a way to “auto detect” the encoding of a

2020-03-26 04:50发布

问题:

Is there a way to "auto detect" the encoding of a resource when loading it using stringFromContentsOfURL? The current (non-depracated) method, + (id)stringWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding)enc error:(NSError **)error;, wants a URL encoding. I've noticed that getting it wrong does make a difference for what I want to do. Is there a way to check this somehow and always get it right? (Right now I'm using UTF8.)

回答1:

I'd try this function from the docs

Returns a string created by reading data from a given URL and returns by reference the encoding used to interpret the data.

+ (id)stringWithContentsOfURL:(NSURL *)url usedEncoding:(NSStringEncoding *)enc error:(NSError **)error

this seems to guess the encoding and then returns it to you



回答2:

What I normally do when converting data (encoding-less string of bytes) to a string is attempt to initialize the string using various different encodings. I would suggest trying the most limiting (charset wise) encodings like ASCII and UTF-8 first, then attempt UTF-16. If none of those are a valid encoding, you should attempt to decode the string using a fallback encoding like NSWindowsCP1252StringEncoding that will almost always work. In order to do this you need to download the page's contents using NSData so that you don't have to re-download for every encoding attempt. Your code might look like this:

NSData * urlData = [NSData dataWithContentsOfURL:aURL];
NSString * theString = [[NSString alloc] initWithData:urlData encoding:NSASCIIStringEncoding];
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];
}
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData encoding:NSUTF16StringEncoding];
}
if (!theString) {
    theString = [[NSString alloc] initWithData:urlData NSWindowsCP1252StringEncoding];
}
// ...
// use theString here...
// ...
[theString release];