Check if NSData contains ASCII or UTF8 Encoding

Posted 2019-05-31 12:00

Question:

I am retrieving HTML that contains either UTF-8 or ASCII encoded text. For most pages, ASCII decoding displays the text correctly:

NSString *responseString    =   [[NSString alloc] initWithData:responseData encoding:NSASCIIStringEncoding];

Now I have another HTML page that is UTF-8 encoded, so I have to use:

NSString *responseString    =   [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];

Which encoding I receive varies from one website to the next. Is there a way to inspect the NSData and find out which encoding to use when decoding it?

Thanks!

Answer 1:

I don't know whether it is possible to check the encoding of an NSData object, so this is what I did:

NSString *dataStr;
dataStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding]; 
if (!dataStr)
{
    NSLog(@"ASCII not working, will try utf-8!");
    dataStr = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
//Do stuff with dataStr
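
A variation on the same fallback idea, as a minimal sketch rather than a definitive implementation: since every ASCII byte sequence is also valid UTF-8, one could try the strict UTF-8 decoder first and only then fall back to a single-byte encoding. The helper name DecodeHTMLData, the choice of NSISOLatin1StringEncoding as the fallback, and compiling with ARC are all assumptions that do not come from the original post.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post). Assumes ARC.
static NSString *DecodeHTMLData(NSData *data)
{
    // UTF-8 decoding is strict: it returns nil when the bytes are not
    // well-formed UTF-8. Plain ASCII input also decodes successfully here.
    NSString *text = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (text == nil) {
        // Assumed fallback: ISO Latin 1 maps each byte to one character,
        // so it should yield a string for any input, though not
        // necessarily the text the page author intended.
        text = [[NSString alloc] initWithData:data
                                     encoding:NSISOLatin1StringEncoding];
    }
    return text;
}
```

With this ordering, a nil result from the UTF-8 attempt is the signal that the page uses some other encoding.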


Answer 2:

Although Heliem's answer is useful, it does not solve the problem when ASCII and UTF-8 decoding both return a string. For instance, in some cases UTF-8 gives me extra characters (a bad result) while ASCII shows the right characters (a good result). I now use the following code:

// Manual reference counting (pre-ARC): every alloc/retain is balanced by a release.
NSString *responseString = nil, *responseStringASCII = nil, *responseStringUTF8 = nil;

responseStringASCII = [[NSString alloc] initWithData:responseData encoding:NSASCIIStringEncoding];
if (!responseStringASCII)
{
    // ASCII decoding failed, so fall back to UTF-8.
    responseString = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];
}
else
{
    // ASCII decoding worked, but check whether UTF-8 yields fewer characters.
    responseStringUTF8 = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];

    // A shorter UTF-8 string means multi-byte sequences were combined into single characters.
    if (responseStringUTF8 != nil && [responseStringUTF8 length] < [responseStringASCII length])
    {
        responseString = [responseStringUTF8 retain];
    }
    else
    {
        responseString = [responseStringASCII retain];
    }

    [responseStringUTF8 release];
}

[responseStringASCII release];
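
When both decodings succeed, comparing lengths is only a heuristic. If the data comes from an HTTP response, the charset the server declares may be a more reliable hint. The sketch below assumes an NSURLResponse is available alongside the data and reads its textEncodingName; the helper name EncodingFromCharsetName and its fallback parameter are hypothetical, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post). Assumes ARC.
// Maps an IANA charset name such as "utf-8" to an NSStringEncoding,
// returning `fallback` when the name is missing or not recognized.
static NSStringEncoding EncodingFromCharsetName(NSString *charsetName,
                                                NSStringEncoding fallback)
{
    if ([charsetName length] == 0) {
        return fallback; // the server declared no charset
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)charsetName);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return fallback; // unknown charset name
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}
```

A possible use, assuming a variable named response holds the NSURLResponse: pass EncodingFromCharsetName([response textEncodingName], NSUTF8StringEncoding) as the encoding argument of initWithData:encoding:. Servers sometimes omit or misreport the charset, so the result can still be nil and may need one of the fallbacks above.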