NSString special characters encoding

Posted 2019-08-29 02:32

Question:

I'm trying to convert some special characters like ä, ö, ü, α, μ, ο, ι, and others from a webpage. When I download the page with ASIHTTPRequest, I get escape codes instead of the characters themselves. Examples:
ä = \u00E4
μ = \u03BC
α = \u03B1

This also happens if I use [NSString stringWithContentsOfURL:aNSURL encoding:NSASCIIStringEncoding error:nil]; I have tried the other available encodings, but none of them work for the examples above. For instance, with NSUnicodeStringEncoding I get strange characters that look Chinese, and with NSASCIIStringEncoding I get these sequences of numbers and letters.

The strange thing is that if I view the page's source in a web browser such as Safari, everything looks fine, with the normal HTML character entities, e.g. ä = &auml;

Is there any way to convert these encoded letters back?


Thanks

EDIT
Sorry, I forgot to mention above what the source looks like in a browser.

I just noticed on this site: link that the hex HTML entity is very similar to what I got with this code. Examples:
ä = &#xe4;
μ = &#x3bc;
α = &#x3b1;

As you can see, they are very similar: the hex digits are lowercase, the leading zeros are replaced with a single x, &# is added at the beginning, and ; at the end. I will just have to write a small piece of code to convert the escape codes to hex entities, which is not going to be a big problem. Then I just have to run the result through an HTML entity converter and I'm done.
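The conversion described above could be sketched roughly as follows. This is only an illustration: the function name HexEntitiesFromUnicodeEscapes is made up, and it assumes every escape has exactly four hex digits, as in the examples.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is illustrative): rewrites each literal
// \uXXXX sequence in `input` as the hex character reference &#xXXXX;.
static NSString *HexEntitiesFromUnicodeEscapes(NSString *input) {
    if (input == nil) {
        return nil;
    }
    NSError *error = nil;
    // Regex: a backslash, the letter u, then exactly four hex digits (group 1).
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\\\u([0-9A-Fa-f]{4})"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // Pattern failed to compile; hand the input back unchanged.
        return input;
    }
    // $1 in the template inserts the captured hex digits.
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@"&#x$1;"];
}
```

The sketch keeps the digits exactly as they were, so \u00E4 becomes &#x00E4; rather than &#xe4;. Hex character references accept leading zeros and upper-case digits, so the extra lowercasing step should not be needed.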

Anyway, thanks a lot for helping me out again

Sean

Answer 1:

You can use the code found at this link. It relies on the system's built-in XML parser to decode the entities. The code is reproduced below:

#import <Foundation/Foundation.h>

// Decodes XML entities by wrapping the input in a dummy element and
// collecting the character data that NSXMLParser reports.
// Written for manual reference counting (no ARC).
@interface MREntitiesConverter : NSObject <NSXMLParserDelegate> {
    NSMutableString *resultString;
}
@property (nonatomic, retain) NSMutableString *resultString;
- (NSString *)convertEntiesInString:(NSString *)s;
@end

@implementation MREntitiesConverter
@synthesize resultString;

- (id)init {
    self = [super init];
    if (self) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}

// NSXMLParser delegate callback: receives each chunk of decoded text.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}

- (NSString *)convertEntiesInString:(NSString *)s {
    if (s == nil) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Start from an empty buffer so repeated calls do not accumulate text.
    [self.resultString setString:@""];
    NSString *xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser *xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    // Return an autoreleased immutable copy of the collected text.
    return [NSString stringWithString:self.resultString];
}

- (void)dealloc {
    [resultString release];
    [super dealloc];
}
@end

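Alternatively, you can use NSString *sI = (NSString *)CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)s, NULL); whether it is available depends on which OS you are building for. Because it is a Create function, you own the returned string and must release it when you are done.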



Answer 2:

Also you can check this out and use it: https://github.com/mwaterfall/MWFeedParser/blob/master/Classes/NSString+HTML.m

- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
- (NSString *)stringByLinkifyingURLs;

To decode the entities, try this method:

- (NSString *)stringByDecodingHTMLEntities;


Answer 3:

After another try with Rob Mayoff's code, it worked! Here is the link to his answer:
Converting escaped UTF8 characters back to their original form
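
For readers who cannot follow the link, here is a minimal sketch of one way to decode \uXXXX escapes using Core Foundation's CFStringTransform. It is not necessarily identical to the linked answer. The helper name DecodeUnicodeEscapes is made up for illustration, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is illustrative): turns literal \uXXXX
// escape sequences in `input` into the characters they stand for.
// Assumes ARC; under manual reference counting the mutable copy would
// need an autorelease.
static NSString *DecodeUnicodeEscapes(NSString *input) {
    if (input == nil) {
        return nil;
    }
    NSMutableString *result = [input mutableCopy];
    // "Any-Hex/Java" maps characters to \uXXXX escapes; passing
    // reverse = true runs it backwards, i.e. \uXXXX -> character.
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)result,
                                   NULL,   // NULL range = whole string
                                   CFSTR("Any-Hex/Java"),
                                   true);
    // If the transform fails, fall back to the untouched input.
    return ok ? result : input;
}
```

With this approach the downloaded text can be decoded directly, for example DecodeUnicodeEscapes(@"\\u00E4"), without first converting the escapes to HTML entities.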