I'm trying to convert some special characters like ä, ö, ü, α, μ, ο, ι and others from a webpage. When I download the page with ASIHTTPRequest, I get escape codes instead of the characters themselves. Examples:
ä = \u00E4
μ = \u03BC
α = \u03B1
This also happens if I use [NSString stringWithContentsOfURL:aNSURL encoding:NSASCIIStringEncoding error:nil];
I have tried the different encodings available, but none of them works for the examples above. For example, with NSUnicodeStringEncoding I get strange, Chinese-looking characters, and with NSASCIIStringEncoding I get these numbers and letters.
The strange thing is that if I look at the page source in a web browser like Safari, it's all fine, with the normal HTML character entities, like: ä = &auml;
Is there any way to convert these encoded letters back?
Thanks
EDIT
Sorry, that I forgot to mention the source code of a browser above.
I just noticed on this site: link that the hex HTML entity is very similar to what I got with this code. Examples:
ä = &#xe4;
μ = &#x3bc;
α = &#x3b1;
As you can see, they are very similar: make the hex digits lowercase, replace the leading 0's with a single x, add &# at the beginning and a ; at the end.
I will just have to write a small piece of code to convert the numbers and letters to the hex entities, which is not going to be a big problem. Then I just have to run the result through an HTML entity converter and I'm done.
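A minimal sketch of that conversion step, in plain C so that it can be dropped into an Objective-C (.m) file unchanged. The function name escapes_to_entities is made up for this illustration, and the code assumes every escape has exactly four hex digits, as in the examples above:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrites every \uXXXX escape in `in` as a hex HTML
 * entity (for example \u00E4 becomes &#xe4;). Everything else is copied
 * through unchanged. Returns a malloc'ed string that the caller must free,
 * or NULL if the allocation fails. */
char *escapes_to_entities(const char *in) {
    size_t len = strlen(in);
    /* A 6-byte escape grows to at most 8 bytes ("&#xffff;"),
     * so 2 * len + 1 bytes is always enough. */
    char *out = malloc(2 * len + 1);
    if (out == NULL) return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        /* The && chain stops at the terminating NUL, so it never reads
         * past the end of the input. */
        if (in[i] == '\\' && in[i + 1] == 'u' &&
            isxdigit((unsigned char)in[i + 2]) &&
            isxdigit((unsigned char)in[i + 3]) &&
            isxdigit((unsigned char)in[i + 4]) &&
            isxdigit((unsigned char)in[i + 5])) {
            char hex[5];
            memcpy(hex, in + i + 2, 4);
            hex[4] = '\0';
            unsigned long cp = strtoul(hex, NULL, 16);
            /* %lx prints lowercase hex without leading zeros. */
            o += (size_t)snprintf(out + o, 9, "&#x%lx;", cp);
            i += 6;
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return out;
}
```

The returned C string can then be handed to whichever entity decoder you end up using.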
Anyway, thanks a lot for helping me out again
Sean
You can use the code found at this link. It relies on the entity handling built into the XML parser. The code is shown below:
@interface MREntitiesConverter : NSObject <NSXMLParserDelegate> {
    NSMutableString *resultString;
}
@property (nonatomic, retain) NSMutableString *resultString;
- (NSString *)convertEntiesInString:(NSString *)s;
@end
@implementation MREntitiesConverter
@synthesize resultString;
- (id)init
{
    if ((self = [super init])) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}
// NSXMLParser delegate callback: by the time the text arrives here,
// the parser has already resolved the entities it contains.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}
- (NSString *)convertEntiesInString:(NSString *)s {
    if (s == nil) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Discard the output of any earlier call on this converter.
    [self.resultString setString:@""];
    // Wrap the text in a dummy element so that it parses as an XML document.
    NSString *xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser *xmlParse = [[NSXMLParser alloc] initWithData:data];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    [xmlParse release];
    // Return an autoreleased copy so the caller does not have to release it.
    return [NSString stringWithString:resultString];
}
- (void)dealloc {
    [resultString release];
    [super dealloc];
}
@end
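Alternatively, you can use
NSString* sI = (NSString*)CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)s, NULL);
which is available depending on which OS you are building for. Since this is a Create function, you own the returned string and have to release it when you are done with it.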
Also you can check this out and use it: https://github.com/mwaterfall/MWFeedParser/blob/master/Classes/NSString+HTML.m
- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
- (NSString *)stringByLinkifyingURLs;
For your case, try this method:
- (NSString *)stringByDecodingHTMLEntities;
After having another try with Rob Mayoff's code, it worked! Here is the link to his answer:
Converting escaped UTF8 characters back to their original form
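For readers who only need the general idea, here is a rough sketch of decoding the \uXXXX escapes straight to UTF-8, skipping the HTML entity detour. It is not the code from the linked answer. The function names are made up for this illustration, and it assumes four-digit escapes and does not combine surrogate pairs, so characters above U+FFFF are not handled:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes code point cp (at most 0xFFFF) to out as
 * UTF-8 and returns the number of bytes written (1 to 3). */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: replaces every \uXXXX escape in `in` with the UTF-8
 * bytes of that character. Returns a malloc'ed string that the caller must
 * free, or NULL if the allocation fails. */
char *decode_unicode_escapes(const char *in) {
    size_t len = strlen(in);
    /* A 6-byte escape shrinks to at most 3 bytes, so the output is never
     * longer than the input. */
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        /* The && chain stops at the terminating NUL, so it never reads
         * past the end of the input. */
        if (in[i] == '\\' && in[i + 1] == 'u' &&
            isxdigit((unsigned char)in[i + 2]) &&
            isxdigit((unsigned char)in[i + 3]) &&
            isxdigit((unsigned char)in[i + 4]) &&
            isxdigit((unsigned char)in[i + 5])) {
            char hex[5];
            memcpy(hex, in + i + 2, 4);
            hex[4] = '\0';
            o += put_utf8(strtoul(hex, NULL, 16), out + o);
            i += 6;
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return out;
}
```

The result can be turned back into an NSString with [NSString stringWithUTF8String:], after which the C buffer should be freed.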