I'm trying to get an HTML page with a UTF-8 charset:
NSError *error = nil;
NSString *html = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"] encoding:NSUTF8StringEncoding error:&error];
but NSLog(@"%@", html); prints (null).
Why is this happening?
The problem is that while the file's meta tag purports to be UTF-8, it's not (at least not entirely). You can confirm this by:

1. Downloading the HTML as NSData (which succeeds).
2. Running iconv from the Terminal command line, which reports an error (including the line number and character number).

Thanks to Torsten Marek for sharing that with us.
When I look at that portion of the HTML, there are definitely non-UTF-8 characters there, buried in the setting of the clever_cut_pattern JavaScript variable.

If we thought you had simply gotten the encoding wrong, the usual advice in these cases would be to use the rendition of stringWithContentsOfURL that takes a usedEncoding parameter (stringWithContentsOfURL:usedEncoding:error:), i.e. rather than guessing what the encoding is, let NSString determine it for you. Unfortunately, in this case even that fails (presumably because the file purports to be UTF-8, but isn't).
The question then becomes "OK, so what do I do now?" It depends on why you were trying to download that HTML in your app in the first place. If you really need to convert this to UTF-8 (i.e. strip out the non-UTF-8 characters), you could theoretically use the iconv(3) function, which is part of the GNU libiconv library. It can identify the non-conforming characters, which you could then presumably remove. It's a question of how much work you're willing to go through to handle this non-conforming web page.
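As a rough sketch of that approach, again in plain C callable from Objective-C and assuming GNU libiconv or glibc: feed the bytes to iconv(3) with UTF-8 as both source and target, and whenever it stops with EILSEQ, skip the offending byte and carry on. `strip_invalid_utf8` is a hypothetical name. Bear in mind that the skipped bytes are simply discarded, so whatever text they encoded (in this page, part of the clever_cut_pattern value) is lost.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper (not a library API). Copies `len` bytes of `buf`
 * into a newly malloc'd, NUL-terminated string, dropping every byte that
 * iconv(3) rejects as invalid UTF-8. Returns NULL on failure; the caller
 * must free() the result. */
char *strip_invalid_utf8(const char *buf, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    /* Valid UTF-8 is copied byte for byte, so the output never exceeds the input. */
    char *result = malloc(len + 1);
    if (result == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)buf;   /* iconv(3) takes char **, but only reads the input */
    size_t in_left = len;
    char *out = result;
    size_t out_left = len;

    while (in_left > 0) {
        if (iconv(cd, &in, &in_left, &out, &out_left) != (size_t)-1)
            break;            /* everything left was valid */
        if (errno == EILSEQ) {
            in++;             /* skip the offending byte and retry */
            in_left--;
        } else {
            break;            /* EINVAL: truncated sequence at the end, drop it */
        }
    }

    *out = '\0';
    iconv_close(cd);
    return result;
}
```

The cleaned bytes could then be turned into a string with NSString's stringWithUTF8String:, keeping in mind that it stops at the first NUL byte.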