I'm getting an HTML file as NSData and need to extract some parts of it. For that I need to convert it to an NSString with UTF-8 encoding. The problem is that this conversion fails, probably because the NSData contains bytes that are invalid in UTF-8. I have tried getting the byte array of the data and walking over it, but each time I come across a non-ASCII character (Hebrew letters, for example) I get gibberish.
Help will be appreciated.
UPDATE:
To Gordon - the NSData is generated like this:
NSData *theData = [NSURLConnection sendSynchronousRequest:theRequest returningResponse:&theResponse error:&theError];
When I say that the conversion fails I mean that
[[NSString alloc] initWithData:temp encoding:NSUTF8StringEncoding]
returns nil
To Ed - Here is my code (I got the Byte array from NSData, found what I need, and constructed another Byte array from that - turned it to NSData and then attempted to convert it to NSString... sounds kinda complicated...)
-(NSString *)UTF8StringFromData:(NSData *)theData{
    Byte *arr = [theData bytes];
    NSUInteger begin1 = [self findIndexOf:@"<li>" bArr:arr size:[theData length]]+4;
    NSUInteger end1 = [self findIndexOf:@"</li></ol>" bArr:arr size:[theData length]];
    Byte *arr1 = (Byte *)malloc(sizeof(Byte)*((end1-begin1+1)));
    NSLog(@"%d %d",begin1, end1);
    int j = 0;
    for (int i = begin1; i < end1; i++){
        arr1[j] = arr[i];
        j++;
    }
    arr1[j]='\0';
    NSData *temp = [NSData dataWithBytes:arr1 length:j];
    return [[NSString alloc] initWithData:temp encoding:NSUTF8StringEncoding];
}
Have you checked the charset= in the HTTP headers and/or the document itself? The most likely reason for the conversion to fail is that the bytes don't represent a valid UTF-8 string.
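To check the charset as suggested, here is a minimal sketch. DeclaredCharset is a hypothetical helper (it is not part of Foundation). It assumes ARC, and it only does a case-sensitive search for charset= in the raw bytes, so treat it as a diagnostic rather than a real HTML parser:

```objc
#import <Foundation/Foundation.h>
#include <ctype.h>

// Hypothetical helper: returns the charset declared for a response, looking
// first at the HTTP headers and then at a charset= attribute in the HTML bytes.
// Returns nil if neither place declares one.
static NSString *DeclaredCharset(NSURLResponse *response, NSData *html) {
    // NSURLResponse exposes the charset parsed from the Content-Type header.
    NSString *name = [response textEncodingName];
    if (name != nil) {
        return name;
    }

    // Fall back to scanning the raw bytes for "charset=", which is plain ASCII
    // in any ASCII-compatible encoding (e.g. inside a <meta> tag).
    NSData *needle = [@"charset=" dataUsingEncoding:NSASCIIStringEncoding];
    NSRange hit = [html rangeOfData:needle
                            options:0
                              range:NSMakeRange(0, [html length])];
    if (hit.location == NSNotFound) {
        return nil;
    }

    const unsigned char *bytes = [html bytes];
    NSUInteger length = [html length];
    NSUInteger start = NSMaxRange(hit);

    // Skip an optional opening quote right after the '='.
    if (start < length && (bytes[start] == '"' || bytes[start] == '\'')) {
        start++;
    }

    // Collect the characters that can appear in a charset name.
    NSUInteger end = start;
    while (end < length) {
        unsigned char c = bytes[end];
        BOOL isNameChar = isalnum(c) || c == '-' || c == '_' || c == '.' || c == ':';
        if (!isNameChar) {
            break;
        }
        end++;
    }
    if (end == start) {
        return nil;
    }
    return [[NSString alloc] initWithBytes:bytes + start
                                    length:end - start
                                  encoding:NSASCIIStringEncoding];
}
```

With the synchronous request from the question, logging DeclaredCharset(theResponse, theData) should show whether the page actually claims to be UTF-8. A Hebrew page served as, say, windows-1255 would explain why the UTF-8 conversion returns nil.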
I'm not sure if you're aware, but you don't really need to copy the array to another array before putting it into the new NSData object.

As for your particular problem, I would try looking through the data manually using the debugger. Put a breakpoint after you have your array (arr1). When you hit it, open up the GDB console and print arr1 as a C string. With your code, it should print out the string you're trying to get. (Without the intermediate copy and its terminating '\0', the output won't stop at the end of your range. It'll just keep going.)
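For the point about not needing an intermediate copy, here is a minimal sketch. SubdataBetween is a hypothetical helper that uses subdataWithRange: in place of the malloc-and-loop, and it assumes begin and end are byte offsets like begin1 and end1 in the question. It also rejects out-of-range boundaries, which helps when one of the markers was not found:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the bytes in [begin, end) of `data` as a new
// NSData, or nil if the boundaries are out of order or past the end.
static NSData *SubdataBetween(NSData *data, NSUInteger begin, NSUInteger end) {
    if (begin > end || end > [data length]) {
        return nil;
    }
    // No manual buffer, no loop, and nothing to free afterwards.
    return [data subdataWithRange:NSMakeRange(begin, end - begin)];
}
```

In the question's method this would be used as NSData *temp = SubdataBetween(theData, begin1, end1);. The result has no trailing '\0', so its bytes should not be printed as a C string in the debugger.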
If the result is not what you expect, then there's something wrong with the data, or perhaps with your begin1 and end1 boundaries.

I know this is an old topic, but it came up when I was looking for a solution today. I've solved it now, so I'm just posting it for others who might run into this page looking for a solution.
Here's what I do in an asynchronous request:
I first store the text encoding name in connection:didReceiveResponse: using
Then later, in my connectionDidFinishLoading: method, I use
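A minimal sketch of this approach, which is not necessarily the exact code used above. StringFromData is a hypothetical helper, ARC is assumed, and falling back to UTF-8 when no usable encoding name is available is my own assumption:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes `data` using the IANA charset name reported by
// the response (e.g. [response textEncodingName]). Falls back to UTF-8 when the
// name is nil or not recognised. Returns nil if the bytes are not valid in the
// chosen encoding.
static NSString *StringFromData(NSData *data, NSString *textEncodingName) {
    NSStringEncoding encoding = NSUTF8StringEncoding;  // assumed default
    if (textEncodingName != nil) {
        CFStringEncoding cfEncoding =
            CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)textEncodingName);
        if (cfEncoding != kCFStringEncodingInvalidId) {
            encoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);
        }
    }
    return [[NSString alloc] initWithData:data encoding:encoding];
}

// Intended use in the NSURLConnection delegate (names are illustrative):
//   connection:didReceiveResponse:   -> keep [response textEncodingName] in an ivar
//   connection:didReceiveData:       -> append to an NSMutableData ivar
//   connectionDidFinishLoading:      -> StringFromData(receivedData, savedEncodingName)
```

For the synchronous request in the question, the same helper could be called as StringFromData(theData, [theResponse textEncodingName]).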