NSXMLParser stops parsing after encountering special character

Posted 2019-05-27 03:13

I am reading an XML file from the Google Weather API and parsing it with NSXMLParser. The city in question is Paris. Here is a brief excerpt of the XML I get back:

    <?xml version="1.0"?>
    <xml_api_reply version="1">
    <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" ><forecast_information>
    <city data="Paris, Île-de-France"/>
    <postal_code data="Paris"/>
    <latitude_e6 data=""/>
    <longitude_e6 data=""/> 
...
...

The code I use to parse this XML is:

    NSString *address = @"http://www.google.com/ig/api?weather=Paris";
    NSURL *URL = [NSURL URLWithString:address];

    NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:URL];
    [parser setDelegate:self];
    [parser parse];
...

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict 
{

    NSLog(@"XML Parser 1 ... elementName ... %@", elementName);

}

This is the output I get for the above XML:

XML Parser 1 ... elementName ... xml_api_reply
XML Parser 1 ... elementName ... weather
XML Parser 1 ... elementName ... forecast_information

The problem is that the parser handles every tag until it reaches the city element, whose data attribute ("Paris, Île-de-France") contains a non-ASCII character, and then it simply stops. None of the later tags (postal_code, latitude_e6, longitude_e6, etc.) are processed.

So my question is: is there a way to remove all non-ASCII characters from the XML string returned by the URL?
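
One way to see why parsing stops, rather than only that it stops, is to check the return value of parse and implement the parser:parseErrorOccurred: delegate callback. The sketch below is illustrative only: ParseLogger and ParseAndReport are hypothetical names that do not appear in the original code, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>

// ParseLogger is a hypothetical delegate used only for this illustration.
@interface ParseLogger : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSError *lastError;
@end

@implementation ParseLogger
- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
    // Called when the parser hits a fatal error, for example malformed input.
    self.lastError = parseError;
    NSLog(@"Parse error at line %ld, column %ld: %@",
          (long)parser.lineNumber, (long)parser.columnNumber, parseError);
}
@end

// Hypothetical helper: returns YES if the data parsed cleanly, logs the error otherwise.
static BOOL ParseAndReport(NSData *data) {
    ParseLogger *logger = [[ParseLogger alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = logger;
    BOOL ok = [parser parse];
    if (!ok) {
        NSLog(@"Parsing stopped: %@", parser.parserError);
    }
    return ok;
}
```

If the logged error mentions the input encoding, that points at the byte encoding of the response rather than at the delegate code.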

4 answers
在下西门庆
#2 · 2019-05-27 03:42

OK, I have solved this problem. This is how I got it to work.

First, I fetch the XML (special characters included) from the URL as a string. Then I strip all the special characters out of that string, convert it to NSData, and pass the NSData object to NSXMLParser. With no special characters left, NSXMLParser is happy.

Here's the code for anyone who runs into this in the future. A big thank you to everyone who contributed to this post!

    NSString *address = @"http://www.google.com/ig/api?weather=Paris";
    NSURL *URL = [NSURL URLWithString:address];
    NSError *error = nil;
    NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSASCIIStringEncoding error:&error];

    // Build a string holding the printable ASCII characters (codes 32 through 126).
    NSMutableString *asciiCharacters = [NSMutableString string];
    for (NSInteger i = 32; i < 127; i++)
    {
        // Cast so the argument matches the %c format specifier.
        [asciiCharacters appendFormat:@"%c", (char)i];
    }

    // Everything outside that set is removed, which also drops newlines and tabs.
    NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];
    XML = [[XML componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];

    // Hand the cleaned string to the parser as UTF-8 data.
    NSData *data = [XML dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    [parser setDelegate:self];
    [parser parse];

EDIT:

NSXMLParser is a horrible tool. I have successfully used RaptureXML in all my apps. It's super easy to use and avoids all this nonsense with non-ASCII characters. https://github.com/ZaBlanc/RaptureXML

【Aperson】
#3 · 2019-05-27 03:49

The problem you're having is that Google's response uses a different encoding from the ASCII or UTF-8 that you're expecting. The command-line tool curl makes this easy to see:

$ curl -I "http://www.google.com/ig/api?weather=Paris"
HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
Content-Type: text/xml; charset=ISO-8859-1
...

If you look up ISO-8859-1, you'll find that it's also known as the Latin-1 character set. One of the built-in encoding options is NSISOLatin1StringEncoding, so do this:

NSString *XML = [NSString stringWithContentsOfURL:URL encoding:NSISOLatin1StringEncoding error:&error];

Using the correct encoding lets NSString interpret the bytes properly, so you get back usable data. Alternatively, you may be able to modify your request to specify the character encoding that you want Google to provide. That might be preferable, because you would not have to match the encoding you use to each specific response.

Edit: Up to this point, my answer focuses on just getting the response as a readable string. I see that your real question involves parsing with NSXMLParser, though. I think you have at least two options here:

  • Modify the XML that you receive to include the character encoding. The XML that you get back is Latin-1 encoded, but the XML declaration says just: <?xml version="1.0"?>. You could modify that to look like: <?xml version="1.0" encoding="ISO-8859-1"?>. I don't know if that would solve the problem with NSXMLParser, but it might.

  • As suggested above, request the character set that you want from Google. Adding an Accept-Charset header to the request should do the trick, though that will make retrieving the data a little more complicated.
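
A variation on the first option, sketched here under assumptions: instead of editing the declaration, decode the downloaded bytes as ISO-8859-1 and re-encode them as UTF-8, which is what an XML parser assumes when the declaration names no encoding. UTF8DataFromLatin1Data is a hypothetical helper name, not part of Foundation, and the sketch assumes the response really is Latin-1 and carries no encoding attribute in its declaration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: reinterpret ISO-8859-1 bytes and re-encode them as UTF-8.
static NSData *UTF8DataFromLatin1Data(NSData *latin1Data) {
    if (latin1Data == nil) {
        return nil;
    }
    // Every byte value is a valid Latin-1 character, so decoding maps bytes 1:1 to code points.
    NSString *text = [[NSString alloc] initWithData:latin1Data
                                           encoding:NSISOLatin1StringEncoding];
    // Re-encode so that non-ASCII characters become valid UTF-8 sequences.
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

The result could then be passed to [[NSXMLParser alloc] initWithData:], after checking that the download (for instance via [NSData dataWithContentsOfURL:URL]) did not return nil.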

萌系小妹纸
#4 · 2019-05-27 03:52

I know what could be happening; I just had the same problem.

Look at the foundCharacters method of your parser delegate.

I had something like this:

if (!currentElementValue) {
   currentElementValue = [[NSMutableString alloc] initWithString:string];
}

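and currentElementValue stopped accumulating text whenever special characters appeared, because the parser can report a single element's text through several foundCharacters calls.

Now my working code is: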

if (!currentElementValue) {
    currentElementValue = [[NSMutableString alloc] initWithString:string];
} else {
    [currentElementValue appendString:string];
}

Remember to set currentElementValue to nil at the end of your didEndElement method.
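
Put together, the accumulate-then-reset pattern might look like the sketch below. TextCollector and its values array are hypothetical names introduced for illustration; only currentElementValue comes from the answer, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>

// TextCollector is a hypothetical delegate; only the accumulation pattern matters.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentElementValue;
@property (nonatomic, strong) NSMutableArray<NSString *> *values;
@end

@implementation TextCollector
- (instancetype)init {
    if ((self = [super init])) {
        _values = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // The parser may deliver one element's text in several pieces, so append.
    if (!self.currentElementValue) {
        self.currentElementValue = [[NSMutableString alloc] initWithString:string];
    } else {
        [self.currentElementValue appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    // Store whatever text was collected for the element that just ended.
    if (self.currentElementValue) {
        [self.values addObject:[self.currentElementValue copy]];
    }
    // Reset so the next element starts with an empty buffer.
    self.currentElementValue = nil;
}
@end
```

Note that whitespace between elements also arrives through foundCharacters, so real code may want to trim or ignore it.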

走好不送
#5 · 2019-05-27 04:06

Stick with ISO-8859-1, so you don't need to "remove special characters". Use a different mechanism for getting the HTTP data.

Use an NSURLConnection; it's asynchronous and far more flexible in the long run.

// Hypothetical method name: the original snippet omitted the enclosing method signature.
- (BOOL)startLoadingURLString:(NSString *)url {
    NSMutableURLRequest *theRequest = [NSMutableURLRequest requestWithURL:[NSURL URLWithString:url]
                                                              cachePolicy:NSURLRequestUseProtocolCachePolicy
                                                          timeoutInterval:15.0];

    NSURLConnection *theConnection = [[NSURLConnection alloc] initWithRequest:theRequest delegate:self];
    if (theConnection) {
        // Create the NSMutableData to hold the received data.
        // receivedData is an instance variable declared elsewhere.
        receivedData = [NSMutableData data];
        return YES;
    } else {
        // Inform the user that the connection failed.
        return NO;
    }
}

#pragma mark - Url connection data delegate

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
    [receivedData setLength:0];
}


- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
    [receivedData appendData:data];
}

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
    receivedData = nil;
    [self badLoad];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
    //inform delegate of completion
    [self.delegate fetchedData:receivedData];

    receivedData = nil;
}