iOS: Combining SAX and DOM parsing

Posted 2019-09-20 12:53

I am currently working on an iPad project for which I need to process large XML files into an SQLite backend. I currently have this working using the TBXML parser.

So all the logic is in place, and in general the TBXML parser does the job it needs to do. The only problem I'm now encountering is that the XML files are getting too big and I am running out of memory. Because of this I am thinking of switching to a SAX parser like the core NSXMLParser or something like Alan Quatermain's AQXMLParser. However, this would require me to redo all of my current logic, which to some extent relies on functions provided by a DOM tree. This is something I'd rather not do.

So what I want to try and do is create a hybrid approach. Given my XML structure this should be possible. It's basically a number of repeating "Record" elements. And within each record are various elements that can be repeating and nested. In my current approach I parse the document and pass each record element to a function that processes it into the database. Given that this already exists I want to use this in my hybrid parsing approach.

This is what I want to achieve. Using a SAX parser I traverse my document. While traversing the document I build a Record element. Whenever I complete a record element I pass it along to the existing function that uses TBXML to process it. The SAX parser then continues to build the next record element. Key goals are to:

- Fix the memory footprint (it doesn't need to be the smallest it can be, but it has to be constant or at least smaller than when using TBXML)
- Keep performance acceptable

I currently want to implement this as follows:

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict{
    //Recreate record string each time record element is encountered
    if([elementName isEqualToString:@"Record"]) record = [[NSMutableString alloc] init];
    //Write XML tag with name
    [record appendFormat:@"<%@>", elementName];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
    //Write XML content
    [record appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
    //Write XML tag
    [record appendFormat:@"</%@>", elementName];
    if([elementName isEqualToString:@"Record"]){
        //Parse record string into TBXML object
        TBXML * tbxmlRecord = [TBXML tbxmlWithXMLString:record];
        //Send it to the TBXML record processor
        [self processElement:tbxmlRecord.rootXMLElement];
    }
}

I think this should work, but it feels dirty to use a string. Furthermore, I am concerned that the record string might get overwritten too soon when the parser reaches a new record element.
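
One thing to watch when rebuilding each record as a string: as far as I know, NSXMLParser passes already-decoded text to `parser:foundCharacters:`, so a literal `&` or `<` in the data would make the rebuilt string malformed, and the delegate code above also drops attributes. The following is a minimal sketch of one way to handle both, assuming ARC. `EscapeXML` and `OpeningTag` are hypothetical helper names introduced here for illustration; they are not part of Foundation or TBXML.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escapes characters that must not appear raw in
// element content or in double-quoted attribute values.
// "&" is replaced first so the entities added afterwards are not re-escaped.
static NSString *EscapeXML(NSString *text) {
    NSString *s = [text stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    s = [s stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    s = [s stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    s = [s stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
    return s;
}

// Hypothetical helper: rebuilds an opening tag, including any attributes
// delivered in the attributeDict of parser:didStartElement:...
static NSString *OpeningTag(NSString *elementName,
                            NSDictionary<NSString *, NSString *> *attributes) {
    NSMutableString *tag = [NSMutableString stringWithFormat:@"<%@", elementName];
    [attributes enumerateKeysAndObjectsUsingBlock:^(NSString *key, NSString *value, BOOL *stop) {
        [tag appendFormat:@" %@=\"%@\"", key, EscapeXML(value)];
    }];
    [tag appendString:@">"];
    return tag;
}
```

In the delegate, the two append calls would then become `[record appendString:OpeningTag(elementName, attributeDict)];` and `[record appendString:EscapeXML(string)];`. The closing tag needs no change.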

So my question is whether this is a sound way to approach this, or whether there are better ways for me to achieve what I'm looking for.

Edit: I've implemented this approach and it looks to be working quite well. The only hiccup I've encountered is that if my source file isn't UTF-8 encoded I only get a partial result. But when I correct this all goes well. Memory usage isn't that much better though. But maybe it takes what it can. Need to run more tests...
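
Two things that might be worth testing for the memory usage, sketched below under stated assumptions (ARC, and a record handler supplied as a block): reading the file through an `NSInputStream` instead of loading it all into an `NSData`, and wrapping the per-record processing in an `@autoreleasepool` so that temporary objects created while handling one record are released before the next one starts. `RecordSplitter` and `MakeStreamingParser` are hypothetical names used only for this illustration, and whether either change helps in practice would need to be measured with a profiler.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: rebuilds each <Record> element as a string and
// passes it to a handler block. Text is appended unescaped and attributes
// are ignored, exactly as in the question's code.
@interface RecordSplitter : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) void (^handler)(NSString *recordXML);
@end

@implementation RecordSplitter {
    NSMutableString *_record; // nil while the parser is outside a Record
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([elementName isEqualToString:@"Record"]) {
        _record = [NSMutableString string];
    }
    // Messaging nil is a no-op, so elements outside a Record are skipped.
    [_record appendFormat:@"<%@>", elementName];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [_record appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    [_record appendFormat:@"</%@>", elementName];
    if ([elementName isEqualToString:@"Record"]) {
        // Drain temporaries created while processing this one record.
        @autoreleasepool {
            if (self.handler) self.handler(_record);
        }
        _record = nil; // drop the buffer once the record has been handled
    }
}
@end

// Hypothetical helper: creates a parser that reads from a file stream
// rather than from an NSData holding the whole document.
static NSXMLParser *MakeStreamingParser(NSString *path, id<NSXMLParserDelegate> delegate) {
    NSInputStream *stream = [NSInputStream inputStreamWithFileAtPath:path];
    if (stream == nil) return nil;
    NSXMLParser *parser = [[NSXMLParser alloc] initWithStream:stream];
    parser.delegate = delegate;
    return parser;
}
```

Because `-[NSXMLParser parse]` runs synchronously and calls the delegate in document order, the handler for one record returns before the next `Record` start tag is delivered, so the buffer is not overwritten while a record is still being processed. The handler would be the place to call `[TBXML tbxmlWithXMLString:]` and the existing `processElement:` method.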

Answer 1:

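In general your approach sounds fine to me. If your solution works for you without performance problems, then I wouldn't worry too much about the string handling. If you like, you can profile your app to see how much CPU time is being wasted.

If you want something a bit more optimal, you could try to find a SAX parser that gives you byte offsets into the original buffer, and combine it with a DOM parser that lets you work with non-null-terminated C strings. I believe this would mean you'd have to switch to a C or maybe a C++ library. I have used rapidxml for something vaguely similar to what you are attempting (chunks of data embedded in huge XML files).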



Source: iOS: Combining SAX and DOM parsing