Large RAM usage when parsing XML using Libxml2

Posted 2019-09-12 16:59

Question:

I'm downloading an XML file from an API with URLSessionDataTask.
The XML looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<ResultList id="12345678-0" platforms="A;B;C;D;E">
    <Book id="1111111111" author="Author A" title="Title A" price="9.95" ... />
    <Book id="1111111112" author="Author B" title="Title B" price="2.00" ... />
    <Book id="1111111113" author="Author C" title="Title C" price="5.00" ... />
    <ResultInfo bookcount="3" />
</ResultList>

Sometimes the XML has thousands of books.
I'm parsing the XML with the SAX parser from Libxml2. While parsing, I create a Book object for each <Book> element and set its values from the XML attributes like so:

private func startElementSAX(_ ctx: UnsafeMutableRawPointer?, name: UnsafePointer<xmlChar>?, prefix: UnsafePointer<xmlChar>?, URI: UnsafePointer<xmlChar>?, nb_namespaces: CInt, namespaces: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?, nb_attributes: CInt, nb_defaulted: CInt, attributes: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?) {

    let elementName = String(cString: name!)

    switch elementName {
    case "Book":
        let book = buildBook(nb_attributes: nb_attributes, attributes: attributes)
        parser.delegate?.onBook(book: book)
    default:
        break
    }
}

func buildBook(nb_attributes: CInt, attributes: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?) -> Book {
    let fields = 5 /* (localname/prefix/URI/value/end) */
    let book = Book()
    for i in 0..<Int(nb_attributes) {
        if let localname = attributes?[i * fields + 0],
            //let prefix = attributes?[i * fields + 1],
            //let URI = attributes?[i * fields + 2],
            let value_start = attributes?[i * fields + 3]//,
            /*let value_end = attributes?[i * fields + 4]*/ {

                let localnameString = String(cString: localname)
                let string_start = String(cString: value_start)
                //let string_end = String(cString: value_end)

                if let end = string_start.characters.index(of: "\"") {
                    let value = string_start.substring(to: end)
                    book.setValue(value, forKey: localnameString)
                } else {
                    book.setValue(string_start, forKey: localnameString)
                }
        }
    }
    return book
}

In the UITableViewController the onBook(book: Book) delegate method appends the book object to an array and updates the UITableView. So far so good.

The problem is that this uses too much RAM, so my device becomes slow. With ~500 books in the XML it takes >500 MB of RAM, and I don't know why. When I look at the memory in Instruments, all of the allocated memory is in the category _HeapBufferStorage<_StringBufferIVars, UInt16>, with multiple entries greater than 100 KB each. The Event History for these allocations lists the method buildBook().

When I use the XMLParser from Foundation with the constructor XMLParser(contentsOf: URL), which first downloads the whole XML and then parses it, RAM usage is normal no matter how many books there are. But I want to show the books in the UITableView as soon as they arrive. I just want something like Android's XmlPullParser for iOS.

Answer 1:

I'm using libxml2 (due to this issue) and have code like this:

xmlParseChunk(ctxt, data, Int32(read), 0)

Changing the call to this reduces the amount of memory consumed considerably:

autoreleasepool {
    xmlParseChunk(ctxt, data, Int32(read), 0)
}

If you're using the push parser call as above, this will likely fix your problem. If not, wrapping your delegate call in an autoreleasepool call may help.

The reason is that many intermediate objects are created during parsing and added to an autorelease pool, and that pool is not drained until long after the objects are no longer needed. See this post for more details.
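As a rough sketch of that pattern without depending on libxml2: the helper names drainingPool and feed below are hypothetical, and the parseChunk closure merely stands in for the xmlParseChunk call. The point is only that each chunk gets its own pool, so temporaries created while handling one chunk can be released before the next chunk is processed. On platforms without the Objective-C runtime the helper simply runs the closure.

```swift
import Foundation

/// Hypothetical helper: runs `body` inside an autorelease pool where the
/// Objective-C runtime exists, and runs it directly elsewhere.
func drainingPool<T>(_ body: () throws -> T) rethrows -> T {
    #if canImport(ObjectiveC)
    return try autoreleasepool(invoking: body)
    #else
    return try body()
    #endif
}

/// Hypothetical driver: splits `data` into chunks and hands each one to
/// `parseChunk` (a stand-in for xmlParseChunk) inside its own pool.
/// The Bool argument is true for the final chunk.
func feed(_ data: [UInt8], chunkSize: Int,
          parseChunk: (ArraySlice<UInt8>, Bool) -> Void) {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var offset = 0
    while offset < data.count {
        let end = min(offset + chunkSize, data.count)
        let isLast = (end == data.count)
        // Temporaries autoreleased while handling this chunk are freed
        // when the pool drains at the end of the closure.
        drainingPool { parseChunk(data[offset..<end], isLast) }
        offset = end
    }
}

// Usage: count the bytes that pass through, 8 bytes at a time.
var total = 0
feed(Array("<ResultList></ResultList>".utf8), chunkSize: 8) { chunk, _ in
    total += chunk.count
}
print(total)
```

A pool per chunk is a middle ground: one pool for the whole download keeps everything alive until the end, while one pool per attribute adds overhead with little extra benefit.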

An alternative is to reduce the number of objects added to the autorelease pool by changing your code in other ways. For example, I found I was creating extra strings by trimming whitespace in places where I could avoid it.

Additionally, and this may or may not be related to your problem: the value start and value end pointers of each attribute tell you the length of the value, and you should use them. The attribute values in this callback are not NUL-terminated, so String(cString:) on the value start keeps reading past the closing quote until it reaches the next NUL byte.

For example:

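// Each attribute occupies 5 pointers: localname, prefix, URI, value start, value end.
let valStart = UnsafeMutableRawPointer(mutating: attributes!
    .advanced(by: 3 + Int(i * 5)).pointee)
let valEnd = UnsafeMutableRawPointer(mutating: attributes!
    .advanced(by: 4 + Int(i * 5)).pointee)
// Wrap the bytes between the two pointers without copying or taking ownership.
let valData = Data(bytesNoCopy: valStart!, count: valEnd! - valStart!,
    deallocator: .none)
// attrValue is a String? and is nil if the bytes are not valid UTF-8.
let attrValue = String(data: valData, encoding: String.Encoding.utf8)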
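
To see the difference in isolation, here is a small sketch that needs neither libxml2 nor Foundation. The function attributeValue(start:end:) and the sample buffer are made up for illustration; the buffer imitates a piece of parser input where the value is followed by more document text before any NUL byte.

```swift
/// Hypothetical helper: decodes the UTF-8 bytes in [start, end) into a
/// String. It never reads past `end`, so no NUL terminator is needed.
func attributeValue(start: UnsafePointer<UInt8>,
                    end: UnsafePointer<UInt8>) -> String {
    let bytes = UnsafeBufferPointer(start: start, count: end - start)
    // Copies the bytes; invalid UTF-8 is replaced rather than rejected.
    return String(decoding: bytes, as: UTF8.self)
}

// Sample input: two attributes followed by a single NUL byte.
let raw = Array("title=\"Title A\" price=\"9.95\"".utf8) + [0]

let (whole, exact) = raw.withUnsafeBufferPointer { buf -> (String, String) in
    let base = buf.baseAddress!
    let start = base + 7   // first byte after `title="`
    let end = start + 7    // one past the last byte of `Title A`
    // String(cString:) scans to the NUL byte; the range version stops at `end`.
    return (String(cString: start), attributeValue(start: start, end: end))
}

print(exact)   // only the attribute value
print(whole)   // the value plus everything up to the NUL byte
```

In a real document the next NUL byte can be far away, so every String(cString:) call on a value start may copy a large part of the remaining input.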