XMLParser has problems reading UTF8 characters

Posted 2019-09-10 02:07

I am trying to parse the following XML:

<CntyNtry>
    <EngNm>Virgin Islands (British)</EngNm>
    <FrNm>Vierges britanniques (les Îles)</FrNm>
    <A2Cd>VG</A2Cd>
    <A3Cd>VGB</A3Cd>
    <CtryNbr>92</CtryNbr>
</CntyNtry>

As you can see, there are some accents on some of the letters.

I tried to parse the XML with the following code:

func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
    if elementName == Element.getXMLRecordElementTagName() {
        stack.push(Element.newObject())
        record.removeAll(keepingCapacity: false)
    } else if Element.getXMLRecordAttributeElementTagName().contains(elementName) {
        stackKey.push(Element.getNSManagedObjectAttributeName(fromXMLRecordElementTagName: elementName))
    }
}

func parser(_ parser: XMLParser, foundCharacters string: String) {
    let key = stackKey.pop()
    if key != nil {
        record[key!] = string
    }
}

func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
    if elementName == Element.getXMLRecordElementTagName() {
        Element.add(object: record)
        record.removeAll(keepingCapacity: false)
    }
}

If anybody needs the rest of the code, please let me know, but basically record[key!] = string should be able to read the UTF-8 characters.

When I test the data in my unit test, I get the following error: the string is cut off at the accented character. I have tried other data with accents and get the same error.

XCTAssertEqual failed: ("Optional("Vierges britanniques (les")") is not equal to ("Optional("Vierges britanniques (les Îles)")") -

Is my unit test wrong, or is there a problem in the parser?

func testImportDataCnty() {
    Country.delete()
    XCTAssertTrue(Country.count() == 0)
    XCTAssertTrue(importerCnty.importData())
    XCTAssertTrue(Country.count() > 0)

    let kor = Country.get(id: ["VGB"])?[0] as! Country
    XCTAssertEqual(kor.englishName, country2["englishName"] as? String)
    XCTAssertEqual(kor.frenchName, country2["frenchName"] as? String)
    //Test failed on the above row.
    XCTAssertEqual(kor.alpha2Code, country2["alpha2Code"] as? String)
    XCTAssertEqual(kor.alpha3Code, country2["alpha3Code"] as? String)
    XCTAssertEqual(kor.countryNumber, Int16(country2["countryNumber"] as! Int))
}

Tags: xml swift3
2 answers
我欲成王,谁敢阻挡
#2 · 2019-09-10 03:03

I solved the issue by changing my code as shown below. It seems that parser(_:foundCharacters:) can be called multiple times for a single element's text when the string contains a special character, with each call delivering only part of the text, so I needed to append all the pieces.

func parser(_ parser: XMLParser, foundCharacters string: String) {
    // Append each chunk instead of overwriting the previous one.
    if let key = stackKey.peek() {
        record[key] = (record[key] ?? "") + string
    }
}
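
For reference, here is a minimal, self-contained sketch of the same idea: collect every foundCharacters chunk in a buffer and only store the value once the element ends. The class name CountryParserDelegate, the buffer property, and the use of element names as dictionary keys are assumptions made for illustration; they are not the Element or stack types from the question.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate for illustration only.
final class CountryParserDelegate: NSObject, XMLParserDelegate {
    private(set) var records: [[String: String]] = []
    private var record: [String: String] = [:]
    private var buffer = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "CntyNtry" { record.removeAll() }
        buffer = ""  // start with an empty buffer for every element
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element, so append every chunk.
        buffer += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "CntyNtry" {
            records.append(record)
        } else {
            // The element is complete, so the buffer now holds its full text.
            record[elementName] = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
        }
        buffer = ""
    }
}

let xml = """
<CntyNtry>
    <EngNm>Virgin Islands (British)</EngNm>
    <FrNm>Vierges britanniques (les Îles)</FrNm>
    <A3Cd>VGB</A3Cd>
</CntyNtry>
"""

let delegate = CountryParserDelegate()
let parser = XMLParser(data: Data(xml.utf8))
parser.delegate = delegate
let ok = parser.parse()
```

Committing the value in didEndElement rather than in foundCharacters means the result does not depend on how the parser happens to split the text.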
迷人小祖宗
#3 · 2019-09-10 03:09

You should store any special or foreign-language characters in the XML in their HTML-encoded form. For example, when I needed to write an ampersand in XML, I did the following:

<name>Jones &amp; Jones</name>

In your case, it should be:

<FrNm>Vierges britanniques (les &Icirc;les)</FrNm>

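See this HTML encoding table. Note that XML itself predefines only five named entities (&amp;, &lt;, &gt;, &apos;, &quot;), so an HTML name such as &Icirc; has to be declared in a DTD or written as the numeric reference &#206; to be valid in a plain XML document.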
