XMLParser has problems reading UTF8 characters

I am trying to parse the following XML:

<CntyNtry>
    <EngNm>Virgin Islands (British)</EngNm>
    <FrNm>Vierges britanniques (les Îles)</FrNm>
    <A2Cd>VG</A2Cd>
    <A3Cd>VGB</A3Cd>
    <CtryNbr>92</CtryNbr>
</CntyNtry>

As you can see, some of the letters have accents.

I tried to parse the XML with the following code:

func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
    if elementName == Element.getXMLRecordElementTagName() {
        stack.push(Element.newObject())
        record.removeAll(keepingCapacity: false)
    } else if Element.getXMLRecordAttributeElementTagName().contains(elementName) {
        stackKey.push(Element.getNSManagedObjectAttributeName(fromXMLRecordElementTagName: elementName))
    }
}

func parser(_ parser: XMLParser, foundCharacters string: String) {
    let key = stackKey.pop()
    if key != nil {
        record[key!] = string
    }
}

func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
    if elementName == Element.getXMLRecordElementTagName() {
        Element.add(object: record)
        record.removeAll(keepingCapacity: false)
    }
}

If anybody needs the rest of the code, please let me know, but basically record[key!] = string should be able to store the UTF-8 characters.

When I run my unit test, I get the following error: the string is truncated just before the accented character. I have tried other data with accents and get the same error.

XCTAssertEqual failed: ("Optional("Vierges britanniques (les")") is not equal to ("Optional("Vierges britanniques (les Îles)")") -

Is my unit test code wrong, or is there a problem in the parser?

func testImportDataCnty() {
    Country.delete()
    XCTAssertTrue(Country.count() == 0)
    XCTAssertTrue(importerCnty.importData())
    XCTAssertTrue(Country.count() > 0)

    let kor = Country.get(id: ["VGB"])?[0] as! Country
    XCTAssertEqual(kor.englishName, country2["englishName"] as? String)
    XCTAssertEqual(kor.frenchName, country2["frenchName"] as? String)
    //Test failed on the above row.
    XCTAssertEqual(kor.alpha2Code, country2["alpha2Code"] as? String)
    XCTAssertEqual(kor.alpha3Code, country2["alpha3Code"] as? String)
    XCTAssertEqual(kor.countryNumber, Int16(country2["countryNumber"] as! Int))
}

Tags: xml swift3

2 Answers

我欲成王，谁敢阻挡

Reply #2 · 2019-09-10 03:03

I solved the issue by changing my code as shown below. parser(_:foundCharacters:) can be called more than once for the text of a single element (here it split the string at the accented character), so the chunks need to be appended rather than overwritten.

func parser(_ parser: XMLParser, foundCharacters string: String) {
    // Peek rather than pop: the same key may receive several chunks.
    if let key = stackKey.peek() {
        // Append this chunk to whatever has been collected so far.
        record[key] = (record[key] ?? "") + string
    }
}
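For anyone who wants to try the accumulation pattern in isolation, here is a minimal, self-contained sketch. RecordCollector and its buffer are hypothetical names made up for illustration, not the asker's Element or stack classes, and the element names are taken from the XML in the question. It assumes each value is plain text inside a leaf element.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical delegate for illustration only.
final class RecordCollector: NSObject, XMLParserDelegate {
    private(set) var records: [[String: String]] = []
    private var record: [String: String] = [:]
    private var buffer = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "CntyNtry" { record.removeAll() }
        buffer = ""  // start a fresh buffer for every element
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element, so append.
        buffer += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "CntyNtry" {
            records.append(record)
        } else {
            // The element is complete, so the buffer holds its full text.
            record[elementName] = buffer
        }
        buffer = ""
    }
}

let xml = """
<CntyNtry>
    <EngNm>Virgin Islands (British)</EngNm>
    <FrNm>Vierges britanniques (les Îles)</FrNm>
</CntyNtry>
"""

let collector = RecordCollector()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = collector
let parsedOK = xmlParser.parse()
```

Storing the value in didEndElement rather than in foundCharacters means it does not matter how many chunks the parser delivers for one element.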


迷人小祖宗

Reply #3 · 2019-09-10 03:09

You should store any special or foreign-language characters in the XML in their HTML-encoded form. For example, when I needed to write an ampersand in XML, I did the following:

<name>Jones &amp; Jones</name>

In your case, it should be:

<FrNm>Vierges britanniques (les &Icirc;les)</FrNm>

See this HTML encoding table.
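One caveat worth checking against the XML specification: XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;). HTML names such as &Icirc; are not among them unless a DTD declares them, so in plain XML the escaped form of Î (U+00CE) is the numeric character reference &#206;. The sketch below uses a hypothetical TextCollector delegate and a hypothetical parseText helper to parse both the literal UTF-8 spelling and the numeric reference, so the two results can be compared.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on Linux
#endif

// Hypothetical delegate that concatenates every chunk of character data.
final class TextCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string
    }
}

// Hypothetical helper: returns the collected text, or nil if parsing fails.
func parseText(_ xml: String) -> String? {
    let collector = TextCollector()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = collector
    return parser.parse() ? collector.text : nil
}

// Literal UTF-8 character versus numeric character reference.
let literal = parseText("<FrNm>Vierges britanniques (les Îles)</FrNm>")
let numeric = parseText("<FrNm>Vierges britanniques (les &#206;les)</FrNm>")
```

Both spellings should come out as the same string, which suggests that correctly encoded UTF-8 input does not need escaping as long as the delegate appends the chunks it receives.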
