I am trying to parse the following XML:
<CntyNtry>
<EngNm>Virgin Islands (British)</EngNm>
<FrNm>Vierges britanniques (les Îles)</FrNm>
<A2Cd>VG</A2Cd>
<A3Cd>VGB</A3Cd>
<CtryNbr>92</CtryNbr>
</CntyNtry>
As you can see, some of the letters have accents.
I tried to parse the XML with the following code:
func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String] = [:]) {
    if elementName == Element.getXMLRecordElementTagName() {
        stack.push(Element.newObject())
        record.removeAll(keepingCapacity: false)
    } else if Element.getXMLRecordAttributeElementTagName().contains(elementName) {
        stackKey.push(Element.getNSManagedObjectAttributeName(fromXMLRecordElementTagName: elementName))
    }
}

func parser(_ parser: XMLParser, foundCharacters string: String) {
    let key = stackKey.pop()
    if key != nil {
        record[key!] = string
    }
}

func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
    if elementName == Element.getXMLRecordElementTagName() {
        Element.add(object: record)
        record.removeAll(keepingCapacity: false)
    }
}
If anybody needs the rest of the code, please let me know, but basically record[key!] = string should be able to read the UTF-8 characters.
When I run my unit test, I get the following error: the string is cut off at the accented character. I have tried other data with accents and I get the same error every time.
XCTAssertEqual failed: ("Optional("Vierges britanniques (les")") is not equal to ("Optional("Vierges britanniques (les Îles)")") -
Is my unit test code wrong, or is there a problem in the parser?
func testImportDataCnty() {
    Country.delete()
    XCTAssertTrue(Country.count() == 0)
    XCTAssertTrue(importerCnty.importData())
    XCTAssertTrue(Country.count() > 0)
    let kor = Country.get(id: ["VGB"])?[0] as! Country
    XCTAssertEqual(kor.englishName, country2["englishName"] as? String)
    XCTAssertEqual(kor.frenchName, country2["frenchName"] as? String)
    // Test failed on the above row.
    XCTAssertEqual(kor.alpha2Code, country2["alpha2Code"] as? String)
    XCTAssertEqual(kor.alpha3Code, country2["alpha3Code"] as? String)
    XCTAssertEqual(kor.countryNumber, Int16(country2["countryNumber"] as! Int))
}
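One way to tell whether the parser or the test is at fault is to log every call to parser(_:foundCharacters:). XMLParser may report the text of a single element in several calls, so if more than one piece is logged for FrNm, any code that keeps only one piece will truncate the value. The sketch below is only illustrative: ChunkLogger and textChunks are hypothetical names, not part of the project above, and where the text is split is an implementation detail of the parser.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives here on Linux
#endif

// Hypothetical helper: records every string handed to foundCharacters.
final class ChunkLogger: NSObject, XMLParserDelegate {
    private(set) var chunks: [String] = []

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        chunks.append(string)
    }
}

// Parses the given XML and returns the text pieces in the order they arrived.
func textChunks(in xml: String) -> [String] {
    let logger = ChunkLogger()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = logger
    parser.parse()
    return logger.chunks
}

let chunks = textChunks(in: "<FrNm>Vierges britanniques (les Îles)</FrNm>")
print(chunks)            // may contain more than one element
print(chunks.joined())   // the pieces joined back together
```

If the array has more than one element, the unit test is fine and the delegate is dropping the later pieces.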
I solved the issue by changing my code to accumulate the text. It seems that parser(_:foundCharacters:) can be called several times for one element when the string contains a special character, so I needed to append all the pieces instead of keeping only the first one.
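A minimal sketch of that change, under some assumptions: RecordParser, fieldTags, buffer and parseRecord are hypothetical names used for illustration, and the Core Data plumbing (Element, stack, stackKey) from the question is left out. The idea is to reset a buffer when a field element starts, append in foundCharacters, and store the value only when the element ends. In the original code, stackKey.pop() appears to consume the key on the first call, so any later piece finds no key and is dropped.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives here on Linux
#endif

// Hypothetical delegate: accumulates element text before storing it.
final class RecordParser: NSObject, XMLParserDelegate {
    // Field tags taken from the sample XML in the question.
    private let fieldTags: Set<String> = ["EngNm", "FrNm", "A2Cd", "A3Cd", "CtryNbr"]
    private var buffer = ""
    private var isCollecting = false
    private(set) var record: [String: String] = [:]

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if fieldTags.contains(elementName) {
            buffer = ""            // start a fresh value for this field
            isCollecting = true
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element, so append every piece.
        if isCollecting { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if fieldTags.contains(elementName) {
            record[elementName] = buffer   // the value is complete only now
            isCollecting = false
        }
    }
}

// Parses one record and returns its fields keyed by tag name.
func parseRecord(_ xml: String) -> [String: String] {
    let delegate = RecordParser()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = delegate
    parser.parse()
    return delegate.record
}

let sample = """
<CntyNtry>
<EngNm>Virgin Islands (British)</EngNm>
<FrNm>Vierges britanniques (les Îles)</FrNm>
<A2Cd>VG</A2Cd>
<A3Cd>VGB</A3Cd>
<CtryNbr>92</CtryNbr>
</CntyNtry>
"""
let record = parseRecord(sample)
print(record["FrNm"] ?? "missing")   // Vierges britanniques (les Îles)
```

Storing the value in didEndElement rather than in foundCharacters is what makes the result independent of how the parser splits the text.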
You should store any special or foreign-language characters in the XML in their HTML-encoded form. As an example, when I needed to write an ampersand in XML, I did the following:
In your case, it should be:
See this HTML encoding table.
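If you do want to encode characters when writing the XML, here is a rough sketch, assuming numeric character references are acceptable. xmlEscaped is a hypothetical helper, not an existing API. XML predefines only five named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;); HTML names such as &amp;Icirc; are not valid in XML unless a DTD declares them, so the sketch uses the numeric form instead. Note that a UTF-8 XML file can also contain Î directly, and the parser may still deliver the decoded text in several foundCharacters calls either way.

```swift
// Hypothetical helper: escapes markup characters and writes every
// non-ASCII scalar as a decimal numeric character reference.
func xmlEscaped(_ text: String) -> String {
    var out = ""
    for scalar in text.unicodeScalars {
        switch scalar {
        case "&":  out += "&amp;"
        case "<":  out += "&lt;"
        case ">":  out += "&gt;"
        case "\"": out += "&quot;"
        case "'":  out += "&apos;"
        default:
            if scalar.isASCII {
                out.append(Character(scalar))
            } else {
                out += "&#\(scalar.value);"   // e.g. U+00CE becomes &#206;
            }
        }
    }
    return out
}

print(xmlEscaped("Tom & Jerry"))   // Tom &amp; Jerry
print(xmlEscaped("Vierges britanniques (les \u{CE}les)"))
// Vierges britanniques (les &#206;les)
```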