Why can't I parse a XML file using QXmlStreamR

2019-04-06 05:54发布

问题:

I'm trying to figure out how QXmlStreamReader works for a C++ application I'm writing. The XML file I want to parse is a large dictionary with a convoluted structure and plenty of Unicode characters so I decided to try a small test case with a simpler document. Unfortunately, I hit a wall. Here's the example xml file:

<?xml version="1.0" encoding="UTF-8" ?>
<persons>
    <person>
        <firstname>John</firstname>
        <surname>Doe</surname>
        <email>john.doe@example.com</email>
        <website>http://en.wikipedia.org/wiki/John_Doe</website>
    </person>
    <person>
        <firstname>Jane</firstname>
        <surname>Doe</surname>
        <email>jane.doe@example.com</email>
        <website>http://en.wikipedia.org/wiki/John_Doe</website>
    </person>
    <person>
        <firstname>Matti</firstname>
        <surname>Meikäläinen</surname>
        <email>matti.meikalainen@example.com</email>
        <website>http://fi.wikipedia.org/wiki/Matti_Meikäläinen</website>
    </person>
</persons>

...and I'm trying to parse it using this code:

int main(int argc, char *argv[])
{
    if (argc != 2) return 1;

    QString filename(argv[1]);
    QTextStream cout(stdout);
    cout << "Starting... filename: " << filename << endl;

    QFile file(filename);
    bool open = file.open(QIODevice::ReadOnly | QIODevice::Text);
    if (!open) 
    {
        cout << "Couldn't open file" << endl;
        return 1;
    }
    else 
    {
        cout << "File opened OK" << endl;
    }

    QXmlStreamReader xml(&file);
    cout << "Encoding: " << xml.documentEncoding().toString() << endl;

    while (!xml.atEnd() && !xml.hasError()) 
    {
        xml.readNext();
        if (xml.isStartElement())
        {
            cout << "element name: '" << xml.name().toString() << "'" 
                << ", text: '" << xml.text().toString() << "'" << endl;
        }
        else if (xml.hasError())
        {
            cout << "XML error: " << xml.errorString() << endl;
        }
        else if (xml.atEnd())
        {
            cout << "Reached end, done" << endl;
        }
    }

    return 0;
}

...then I get this output:

C:\xmltest\Debug>xmltest.exe example.xml
Starting... filename: example.xml
File opened OK
Encoding:
XML error: Encountered incorrectly encoded content.

What happened? This file couldn't be simpler and it looks consistent to me. With my original file I also get a blank entry for the encoding, the entries' names() are displayed, but alas, the text() is also empty. Any suggestions greatly appreciated, personally I'm thorougly mystified.

回答1:

I'm answering this myself as this problem was related to three issues, two of which were brought up by the responses.

  1. The file actually wasn't UTF-8 encoded. I changed the encoding to iso-8859-1 and the encoding warning disappeared.
  2. The text() function doesn't work as I expected. I have to use readElementText() to read the entries' contents.
  3. When I try to readElementText() on an element that doesn't contain text, like the top-level <persons> in my case, the parser returns an "Expected character data" error and the parsing is interrupted. I find this behaviour strange (in my opinion returning an empty string and continuing would be better) but I guess as long as the specification is known, I can work around it and avoid calling this function on every entry.

The relevant code section that works as expected now looks like this:

while (!xml.atEnd() && !xml.hasError()) 
{
    xml.readNext();
    if (xml.isStartElement())
    {
        QString name = xml.name().toString();
        if (name == "firstname" || name == "surname" || 
            name == "email" || name == "website")
        {
            cout << "element name: '" << name  << "'" 
                         << ", text: '" << xml.readElementText() 
                         << "'" << endl;
        }
    }
}
if (xml.hasError())
{
    cout << "XML error: " << xml.errorString() << endl;
}
else if (xml.atEnd())
{
    cout << "Reached end, done" << endl;
}


回答2:

The file is not UTF-8 encoded. Change the encoding to iso-8859-1 and it will parse without error.

<?xml version="1.0" encoding="iso-8859-1" ?>


回答3:

About the encoding: As baysmith and and hmuelner said, your file is probably incorrectly encoded (unless the encoding got lost when pasting it here). Try to fix that with some advanced text editor.

The problem with your usage of text() is that it doesn't work as you expect it to. text() returns the content of the current token if it is of type Characters, Comment, DTD or EntityReference. Your current token is a StartElement, so it's empty. If you want to consume/read the text of the current startElement, use readElementText() instead.



回答4:

Are you sure your document is UTF-8 encoded? What editor did you use? Check how the ä-characters look like if you view the file without decoding.



回答5:

Try this Example i just copied it from my project it work for me.

void MainWindow::readXML(const QString &fileName)
{


fileName = "D:/read.xml";

QFile* file = new QFile(fileName);
if (!file->open(QIODevice::ReadOnly | QIODevice::Text))
{
     QMessageBox::critical(this, "QXSRExample::ReadXMLFile", "Couldn't open xml file", QMessageBox::Ok);
     return;
}

/* QXmlStreamReader takes any QIODevice. */
QXmlStreamReader xml(file);
/* We'll parse the XML until we reach end of it.*/
while(!xml.atEnd() && !xml.hasError())
{
    /* Read next element.*/
    QXmlStreamReader::TokenType token = xml.readNext();
    /* If token is just StartDocument, we'll go to next.*/
    if(token == QXmlStreamReader::StartDocument)
        continue;

    /* If token is StartElement, we'll see if we can read it.*/
    if(token == QXmlStreamReader::StartElement) {
        if(xml.name() == "email") {
            ui->listWidget->addItem("Element: "+xml.name().toString());
            continue;
        }
    }
}
/* Error handling. */
if(xml.hasError())
    QMessageBox::critical(this, "QXSRExample::parseXML", xml.errorString(), QMessageBox::Ok);

//resets its internal state to the initial state.
xml.clear();
}

void MainWindow::writeXML(const QString &fileName)
{
fileName = "D:/write.xml";
QFile file(fileName);
if (!file.open(QIODevice::WriteOnly | QIODevice::Text))
{
     QMessageBox::critical(this, "QXSRExample::WriteXMLFile", "Couldn't open anna.xml", QMessageBox::Ok);
     return;
}
QXmlStreamWriter xmlWriter(&file);
xmlWriter.setAutoFormatting(true);
xmlWriter.writeStartDocument();
//add Elements
xmlWriter.writeStartElement("bookindex");
ui->listWidget->addItem("bookindex");
xmlWriter.writeStartElement("Suleman");
ui->listWidget->addItem("Suleman");

//write all elements in xml filexl
xmlWriter.writeEndDocument();
file.close();
if (file.error())
    QMessageBox::critical(this, "QXSRExample::parseXML", file.errorString(), QMessageBox::Ok);


}