Why my DOM parser cant read UTF-8

2020-02-10 06:15发布

问题:

I have problem that my DOM parser can´t load file when there are UTF-8 characters in XML file Now, i am aware that i have to give him instruction to read utf-8, but i don´t know how to put it in my code here it is:

File xmlFile = new File(fileName);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();

i am aware that there is method setencoding(), but i don´t know where to put it in my code...

回答1:

Try this. Worked for me

        InputStream inputStream= new FileInputStream(completeFileName);
        Reader reader = new InputStreamReader(inputStream,"UTF-8");
        InputSource is = new InputSource(reader);
        is.setEncoding("UTF-8");

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(is);


回答2:

Try to use Reader and provide encoding as parameter:

InputStream inputStream = new FileInputStream(fileName);
documentBuilder.parse(new InputSource(new InputStreamReader(inputStream, "UTF-8")));


回答3:

I used what Eugene did up there and changed it a little.

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

FileInputStream in = new FileInputStream(new File("XML.xml"));
Document doc = dBuilder.parse(in, "UTF-8");

though this will be read as UTF-8 if you are printing in eclipse console it won't show any 'UTF-8' characters unless the java file is saved as 'UTF-8', or at least that what happened with me



标签: java parsing dom