I am trying to parse an XML file that declares <?xml version="1.0" encoding="UTF-8"?>, but I ran into the error message "invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what caused this problem?
Either the parser is set to UTF-8 even though the file is encoded differently, or the file is declared as UTF-8 but is actually in another encoding.
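If the real encoding of the file is known, one possible workaround is to decode the bytes yourself and hand the parser a character stream. The sketch below uses only the JDK's DOM parser; the class name `ExplicitEncodingParse` and the helper `parse` are made up for illustration, and ISO-8859-1 is only an assumed example of the actual encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ExplicitEncodingParse {

    // Hypothetical helper: decode the bytes with a known charset so the
    // parser receives characters and does not have to guess the encoding.
    static Document parse(InputStream in, Charset actualCharset) throws Exception {
        Reader reader = new InputStreamReader(in, actualCharset);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Latin-1 bytes with no XML declaration; a parser reading the raw
        // bytes would assume UTF-8 and reject them.
        byte[] latin1 = "<name>\u00C4pfel</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only hides the mismatch on the reading side. If you control the process that produces the file, fixing the encoding or the declaration there is usually the cleaner option.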
For those who still get this error: since UTF-8 is being used, check your XML document for any Latin letters or similar characters. I had the same problem, and the reason was that I had this:
Hope this helps.
I had the same problem. I had created a new XML file with JDOM and written it with a FileWriter(xmlFile). FileWriter writes in the platform default encoding, which was not UTF-8 in my case, so the file was not really UTF-8. Using a FileOutputStream(xmlFile) instead solved it.
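A minimal JDK-only sketch of the same idea, not the JDOM code from this answer: wrap a FileOutputStream in an OutputStreamWriter with an explicit charset so the result does not depend on the platform default. The class name `Utf8XmlWriter` and the helper `write` are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8XmlWriter {

    // Hypothetical helper: writes the given XML text as UTF-8,
    // regardless of the platform default encoding.
    static void write(Path target, String xml) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target.toFile()), StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".xml");
        write(file, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C4pfel</name>");
        // U+00C4 takes two bytes in UTF-8 (0xC3 0x84), so the file is
        // one byte longer than the string has characters.
        System.out.println(Files.size(file) + " bytes written to " + file);
    }
}
```

On Java 11 and later, FileWriter also has constructors that take a Charset, which would be another way to get the same effect.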
I had the same problem when trying to import my .xml file into my Java tool, and I found a workaround that worked for me:
1. Open the .xml file in Notepad++ and save it as an .rtf file. Then open that file in WordPad.
2. Save the .rtf file as a .txt file, open it in Notepad, and save it as an .xml file again. When saving in Notepad, make sure to choose the option "Encoding: UTF-8" near the bottom of the save dialog.
It worked for me; I hope it is useful for you too.
You could try changing the default character encoding used by String.getBytes() to UTF-8 by passing the VM option -Dfile.encoding=utf-8.
Most commonly it's due to feeding ISO-8859-x (Latin-x, like Latin-1) while the parser thinks it is getting UTF-8. Certain sequences of Latin-1 characters (two consecutive characters with accents or umlauts) form something that is invalid as UTF-8, specifically such that, based on the first byte, the second byte has unexpected high-order bits. This can easily occur when some process dumps out XML using Latin-1, but either forgets to output the XML declaration (in which case the XML parser must default to UTF-8, as per the XML specs), or claims it's UTF-8 even when it isn't.
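To see the effect described above without involving an XML parser, here is a small sketch that checks whether a byte array is strictly valid UTF-8 using the JDK's CharsetDecoder. The class name `Utf8Check` and the sample text are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {

    // Returns true only if the bytes decode as strict UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "<name>\u00C4pfel</name>"; // U+00C4 is A with umlaut
        // Latin-1 encodes U+00C4 as the single byte 0xC4, which UTF-8 reads as
        // the lead byte of a 2-byte sequence; the following 'p' (0x70) is not
        // a valid continuation byte.
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1))); // false
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));      // true
    }
}
```

Running a check like this over the raw bytes of a suspect file can help tell whether the file or the declaration is at fault.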