Unicode(0xb) error while parsing an XML file using

Posted 2019-07-03 16:40

While parsing an XML file, StAX produces this error:

Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the document.

See the link below for the XML line containing the special character, which displays as "VI". It is not an alphabetic character: if you copy and paste it into Notepad, it shows up as a symbol. I tried parsing the file with StAX and got the error above.

Can somebody please suggest a solution?

Thanks in advance.

Tags: java xml parsing unicode

3 answers

混吃等死

#2 · 2019-07-03 17:19

This error occurs whenever the XML contains a character that is not valid in XML. If you open the file in Notepad++, such characters show up as VT, SOH, FF and so on; these are all invalid XML characters. I am using XML version 1.0, and I validate text data before storing it in the database with this pattern:

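import java.util.regex.Pattern;
// Escapes are doubled so the regex engine, not the compiler, interprets them; supplementary planes use the x{...} form
Pattern p = Pattern.compile("[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
retunContent = p.matcher(retunContent).replaceAll("");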

This ensures that no invalid character ends up in the XML.
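For anyone who wants to see the whole round trip, here is a minimal sketch that strips the invalid characters from a string and then reads it with StAX. The class name `SanitizeThenParse`, the helper names, and the sample `<item>` element are made up for illustration; only `Pattern` and the `javax.xml.stream` classes come from the JDK.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
final class SanitizeThenParse {
    // Matches runs of code points outside the XML 1.0 Char production.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");

    // Returns the text with every invalid XML 1.0 character removed.
    static String stripInvalidXmlChars(String text) {
        return INVALID_XML_CHARS.matcher(text).replaceAll("");
    }

    // Concatenates all character data in the document, read via StAX.
    static String textContent(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        StringBuilder sb = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    sb.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample input (an assumption): a vertical tab (0xB) between V and I.
        String raw = "<item>V" + (char) 0x0B + "I</item>";
        System.out.println(textContent(stripInvalidXmlChars(raw))); // prints VI
    }
}
```

Note that this silently drops data. If the control characters carry meaning in your source system, you may prefer to replace them with something visible instead of an empty string.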


时光不老，我们不散

#3 · 2019-07-03 17:29

According to the W3C XML Recommendation, 0xB is not allowed in an XML file:

Character Range [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

So strictly speaking your input file is not an XML file.
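The Char production above translates almost directly into a predicate. The following is a rough sketch only; `XmlChars`, `isXmlChar` and `firstInvalidIndex` are names I made up, and it covers XML 1.0 only. It can help locate the offending character before deciding how to fix the input.

```java
// Hypothetical helper class, not part of any library.
final class XmlChars {
    // True if the code point matches the XML 1.0 Char production [2].
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index (in chars) of the first invalid code point, or -1 if there is none.
    static int firstInvalidIndex(CharSequence text) {
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return -1;
    }
}
```

For example, `XmlChars.isXmlChar(0xB)` is false, which matches the error reported in the question.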


爷的心禁止访问

#4 · 2019-07-03 17:31

0xB (vertical tab) is not a valid character in XML. The only valid characters before ASCII 32 (0x20, space) are 0x9 (tab), 0xA (line feed) and 0xD (carriage return).

In short, what you are trying to parse is NOT XML.
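If you cannot fix the producer of the file, one possible workaround is to filter the stream before the parser sees it. Below is a minimal sketch, assuming it is acceptable to replace each forbidden control character (below 0x20, other than tab, line feed and carriage return) with a space. `ControlCharFilterReader` is a made-up name, and the sketch deliberately ignores the other invalid ranges such as lone surrogates and 0xFFFE/0xFFFF.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical wrapper, not part of any library.
final class ControlCharFilterReader extends FilterReader {
    ControlCharFilterReader(Reader in) {
        super(in);
    }

    // Control characters below 0x20 that XML 1.0 does not allow.
    private static boolean isForbiddenControl(int c) {
        return c >= 0 && c < 0x20 && c != 0x9 && c != 0xA && c != 0xD;
    }

    @Override
    public int read() throws IOException {
        int c = super.read(); // -1 at end of stream passes through unchanged
        return isForbiddenControl(c) ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (isForbiddenControl(buf[i])) {
                buf[i] = ' ';
            }
        }
        return n;
    }
}
```

You would wrap your file reader in it and pass the result to `XMLInputFactory.newInstance().createXMLStreamReader(Reader)`. Keep in mind that when you hand the parser a `Reader`, you are responsible for decoding the file with the right charset yourself.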

