JSONObject in org.json lib: utf-8 encoding issue

I'm following the Unicode - How to get the characters right? post.

The only issue I have is with JSONObject encoding (I'm using org.json lib).

The issue arises when I put a string like àòùèì€€, for example, in a JSONObject.

    System.out.println(entry.getValue());
    JSONObject temp = new JSONObject();
    temp.put("values", entry.getValue());
    System.out.println(temp.toString());

I obtain àòùèì€€ and {"values":"àòùèì\u20ac\u20ac"} instead of {"values":"àòùèì€€"}.

EDIT

When I convert a hashtable to a JSONObject, some of the characters are written as Unicode escapes. For example, the hashtable

 {€èòàùì€ù=èòàù€ì, €òàèùì€=èòàù€ìç§$}

becomes the JSONObject

 {"\u20acòàèùì\u20ac":"èòàù\u20acìç§$","\u20acèòàùì\u20acù":"èòàù\u20acì"}

Tags: json encoding utf-8 org.json

1 Answer

走好不送

#2 · 2019-05-18 16:13

They are exactly equivalent; the Unicode escape just takes a bit more space. It is like writing \u0061 in Java, which is exactly the same as writing a. If correctness is your concern, it doesn't matter.
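To make the equivalence concrete, here is a minimal sketch (not org.json's own parser) that decodes only the \uXXXX form of escape and compares the result with the unescaped text. The class and method names are invented for this illustration, and all other JSON escapes are deliberately left alone.

```java
class UnicodeEscapeDemo {
    // Replaces each backslash-u escape with the character it names.
    // Any other backslash pair is copied through unchanged.
    static String decodeUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'u' && i + 6 <= s.length()) {
                    // Four hex digits follow the 'u'.
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    out.append(c).append(next);
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "{\"values\":\"\\u20ac\"}"; // the form toString() produced
        String literal = "{\"values\":\"\u20ac\"}";  // holds the actual euro sign character
        System.out.println(decodeUnicodeEscapes(escaped).equals(literal)); // true
    }
}
```

Any conforming JSON parser performs this decoding step itself, so a consumer of the escaped output sees the same string.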

And it won't take a considerable amount of extra space either, unless most of your text is between 0x2000 - 0x20FF.
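A quick way to check the size cost is to count bytes. This is only a sketch with made-up names, and it assumes the JSON text is written out as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class EscapeSizeDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String literal = "\u20ac";  // Java source escape: one char, the euro sign
        String escaped = "\\u20ac"; // six chars: backslash, u, 2, 0, a, c
        System.out.println(utf8Length(literal)); // 3
        System.out.println(utf8Length(escaped)); // 6
    }
}
```

For characters in 0x2000 - 0x20FF, then, the escaped form is twice the size in UTF-8, and characters outside the escaped ranges cost nothing extra.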

The following code (from the library's source) escapes C0 and C1 control characters, but it also escapes 0x2000 - 0x20FF:

     if (c < ' ' || (c >= '\u0080' && c < '\u00a0')
                    || (c >= '\u2000' && c < '\u2100')) {

So every character in 0x2000 - 0x20FF, along with the control characters, is written as a Unicode escape. This makes sense for the C0 control characters (below 0x20), because JSON does not allow those unescaped inside strings. The C1 range is legal unescaped, so escaping it is merely cautious.
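To see how that condition produces the output in the question, here is a simplified sketch built around the same range test. It is not the library's actual method: the class and method names are hypothetical, and it skips the quote, backslash and short-form escapes (such as the ones for tab and newline) that a real JSON writer also has to handle.

```java
class QuoteRangeSketch {
    // Same range test as the condition quoted above.
    static boolean needsUnicodeEscape(char c) {
        return c < ' ' || (c >= '\u0080' && c < '\u00a0')
                || (c >= '\u2000' && c < '\u2100');
    }

    // Writes matching characters as a backslash-u escape with four
    // lowercase hex digits; everything else is copied as-is.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (needsUnicodeEscape(c)) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample from the question, spelled with Java source escapes
        // so the file compiles regardless of its encoding.
        String sample = "\u00e0\u00f2\u00f9\u00e8\u00ec\u20ac\u20ac";
        System.out.println(escape(sample));
    }
}
```

The accented letters (0xE0, 0xF2, 0xF9, 0xE8, 0xEC) are all at or above 0xA0 and below 0x2000, so they pass through untouched, while the euro sign (0x20AC) falls inside 0x2000 - 0x20FF and is escaped. That matches the mixed output shown in the question.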

As for 0x2000 - 0x20FF, I have no idea, because the code is not commented. Every character in that range is valid JSON when left unescaped. Admittedly, 0x2028 and 0x2029 are not valid unescaped in JavaScript string literals, at least before ES2019 (a small detail that makes JSON syntax not a strict subset of JavaScript syntax), so it is a good idea to escape those two in case the JSON is served as JSONP, which is really JavaScript. But it is not apparent to me why the code escapes the whole range when only 2 characters in it are illegal.
