How to parse malformed JSONP with hex-encoded characters

Posted 2019-01-28 07:49

I make a call to Google's dictionary API like this:

var json = new WebClient().DownloadString(string.Format(@"http://www.google.com/dictionary/json?callback=dict_api.callbacks.id100&q={0}&sl=en&tl=en", "bar"));

However, I get a response that this code fails to parse:

json = json.Replace("dict_api.callbacks.id100(", "").Replace(",200,null)", "");
JObject o = JObject.Parse(json);

The parse fails when it reaches this:

"entries":[{"type":"example","terms":[{"type":"text","text":"\x3cem\x3ebars\x3c/em\x3e of sunlight shafting through the broken windows","language":"en"}]}]}

The hex escape sequences in

\x3cem\x3ebars\x

break the parser.

Is there some way to handle this JSONP response with JSON.NET?

The answer by aquinas to another "Parse JSONP" question shows a nice regex, x = Regex.Replace(x, @"^.+?\(|\)$", "");, for stripping the JSONP wrapper (it may need tweaking for this case), so the main question here is how to deal with the hex-encoded characters.
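As a rough illustration of the tweak mentioned above: the response here has the shape callback({...},200,null), so removing only the callback name and the closing parenthesis still leaves the trailing ,200,null. One possible approach, sketched below, is to capture everything between the outer parentheses and wrap it in [ ] so the whole argument list becomes a single JSON array. The class and method names (JsonpUnwrap, Arguments, ArgumentsAsJsonArray) are hypothetical, and the sketch assumes the callback name contains only word characters, dots and $.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of JSON.NET.
public static class JsonpUnwrap
{
    // callbackName( ...arguments... ) with an optional trailing semicolon.
    // Singleline lets '.' span line breaks; the greedy (.*) backtracks
    // to the last ')' in the input.
    private static readonly Regex Wrapper = new Regex(
        @"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", RegexOptions.Singleline);

    // Returns the raw text between the outer parentheses.
    public static string Arguments(string jsonp)
    {
        Match m = Wrapper.Match(jsonp);
        if (!m.Success)
            throw new FormatException("Input does not look like a JSONP call.");
        return m.Groups[1].Value;
    }

    // Turns "{...},200,null" into "[{...},200,null]" so that the argument
    // list can be handed to a JSON parser as one array.
    public static string ArgumentsAsJsonArray(string jsonp)
    {
        return "[" + Arguments(jsonp) + "]";
    }
}
```

With JSON.NET, the result could then presumably be read with JArray.Parse(...) and the payload taken from element 0. The \x escapes inside the strings still have to be fixed before parsing.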

2 answers
虎瘦雄心在
#2 · 2019-01-28 08:26

Reference: How to decode HTML encoded character embedded in a json string

The JSON spec does not allow hexadecimal escape sequences (\xNN) in strings, only Unicode escape sequences (\uNNNN). That is why the parser does not recognize the escape, and why using \u0027 instead of \x27 should work. So you could blindly replace \x with \u00. This works fine on this response, although in theory it would damage a string that contains an escaped backslash followed by an x (\\x), but who cares ... :D

So changing your code to this will fix it:

        var json = new WebClient().DownloadString(string.Format(@"http://www.google.com/dictionary/json?callback=dict_api.callbacks.id100&q={0}&sl=en&tl=en", "bar"));

        json = json
                .Replace("dict_api.callbacks.id100(", "")
                .Replace(",200,null)", "")
                .Replace("\\x","\\u00");

        JObject o = JObject.Parse(json);
Luminary・发光体
#3 · 2019-01-28 08:44

The server is not returning valid JSON: JSON does not support \xAB character escape sequences, only \uABCD escape sequences.

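The "solutions" I have seen run a text replacement on the string first. Here is one of my replies to a similar question for Java. Note the regular expression inputString.replaceAll("\\x(\d{2})", "\\u00$1") at the bottom; adapt it to your language. As quoted it is only a sketch: the backslashes need further escaping for both the string literal and the regex, and \d{2} matches decimal digits only, so it would miss \x3c. A hex class such as [0-9A-Fa-f]{2} is needed.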
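Adapted to C#, the same idea might look like the sketch below. It walks over every backslash escape in the text and rewrites only \xNN to \u00NN, which means an escaped backslash followed by x (\\x41) is left alone. The class and method names (HexEscapeFixer, Fix) are hypothetical, and the sketch assumes backslashes occur only inside JSON string literals, as they do in otherwise valid JSON.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of JSON.NET.
public static class HexEscapeFixer
{
    // Matches a backslash followed by either xNN (captured in group 1)
    // or any other single character. Consuming escapes as pairs keeps
    // "\\x41" (escaped backslash, then plain text) from being rewritten.
    private static readonly Regex AnyEscape = new Regex(
        @"\\(?:x([0-9A-Fa-f]{2})|.)", RegexOptions.Singleline);

    public static string Fix(string json)
    {
        return AnyEscape.Replace(json, m =>
            m.Groups[1].Success
                ? @"\u00" + m.Groups[1].Value   // \x3c becomes \u003c
                : m.Value);                      // every other escape is kept
    }
}
```

In the question's code, json = HexEscapeFixer.Fix(json); would stand in for the plain .Replace("\\x","\\u00") call before JObject.Parse(json).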
