JSON parsing with Unicode characters

Posted 2020-02-14 08:10

Question:

I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried it in Flash CS5 (the JSON library) and at http://json.parser.online.fr/, and I always get "unexpected token - eval fails".

I'm sorry, there really was a problem with the syntax; the file came this way from the client.

Can someone please help me? Thanks

Answer 1:

Quoth the RFC:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

So a correctly encoded Unicode character should not be a problem, which leads me to believe that the file is not correctly encoded (maybe it uses Latin-1 instead of UTF-8). How did you create the file? In a text editor?
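
One way to test this hypothesis is to decode the raw bytes strictly as UTF-8 and see where decoding fails. A minimal Python sketch; the helper name `find_invalid_utf8` and the sample data are hypothetical, not taken from the question:

```python
def find_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start
    return None


# "ü" is the two bytes C3 BC in UTF-8 but the single byte FC in Latin-1.
good = '{"name": "Müller"}'.encode("utf-8")
bad = '{"name": "Müller"}'.encode("latin-1")

print(find_invalid_utf8(good))  # None
print(find_invalid_utf8(bad))   # 11
```

A non-None result points at the first byte that cannot be UTF-8, which is usually enough to tell that the file was saved in a legacy single-byte encoding.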



Answer 2:

There might be an obscure Unicode whitespace character hidden in your string.

This URL contains more detail:

http://timelessrepo.com/json-isnt-a-javascript-subset
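
To check for such characters, one option is to scan the text for code points that are invisible in most editors. A rough Python sketch; the `SUSPECTS` set and the `find_suspects` helper are illustrative assumptions, and the set is not exhaustive:

```python
import unicodedata

# Code points that are easy to miss in an editor. U+2028 and U+2029 are the
# line/paragraph separators; the rest are common copy-paste debris.
SUSPECTS = {"\u2028", "\u2029", "\ufeff", "\u00a0", "\u200b"}


def find_suspects(text: str):
    """Return (index, 'U+XXXX', Unicode name) for each suspicious character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in SUSPECTS
    ]


# Hypothetical input with a no-break space and a line separator hidden in it.
sample = '{"a":\u00a01,\u2028"b": 2}'
for hit in find_suspects(sample):
    print(hit)
```

Each hit gives the character index, so the offending spot can be located and replaced with an ordinary space or removed.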



Answer 3:

In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's", but I tried for hours and found nothing that worked.

The trouble is that hardcoding a string as shown above already decodes it, as you will see if you put a breakpoint on it. So in the end I wrote a function that converts the hex value (27) to decimal (39), which gave me HTML entity encoding, and then decoded that.

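    // Replace each \uXXXX escape for code points 1 to 256 with an HTML
    // numeric entity (e.g. \u0027 becomes &#39;), then decode the entities.
    for (int f = 1; f <= 256; f++)
    {
        string Dec = "&#" + f + ";";
        // Escapes may use lower- or upper-case hex digits, so replace both forms.
        HTML = HTML.Replace("\\u" + f.ToString("x4"), Dec);
        HTML = HTML.Replace("\\u" + f.ToString("X4"), Dec);
    }
    HTML = System.Web.HttpUtility.HtmlDecode(HTML);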

Ugly as sin, I know, but without the latest framework (not available on my ISP's server) it was the best I could do. Someone must know a better solution.
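
As an illustration of an alternative, shown in Python rather than ASP.NET (so it does not address the framework constraint above): a JSON parser already knows how to decode \uXXXX escapes, so the text can be parsed instead of patched with string replacement. A minimal sketch, assuming the text contains no unescaped double quotes:

```python
import json

# A raw string keeps the backslash, so this holds the six characters \u0027
# literally, just as they would arrive in a response body.
raw = r"Paul\u0027s"

# Wrapping the text in quotes makes it a JSON string literal, which the
# parser then unescapes.
decoded = json.loads('"' + raw + '"')
print(decoded)  # Paul's
```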



Answer 4:

I had the same problem and just changed the file encoding from Mac-Roman/windows-1252 to UTF-8, and it worked.
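
The same conversion can be scripted instead of done in an editor. A small Python sketch; the function name `reencode_to_utf8` is hypothetical, and the source encoding has to be known or guessed (`cp1252` here, `mac_roman` for Mac-Roman):

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Read src in the assumed legacy encoding and write dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

A call such as `reencode_to_utf8("data.json", "data.utf8.json")` (file names are placeholders) writes a UTF-8 copy and leaves the original untouched. If the assumed source encoding is wrong, the output will contain the wrong characters rather than raise an error, so the result is worth checking by eye.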



Answer 5:

I had the same problem with Twitter JSON files. I was parsing them in Python with json.loads(tweet), but it failed for half of the records.

I switched to Python 3 and it works well now.



Answer 6:

If you seem to have trouble with the encoding of a JSON file generated by Python with json.dumps() (e.g. escape sequences such as \u00fc show up instead of the characters, regardless of your editor's encoding setting): json.dumps() produces ASCII-only output by default and escapes all non-ASCII characters! See the related questions "python json unicode - how do I eval using javascript", "python: json.dumps can't handle utf-8?" and "Why does json.dumps escape non-ascii characters with "\uxxxx"".
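
The behaviour is controlled by the `ensure_ascii` parameter of `json.dumps()`. A short Python sketch with made-up sample data:

```python
import json

data = {"city": "Zürich"}

# Default: ensure_ascii=True, so non-ASCII characters become \uXXXX escapes.
print(json.dumps(data))                      # {"city": "Z\u00fcrich"}

# ensure_ascii=False keeps the characters as-is; the caller then has to
# write the result with a Unicode encoding such as UTF-8.
print(json.dumps(data, ensure_ascii=False))  # {"city": "Zürich"}

# Both forms are valid JSON and parse back to the same value.
assert json.loads(json.dumps(data)) == json.loads(json.dumps(data, ensure_ascii=False))
```

The escaped form is not an encoding error: any conforming JSON parser turns \u00fc back into ü, so the escapes only matter when the file is read by eye or by a tool that does not parse JSON.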



Tags: json unicode