Parsing or fixing JSONs with special 'undefine

Posted 2020-01-19 06:34

Problem:

In addition to strings and numbers, valid JSON can contain special values like null and false.

I need to parse JSON generated by an API that also contains undefined. Although undefined is a valid JavaScript value, it is not a valid JSON value, and every attempt to parse it fails with a lexical error.

Examples:

library(jsonlite)
library(magrittr)  # provides the %>% pipe used below

# A string works
"[{\"Sepal.Width\":\"3.5\"}]" %>% fromJSON
#  Sepal.Width
#         3.5

# A number works
"[{\"Sepal.Width\":3.5}]"  %>% fromJSON
#  Sepal.Width
#         3.5

# null works
"[{\"Sepal.Width\": null}]" %>% fromJSON
#  Sepal.Width
#          NA

# false works
"[{\"Sepal.Width\": false}]" %>% fromJSON
#  Sepal.Width
#       FALSE

# undefined does not work
"[{\"Sepal.Width\": undefined}]" %>% fromJSON
Error: lexical error: invalid char in json text.
                      [{"Sepal.Width": undefined}]
                     (right here) ------^

Question:

Is there any (reliable) way to parse JSON containing undefined values? If not, what is the best approach to repair this faulty JSON?

Attempt:

I've thought about simply replacing undefined with gsub(), but that is risky, since that word could easily appear inside the JSON string values.

Answer 1:

No. You cannot parse JSON that contains undefined, because undefined is not a JSON value. In JavaScript, undefined is a special value meaning "this key [in your case, "Sepal.Width"] doesn't exist," and it must not occur as a value in valid JSON. The API that generates JSON with undefined values is most likely faulty.

The official source, The JSON Data Interchange Syntax (ECMA-404), states:

A JSON value can be an object, array, number, string, true, false, or null.

The best remedy is to find out why the JSON generator or API emits undefined and fix it there. Failing that, you can repair the defective JSON manually or programmatically, and then check the result for inconsistencies.
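
One possible way to do the programmatic repair is sketched below. It is only a sketch under stated assumptions: the helper name repair_undefined is made up for illustration, mapping undefined to null is a choice rather than something the API or the JSON standard prescribes, and the input is assumed to be otherwise well-formed JSON. The pattern uses the PCRE (*SKIP)(*F) idiom (available in base R with perl = TRUE) to step over complete string literals, so the word undefined is only replaced where it appears as a bare token.

```r
library(jsonlite)

# Hypothetical helper: replace bare `undefined` tokens with a valid JSON
# literal while leaving the contents of string literals untouched.
repair_undefined <- function(json_text, replacement = "null") {
  # First alternative: a complete JSON string literal, escapes included.
  # (*SKIP)(*F) discards that match, so nothing inside a string is replaced.
  # Second alternative: the bare word `undefined` anywhere else.
  pattern <- "\"(?:[^\"\\\\]|\\\\.)*\"(*SKIP)(*F)|\\bundefined\\b"
  gsub(pattern, replacement, json_text, perl = TRUE)
}

# Made-up input: one bare undefined and one inside a string value.
faulty <- '[{"Sepal.Width": undefined, "note": "undefined here"}]'
repaired <- repair_undefined(faulty)

cat(repaired, "\n")
# [{"Sepal.Width": null, "note": "undefined here"}]

# The repaired text is now accepted by the parser.
fromJSON(repaired)
```

Replacing with null loses the distinction between "key missing" and "key present but empty", so whether that is acceptable depends on how the data is used downstream.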



Answer 2:

For the record, I used str_replace_all() to replace :undefined with :"undefined".

This is somewhat risky because it will cause problems if the string :undefined happens to appear in actual string values in the JSON, but in my case it is an (imperfect) solution:

library(stringr)
str_replace_all(invalid_json, ':undefined', ':"undefined"')
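
To make the limits of this replacement concrete, here is a small illustration. The two input strings are invented for the example and are not taken from the API in question. It shows that the literal pattern misses an undefined that follows a space (as in the question's example), and that a whitespace-tolerant variant still rewrites matches inside string values.

```r
library(stringr)

# Hypothetical inputs, made up for illustration.
spaced  <- '[{"Sepal.Width": undefined}]'
in_text <- '[{"msg":"status:undefined"}]'

# The literal pattern needs "undefined" directly after the colon,
# so the spaced variant is left unchanged.
unchanged <- str_replace_all(spaced, ':undefined', ':"undefined"')

# Allowing optional whitespace after the colon covers both spellings.
fixed <- str_replace_all(spaced, ':\\s*undefined', ':"undefined"')

# A match inside a string value is rewritten as well, which leaves
# unescaped quotes in the middle of that string.
corrupted <- str_replace_all(in_text, ':\\s*undefined', ':"undefined"')

cat(unchanged, fixed, corrupted, sep = "\n")
# [{"Sepal.Width": undefined}]
# [{"Sepal.Width":"undefined"}]
# [{"msg":"status:"undefined""}]
```

The last result is no longer valid JSON, which is the risk described above: a plain pattern cannot tell a bare token from text inside a string.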