Reading unescaped backslashes in JSON into R

Posted 2019-08-17 07:13

Question:

I'm trying to read some data from the Facebook Graph API Explorer into R to do some text analysis. However, it looks like there are unescaped backslashes in the JSON feed, which is causing rjson to barf. The following is a minimal example of the kind of input that's causing problems.

library(rjson)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
fromJSON(txt)

(Note that the double backslashes in \\" and \\video become single backslashes once R parses the string literal, which matches what's in my actual data.)

I also tried the RJSONIO package, which gave errors as well and at times even crashed R.

Has anyone come across this problem before? Is there a way to fix this short of manually hunting down every error that crops up? There's potentially megabytes of JSON being parsed, and the error messages aren't very informative about where exactly the problematic input is.

Answer 1:

Just replace backslashes that aren't escaping double quotes, tabs or newlines with double backslashes.

In the regular expression, '\\\\' matches a single backslash (two levels of escaping are needed: one for R, one for the regular expression engine). We need the Perl-compatible regex engine in order to use the negative lookahead (?!...).

library(stringr)
txt2 <- str_replace_all(txt, perl('\\\\(?![tn"])'), '\\\\\\\\')
fromJSON(txt2)
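
The same replacement can be sketched in base R, which may help where perl() is unavailable (it was deprecated in later stringr releases). This version also leaves alone every escape that JSON permits (\", \\, \/, \b, \f, \n, \r, \t and \uXXXX), not only the three listed above. The helper name escape_stray_backslashes is made up for illustration, and the pattern assumes the PCRE verbs (*SKIP)(*FAIL) are available through perl = TRUE.

```r
# Hypothetical helper: double every backslash that does not start a valid
# JSON escape sequence. Valid escapes are matched first and skipped over.
escape_stray_backslashes <- function(json) {
  pattern <- '\\\\(?:["\\\\/bfnrt]|u[0-9a-fA-F]{4})(*SKIP)(*FAIL)|\\\\'
  gsub(pattern, '\\\\\\\\', json, perl = TRUE)
}

txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
txt2 <- escape_stray_backslashes(txt)
cat(txt2, "\n")

# Parse the repaired text if rjson happens to be installed.
if (requireNamespace("rjson", quietly = TRUE)) {
  str(rjson::fromJSON(txt2))
}
```

A stray backslash that happens to be followed by one of those letters, as in picture\table, cannot be told apart from a real escape and is left as is.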


Answer 2:

The problem is that you are trying to parse invalid JSON:

library(jsonlite)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
validate(txt)

The problem is the picture\\video part because \v is not a valid JSON escape sequence, even though it is a valid escape sequence in R and some other languages. Perhaps you mean:

library(jsonlite)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\/video"}]}'
validate(txt)
fromJSON(txt)
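
With megabytes of input, it can help to list where the invalid escapes are before deciding how to handle them. A minimal base R sketch, assuming the whole document is held in a single string; the helper name find_bad_escapes and the size of the context window are illustrative choices, not part of jsonlite:

```r
# Hypothetical helper: report the position and surrounding text of every
# backslash that does not begin a valid JSON escape sequence.
find_bad_escapes <- function(json, context = 10) {
  pattern <- '\\\\(?:["\\\\/bfnrt]|u[0-9a-fA-F]{4})(*SKIP)(*FAIL)|\\\\'
  pos <- gregexpr(pattern, json, perl = TRUE)[[1]]
  if (pos[1] == -1) {
    # No stray backslashes found: return an empty result.
    return(data.frame(position = integer(0), snippet = character(0)))
  }
  pos <- as.integer(pos)
  data.frame(
    position = pos,
    snippet = substring(json, pmax(pos - context, 1), pos + context)
  )
}

txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
find_bad_escapes(txt)
```

Each row gives the character offset of one offending backslash together with a few characters on either side, which is usually enough to spot the record it belongs to.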

Either way, the problem is at the JSON data source that is generating invalid JSON. If this data really comes from Facebook, you found a bug in their API. But more likely you are not retrieving it correctly.