I'm trying to read a JSON file into R but I got this error:
Error in parseJSON(txt) : parse error: trailing garbage
[ 33.816101, -117.979401 ] } { "a": "Mozilla\/4.0 (compatibl
(right here) ------^
I downloaded the file from http://1usagov.measuredvoice.com/ and unzipped it using 7zip, then I used the following code in R:
library(jsonlite)
jsonData <- fromJSON("usagov_bitly_data2013-05-17-1368832207")
I'm not sure why this error happens. I searched Google but found no information. Could someone help me? Is this a problem with the file or with my code?
The tidyjson package can also read this "JSON Lines" format:
read_json("my.json", format = "jsonl")
The output is then parsed using a series of pipes, rather than having lists nested within data frames.
This format is called ndjson and is designed for streaming import (including gzipped files). Just use this:
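The code that followed this sentence did not survive in the text here, so what follows is only a minimal sketch of the idea, assuming jsonlite is installed. It builds a small gzipped ndjson file locally (the two records are made up) so it runs offline; for the remote archive you would pass a url() connection to gzcon() instead of the local file.

```r
library(jsonlite)

# Build a tiny gzipped ndjson file so the sketch runs offline (made-up records).
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "w")
writeLines(c('{"a": "Mozilla/4.0", "ll": [33.8, -117.9]}',
             '{"a": "Opera/9.80", "ll": [40.7, -74.0]}'), gz)
close(gz)

# gzcon() decompresses on the fly; stream_in() parses one JSON object per line.
# For a remote archive, use url("<address of the .gz file>") in place of file(tmp, "rb").
con <- gzcon(file(tmp, "rb"))
mydata <- stream_in(con, verbose = FALSE)
close(con)

nrow(mydata)
```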
Or alternatively, use the curl package for better performance or to customize the HTTP request:
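As a rough sketch of the curl variant, assuming the curl and jsonlite packages are installed: the helper name stream_gz_ndjson, its gz_url argument, and the User-Agent value are all hypothetical and not from the original answer. The function is only defined here, not called, because calling it needs network access.

```r
library(jsonlite)

# Hypothetical helper: stream a gzipped ndjson file over HTTP via the curl package.
# `gz_url` and the User-Agent value are placeholders chosen for illustration.
stream_gz_ndjson <- function(gz_url, user_agent = "my-r-script") {
  h <- curl::new_handle()
  curl::handle_setheaders(h, "User-Agent" = user_agent)  # customize the request
  con <- gzcon(curl::curl(gz_url, open = "rb", handle = h))
  on.exit(close(con))
  stream_in(con, verbose = FALSE)
}
```

Usage would be along the lines of mydata <- stream_gz_ndjson("<address of the .gz file>").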
ANOTHER UPDATE
You can use the ndjson package to process this ndjson/streaming JSON data. It's faster than jsonlite::stream_in() and always produces a completely "flat" data frame:
If we examine the resultant data frames, you'll see ndjson expands ll into ll.0 and ll.1, where you get a list column in jsonlite that you have to deal with later.
ndjson:
jsonlite:
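The original outputs are not reproduced here, so this is a small sketch of the difference on two made-up records rather than on the real file. It assumes jsonlite is installed; the ndjson part only runs if that package happens to be available.

```r
library(jsonlite)

# Two made-up records in ndjson form, each with an `ll` array.
tmp <- tempfile(fileext = ".json")
writeLines(c('{"a": "Mozilla/4.0", "ll": [33.8, -117.9]}',
             '{"a": "Opera/9.80", "ll": [40.7, -74.0]}'), tmp)

# jsonlite keeps `ll` as a list column holding one numeric vector per row.
df_jsonlite <- jsonlite::stream_in(file(tmp), verbose = FALSE)
class(df_jsonlite$ll)

# ndjson (if installed) flattens nested fields into separate columns instead.
if (requireNamespace("ndjson", quietly = TRUE)) {
  df_ndjson <- ndjson::stream_in(tmp)
  print(names(df_ndjson))
}
```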
UPDATE
The latest version of the jsonlite package supports streaming JSON (which is what this actually is). You can now read it with one line like so:
See also Jeroen's answer below for stream-parsing it directly over HTTP.
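The one-line read mentioned in this update might look like the sketch below. It assumes jsonlite is installed and uses a temporary file with made-up records in place of the downloaded file; with the real data you would pass its path to file().

```r
library(jsonlite)

# Stand-in for the downloaded file: one JSON object per line (made-up records).
json_file <- tempfile()
writeLines(c('{"tz": "America/Los_Angeles", "c": "US"}',
             '{"tz": "America/Toronto", "c": "CA"}'), json_file)

# The one-liner: stream_in() opens the connection, reads line by line, and closes it.
dat <- stream_in(file(json_file), verbose = FALSE)
str(dat)
```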
OLD ANSWER
It turns out this is a "pseudo-JSON" file. I come across these in many naive API systems I work in. Each line is valid JSON, but the individual objects aren't in a JSON array. You need to use readLines and then build your own, valid JSON array from it and pass that into fromJSON:
I combined the readLines in with the paste/sprintf call since the object.size of the resultant (temporary) object is 2,025,656 bytes (~2MB) and didn't feel like doing an rm on a separate temporary variable.
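A minimal sketch of the readLines plus paste/sprintf approach described above, assuming jsonlite is installed. The temporary file and its two records are made up so the example runs on its own; with the real data, json_file would be the path to the unzipped download.

```r
library(jsonlite)

# Stand-in for the unzipped file: each line is a valid JSON object (made-up records).
json_file <- tempfile()
writeLines(c('{"a": "Mozilla/4.0", "c": "US"}',
             '{"a": "Opera/9.80", "c": "CA"}'), json_file)

# Join the lines with commas and wrap them in [ ] to form one valid JSON array,
# then parse it in a single call without keeping the intermediate string around.
dat <- fromJSON(sprintf("[%s]", paste(readLines(json_file), collapse = ",")))
str(dat)
```

This reads the whole file into memory at once, which is fine for a file of a couple of megabytes; the streaming approach scales better for larger inputs.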