I have a JSON file (an export from MongoDB) that I'd like to load into R. The file is about 890 MB in size, with roughly 63,000 rows of 12 fields each. The fields are numeric, character and date. I'd like to end up with a 63000 x 12 data frame.
lines <- readLines("fb2013.json")
Result: lines has all 63,000 elements as class character, and all the fields of each row are lumped into a single string.
Each element looks something like this:
"{ \"_id\" : \"10151271769737669\", \"comments_count\" : 36, \"created_at\" : { \"$date\" : 1357941938000 }, \"icon\" : \"http://blahblah.gif\", \"likes_count\" : 450, \"link\" : \"http://www.blahblahblah.php\", \"message\" : \"I wish I could figure this out!\", \"page_category\" : \"Computers\", \"page_id\" : \"30968999999\", \"page_name\" : \"NothingButTrouble\", \"type\" : \"photo\", \"updated_at\" : { \"$date\" : 1358210153000 } }"
Using rjson,
jFile <- fromJSON(paste(readLines("fb2013.json"), collapse=""))
Only the first row is read into jFile, but it does contain all 12 fields.
Using RJSONIO:
jFile <- fromJSON(lines)
results in the following:
Warning messages:
1: In if (is.na(encoding)) return(0L) :
the condition has length > 1 and only the first element will be used
Again, only the first row is read into jFile and there are 12 fields.
The output from rjson and RJSONIO looks something like this:
$`_id`
[1] "1018535"
$comments_count
[1] 0
$created_at
$date
1.357027e+12
$icon
[1] "http://blah.gif"
$likes_count
[1] 20
$link
[1] "http://www.chachacha"
$message
[1] "I'd love to figure this out."
$page_category
[1] "Internet/software"
$page_id
[1] "3924395872345878534"
$page_name
[1] "Not Entirely Hopeless"
$type
[1] "photo"
$updated_at
$date
1.357027e+12
Since you want a data.frame, try this:
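A minimal sketch of one way to do this, assuming the export holds one JSON document per line (as the sample in the question suggests) and that the rjson package is installed. The names `fields`, `get_field` and `parse_export` are hypothetical helpers made up for this illustration, not part of rjson, and the field list is copied from the sample record. Each line is parsed on its own, the `{ "$date": ... }` wrappers are unwrapped, and missing fields become NA.

```r
library(rjson)

# Field names taken from the sample record in the question.
fields <- c("_id", "comments_count", "created_at", "icon", "likes_count",
            "link", "message", "page_category", "page_id", "page_name",
            "type", "updated_at")

# Hypothetical helper: pull one field out of a parsed document.
get_field <- function(doc, name) {
  value <- doc[[name]]
  # Unwrap MongoDB's { "$date": <milliseconds> } wrapper if present.
  if ("$date" %in% names(value)) value <- value[["$date"]]
  if (is.null(value)) NA else value
}

# Hypothetical helper: character vector of JSON lines -> data frame.
parse_export <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]   # skip blank lines
  docs <- lapply(lines, fromJSON)         # parse each line separately
  cols <- lapply(fields, function(f) {
    unname(sapply(docs, function(d) get_field(d, f)))
  })
  names(cols) <- fields
  df <- data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
  # Assumption: dates are milliseconds since 1970-01-01 UTC.
  for (f in c("created_at", "updated_at")) {
    df[[f]] <- as.POSIXct(df[[f]] / 1000, origin = "1970-01-01", tz = "UTC")
  }
  df
}

if (file.exists("fb2013.json")) {
  fb <- parse_export(readLines("fb2013.json"))
  str(fb)
}
```

Parsing line by line matters here: pasting all lines together gives a string of concatenated objects rather than one JSON array, which would explain why only the first row came back. With check.names = FALSE the `_id` column keeps its name, so it has to be accessed as fb[["_id"]].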
Note that this conversion assumes that the dates were stored as milliseconds since the epoch.
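To illustrate that assumption, here is a small check using the created_at value from the sample record in the question. It only shows the arithmetic (divide by 1000 to get seconds, then count from 1970-01-01 UTC). The time zone is an assumption; adjust tz if the timestamps should be shown in local time.

```r
# created_at from the sample record, in milliseconds since the epoch.
ms <- 1357941938000

# as.POSIXct expects seconds, so divide by 1000 first.
created <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

format(created, "%Y-%m-%d %H:%M:%S")  # "2013-01-11 22:05:38"
```

A date in January 2013 is consistent with the file name fb2013.json, which suggests the milliseconds interpretation is the right one.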
try