In R, 64-bit integers (Int64) cannot be accurately serialized to or deserialized from JSON, because the existing JSON libraries either coerce the value to a numeric (a double) or write it out in scientific notation.
Does anyone know of a way to serialize and deserialize whole Int64 numbers to/from JSON without losing precision, or is a library modification (probably to RJSONIO) required?
The full story, including the libraries I have tried so far and the gacky workarounds I am using in the interim:
> library(gmp)
> library(bit64)
> library(rjson)
> library(RJSONIO)
>
> options.bak <- getOption("digits")
> options(digits = 22)
>
> #This is our value!
> int64.text <- "5812766036735097952"
> #This whole number loses precision when stored as a numeric.
> as.bigz(int64.text) - as.numeric(int64.text)
Big Integer ('bigz') :
[1] 96
>
> #PROBLEM 1: Deserialization from JSON
>
> #rjson parses this number as a numeric, and demonstrates the same loss.
> json.text <- "{\"record.id\":5812766036735097952}"
> rjson.parsed <- rjson::fromJSON(json.text)$record.id
> str(rjson.parsed)
num 5.81e+18
> as.bigz(int64.text) - as.bigz(rjson.parsed)
Big Integer ('bigz') :
[1] 96
> #So does RJSONIO, a library that allows you to specify floating point precision.
> rjsonio.parsed <- RJSONIO::fromJSON(json.text, digits = 50)["record.id"]
> as.bigz(int64.text) - as.bigz(rjsonio.parsed)
Big Integer ('bigz') :
[1] 96
>
> #For now, I have solved this by hacking the JSON with some regex magic. Here's a snippet, although
> # I'm really processing a much larger JSON string.
> modified.json.text <- gsub("record.id\\\":([0-9]+)", "record.id\\\":\"\\1\"", json.text)
> #RJSONIO::fromJSON returns a named atomic vector here, so index with [[ ]] rather than $.
> id.text <- RJSONIO::fromJSON(modified.json.text)[["record.id"]]
> id.bigz <- as.bigz(id.text)
> id.bigz - as.bigz(int64.text)
Big Integer ('bigz') :
[1] 0
> id.bigz
Big Integer ('bigz') :
[1] 5812766036735097952
>
> #However, hacking the JSON isn't really a good solution, and relies upon there being convenient tags
> # nearby for the regex match to work. Being able to deserialize into a precise data structure in the
> # first place is best. Sorry R, there are numbers larger than 2^32
>
> #PROBLEM 2: Serialization to JSON
> #Neither rjson nor RJSONIO supports bigz objects:
> rjson::toJSON(as.bigz(int64.text))
Error in rjson::toJSON(as.bigz(int64.text)) :
unable to convert R type 24 to JSON
> RJSONIO::toJSON(as.bigz(int64.text), digits = 50)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
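> #integer64 will serialize, but as the wrong value in scientific notation (bit64 stores the
> # 64 bits in a double, and the JSON libraries read those bits as a double):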
> toJSON(as.integer64(int64.text))
[1] "[ 4.0156e+80 ]"
> RJSONIO::toJSON(as.integer64(int64.text, digits = 50))
[1] "[ 4.0156e+80 ]"
>
> #So again, another JSON hack is in order:
> encoded.json.out <- toJSON(c(record.id = paste0("INT64", int64.text)))
> #RJSONIO puts whitespace after the colon, so the pattern has to allow for it.
> modified.json.out <- gsub("record.id\\\":[[:space:]]*\"INT64([0-9]+)\"", "record.id\\\":\\1", encoded.json.out)
> modified.json.out
[1] "{\n \"record.id\":5812766036735097952 \n}"
> options(digits = options.bak)
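
For reference, here is a slightly more general form of the two regex workarounds, as a minimal sketch in base R only. The helper names quote_big_ints and unquote_big_ints are hypothetical (they are not part of rjson or RJSONIO), and the 16-digit threshold is my assumption, chosen because doubles stop being exact for whole numbers a little above 9e15. It only handles values that directly follow a colon, so long integers inside arrays are missed, and the reverse step will also unquote any genuine string value that happens to be 16 or more digits.

```r
# Hypothetical helpers, base R only; not part of rjson or RJSONIO.

# Before parsing: wrap any bare integer literal of 16+ digits that follows a
# colon in quotes, so the JSON parser returns a string instead of a lossy double.
quote_big_ints <- function(json) {
  gsub("(:[[:space:]]*)(-?[0-9]{16,})([[:space:]]*[,}])", "\\1\"\\2\"\\3", json)
}

# After serializing: strip the quotes from string values made up only of
# 16+ digits, turning them back into bare JSON numbers.
unquote_big_ints <- function(json) {
  gsub("(:[[:space:]]*)\"(-?[0-9]{16,})\"", "\\1\\2", json)
}

json.text <- "{\"record.id\":5812766036735097952}"

# The parser now sees a string, which can be handed to as.bigz() or as.integer64().
quoted <- quote_big_ints(json.text)

# Going the other way: serialize the id as a string, then unquote it in the JSON text.
restored <- unquote_big_ints(quoted)

# The value does not survive a trip through a double, which is the original problem.
survives.double <- sprintf("%.0f", as.numeric("5812766036735097952")) == "5812766036735097952"
```

This is still string surgery on the JSON rather than a real fix, so the question stands: a parser that can hand back big integers as character, bigz or integer64 directly would be much better.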