I am using R to process Census data, which uses really long numeric GEOIDs to identify the geographies. The issue I am facing is that when writing out the processed data using write_csv (from the readr package), these GEOIDs are written in scientific notation. Is there a way to get around this?
Note: I am able to toggle the scientific notation display on the R console by setting the scipen option to a sufficiently large value, but this setting does not seem to extend into the readr library.
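For reference, the console-side toggle I mean looks like this (999 is just a sufficiently large penalty against scientific display):
options(scipen = 999)  # console now prints 60150001022000 rather than 6.015e+13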
Here is a toy dataset:
library(dplyr)
library(readr) # which is the package with write_csv
(tbl_df(data.frame(GEOID = seq(from=60150001022000, to=60150001022005, 1))))
Source: local data frame [6 x 1]
GEOID
1 60150001022000
2 60150001022001
3 60150001022002
4 60150001022003
5 60150001022004
6 60150001022005
write_csv((tbl_df(data.frame(GEOID = seq(from=60150001022000, to=60150001022005, 1)))), "test.csv")
This is what I am getting currently. I am looking for a way to get the same numbers as above:
GEOID
6.02E+13
6.02E+13
6.02E+13
6.02E+13
6.02E+13
6.02E+13
I wrote a pull request with a patch to improve the control of scientific notation in write_csv. With this patch you would have an int_use_scientific=FALSE argument in write_csv that would solve your issue. Hopefully it will eventually be merged.

It would probably be safer to use character values:
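A minimal sketch of that conversion, reusing the toy data from the question (for these 14-digit whole numbers as.character() yields the full digit string; format(x, scientific = FALSE) is an alternative):

library(dplyr)
library(readr)
df <- tbl_df(data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, 1)))
df$GEOID <- as.character(df$GEOID)  # column is now character, written verbatim
write_csv(df, "test.csv")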
It's a bit ironic that the write_csv function does coerce some of its output to character values, but not numeric columns. Only if a column passes the is.object test will it be coerced. There does not appear to be a switch to throw that will preserve maximal precision. The write.table function and its offspring write.csv have several switches that allow suppression of quotes and other settings for tailoring the output, but write_csv has very little of that.

You can trick write_csv into thinking that a numeric column is something more complex, and this does result in the as.character output, albeit with quotes; see the sketch following this answer.

As a matter of best practices, I do not agree with your choice of insisting that ID-variables remain numeric. There is too much violence that can be applied to that storage mode for an object. You do not need any of the arithmetic operations for an ID-variable.
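A sketch of that trick (the extra class name "num" is arbitrary; any class attribute makes is.object() return TRUE, at which point write_csv coerces the column with as.character()):

library(readr)
df <- data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, 1))
class(df$GEOID) <- c("num", "numeric")  # dummy class; is.object(df$GEOID) is now TRUE
write_csv(df, "test.csv")  # digits written in full, though wrapped in quotes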
In such a case I think it would be safer to recode those columns to strings while exporting the CSV.
I'd rather suggest recoding such columns to type integer, because then write_* won't use scientific number encoding anymore. To convert all numeric columns in one pass (e.g. in case you're dealing with a counts matrix), you could do something like the sketch below.
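A sketch of that one-pass conversion, using an invented example counts table (the names counts, gene, sample1, sample2 are just placeholders; mutate_if is from dplyr). Note that values above .Machine$integer.max (2147483647), like the GEOIDs in the question, would overflow to NA, which is where bit64 below comes in:

library(dplyr)
library(readr)
counts <- data.frame(gene = c("a", "b"), sample1 = c(1200, 530), sample2 = c(880, 1475))
counts <- counts %>% mutate_if(is.numeric, as.integer)  # all numeric columns -> integer
write_csv(counts, "counts.csv")  # integers are never written in scientific notation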
Use bit64; it is an S3 class for vectors of 64-bit integers.
If you read this data again in R, it will assign the correct datatype.
Hope this helps. I opened the csv in a text editor; the numbers had "" around them, but it still worked.
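A sketch of the bit64 route, reusing the question's toy data (an integer64 column passes the is.object test, so write_csv prints it via as.character, which yields the full digits; the quotes mentioned above come from that coercion):

library(bit64)
library(readr)
df <- data.frame(GEOID = as.integer64(seq(from = 60150001022000, to = 60150001022005, 1)))
write_csv(df, "test.csv")  # full 14-digit IDs, possibly quoted in the file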
I would suggest you use … instead of …
If I open test.csv, it opens the file in Excel, and Excel turns the numbers into scientific notation. When I right-click and open it with Notepad instead, it looks good and I see the original numbers without scientific notation.