I am trying to read a CSV file that has barcodes in the first column, but when R gets it into a data.frame, it converts “1665535004661” to “1.67E+12”. Is there a way to preserve this number in an integer format? I tried assigning a class of “double”, but that didn’t work, nor did assigning a class of “character”. Once it is in the 1.67E+12 format, any attempt to convert it back to an integer returns “1670000000000”.
Thanks,
J--
It's not in a "1.67E+12 format", it just won't print entirely using the defaults. R is reading it in just fine and the whole number is there.
> x <- 1665535004661
> x
[1] 1.665535e+12
> print(x, digits = 16)
[1] 1665535004661
See, the digits were there all along. A double stores whole numbers exactly up to about 15 significant digits, so nothing is lost for a 13-digit barcode. Sorting on what you brought in will work fine, and to see the full values in your data.frame you can call print() explicitly with the digits argument instead of printing implicitly by typing the name.
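As a quick check that the stored value is intact, here is a minimal sketch using only base R. The variable names full_fmt and full_spf are just illustrative, and the 2^53 bound is a general property of double precision rather than something taken from the answer above.

```r
x <- 1665535004661

# The stored double equals the literal that was typed in; only the
# default 7-significant-digit display abbreviates it.
x == 1665535004661

# Two ways to get every digit as text without touching options():
full_fmt <- format(x, scientific = FALSE)
full_spf <- sprintf("%.0f", x)
full_fmt
full_spf

# Doubles hold every whole number exactly up to 2^53 (about 9.007e15),
# so a 13-digit barcode is well inside the safe range.
x < 2^53
```

Both format() and sprintf() return character strings, so they are for display or export; keep the numeric column itself for sorting and comparisons.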
Picking up on what you said in the comments, you can directly import the text as a character by specifying the colClasses argument in read.table(). For example:
num <- "1665535004661"
dat.char <- read.table(text = num, colClasses="character")
str(dat.char)
#------
'data.frame': 1 obs. of 1 variable:
$ V1: chr "1665535004661"
dat.char
#------
V1
1 1665535004661
Alternatively (and for other uses), you can set the digits option via options(). The default is 7 digits and the acceptable range is 1-22. To be clear, setting this option in no way changes the underlying data; it merely controls how values are displayed when printed. From the help page for ?options:
controls the number of digits to print when printing numeric values. It is a suggestion only. Valid values are 1...22 with default 7. See the note in print.default about values greater than 15.
Example illustrating this:
options(digits = 7)
dat <- read.table(text = num)
dat
#------
V1
1 1.665535e+12
options(digits = 22)
dat
#------
V1
1 1665535004661
To flesh this out completely, and to cover cases where changing a global setting is not desirable, you can pass digits directly as an argument: print(foo, digits = bar). You can read more about this under ?print.default. This is what John describes in his answer, so credit should go to him for illuminating that nuance.
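A small sketch of the per-call approach applied to a whole data.frame, assuming the same one-value input used earlier in this answer. Nothing here changes global options; shown is just an illustrative name for the captured output.

```r
dat <- read.table(text = "1665535004661")

# The column was read as a double and still holds the full value.
is.numeric(dat$V1)
dat$V1 == 1665535004661

# Per-call display: pass digits to print() instead of calling options().
print(dat, digits = 16)

# capture.output() lets us confirm what print() showed.
shown <- capture.output(print(dat, digits = 16))
any(grepl("1665535004661", shown, fixed = TRUE))
```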
Try working with colClasses = "character":
read.csv("file.csv", colClasses = "character")
Have a look at the read.table documentation: http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
From the ?is.integer page:
"Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9"
> 1665535004661 > 2*10^9
[1] TRUE
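To make the 32-bit limit concrete, here is a short base-R sketch. The names barcode and as_int are illustrative only; the warning from as.integer() is suppressed so the NA result is easier to see.

```r
barcode <- 1665535004661

# Largest value an R integer vector can hold (2^31 - 1).
.Machine$integer.max

# Coercing past that limit gives NA (normally with a warning),
# not a truncated number.
as_int <- suppressWarnings(as.integer(barcode))
is.na(as_int)

# The same value kept as a double is stored exactly.
typeof(barcode)
barcode == 1665535004661
```

So for a 13-digit barcode the integer type is simply not an option in base R; the realistic choices are double or character.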
You want package Rmpfr.
library(Rmpfr)
x <- mpfr(15, precBits= 1024)
Take a look at the int64 package: Bringing 64-bit data to R.
You can use the numerals argument of read.csv. For example:
read.csv(x, sep = ";", numerals = "no.loss")
where x is the path to your file.
With "no.loss", any value that cannot be converted to a double without losing accuracy is left as text instead of being rounded, so long integers keep all their digits when you import the data.
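The following sketch shows what numerals = "no.loss" does on a made-up two-column input (the column names and the 20-digit ID are hypothetical, chosen only for illustration). stringsAsFactors = FALSE is set explicitly so the unconverted column comes back as character on any R version.

```r
# Hypothetical input: a 13-digit barcode and a 20-digit ID, ";"-separated.
csv_text <- "barcode;long_id\n1665535004661;12345678901234567890"

dat <- read.csv(text = csv_text, sep = ";", numerals = "no.loss",
                stringsAsFactors = FALSE)

# 13 digits fit in a double exactly, so this column is converted as usual.
class(dat$barcode)

# 20 digits do not, so "no.loss" leaves this column as text.
class(dat$long_id)
dat$long_id
```

Note that the 13-digit barcode from the question converts to a double without loss either way, so this argument only makes a difference for much longer numbers.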
Since you are not performing arithmetic on this value, character is appropriate. You can use the colClasses argument to set various classes for each column, which is probably better than using all character.
data.csv:
a,b,c
1001002003003004,2,3
Read character, then integers:
x <- read.csv('data.csv', colClasses = c('character', 'integer', 'integer'))
x
a b c
1 1001002003003004 2 3
mode(x$a)
[1] "character"
mode(x$b)
[1] "numeric"
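If you would rather not spell out a class for every column, a named colClasses vector is one option. This is a sketch on the same sample data, read from a string via the text argument instead of a file; columns that are not named are type-detected as usual.

```r
csv_text <- "a,b,c\n1001002003003004,2,3"

# Name only the column that must stay text; b and c are left to
# read.csv's normal type detection.
x <- read.csv(text = csv_text, colClasses = c(a = "character"))

class(x$a)
class(x$b)
x$a
```

This can be convenient for wide files where only the barcode column needs special handling.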