Preserving large numbers

Posted 2019-01-18 12:03

Question:

I am trying to read a csv file that has barcodes in the first column, but when R gets it into a data.frame, it converts “1665535004661” to “1.67E+12”. Is there a way to preserve this number in an integer format? I tried assigning a class of “double”, but that didn’t work, nor did assigning a class of “character”. Once it is in the 1.67E+12 format any attempt to convert it back to an integer returns “167000000000”.

Thanks, J--

Answer 1:

It's not in a "1.67E+12 format", it just won't print entirely using the defaults. R is reading it in just fine and the whole number is there.

> x <- 1665535004661
> x
[1] 1.665535e+12
> print(x, digits = 16)
[1] 1665535004661

See, the digits were there all along. A double stores integers exactly up to 2^53 (about 9.0e15), so nothing gets lost unless the number has more than roughly 15 digits. Sorting on what you read in will work fine, and to see the full values in your data.frame you can call print() explicitly with the digits argument instead of printing implicitly by typing the name.
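
A quick way to check this yourself, as a minimal sketch in base R: the comparison confirms the stored value is intact, format() and sprintf() return every digit as text without touching any global option, and the last lines show where doubles stop being exact (2^53). The variable names are only illustrative.

```r
x <- 1665535004661

# The stored value is exact; only the default display is shortened.
x == 1665535004661

# Two ways to get every digit as text without changing options().
full_fmt <- format(x, scientific = FALSE)
full_spf <- sprintf("%.0f", x)

# Beyond 2^53 a double can no longer tell neighbouring integers apart.
limit <- 2^53
limit_hit <- (limit + 1) == limit
```

A 13-digit barcode is far below that limit, which is why reading it as numeric is safe here.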



Answer 2:

Picking up on what you said in the comments, you can directly import the text as a character by specifying the colClasses in read.table(). For example:

num <- "1665535004661"
dat.char <- read.table(text = num, colClasses="character")
str(dat.char)
#------
'data.frame':   1 obs. of  1 variable:
 $ V1: chr "1665535004661"
dat.char
#------
             V1
1 1665535004661

Alternatively (and for other uses), you can set the digits option via options(). The default is 7 digits and the acceptable range is 1-22. To be clear, setting this option in no way alters the underlying data; it merely controls how values are displayed on screen when printed. From the help page ?options:

controls the number of digits to print when printing numeric values. It is a suggestion only. Valid values are 1...22 with default 7. See the note in print.default about values greater than 15.

Example illustrating this:

options(digits = 7)
dat <- read.table(text = num)

dat
#------
            V1
1 1.665535e+12

options(digits = 22)
dat
#------
             V1
1 1665535004661

To flesh this out completely, and to cover cases where changing a global setting is not preferable, you can pass digits directly as an argument: print(foo, digits = bar). You can read more about this under ?print.default. This is what John describes in his answer, so credit should go to him for illuminating that nuance.
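
As a rough sketch of both approaches side by side (base R only; `shown` and `old` are illustrative names): the first print() changes the display for a single call, while the second temporarily changes the global option and then restores it.

```r
num <- "1665535004661"
dat <- read.table(text = num)

# Per-call: only this print() uses 15 significant digits.
print(dat, digits = 15)

# Temporary global change: options() returns the previous values,
# so they can be restored afterwards.
old <- options(digits = 15)
print(dat)
options(old)

# Capture the printed text to confirm the full number is displayed.
shown <- capture.output(print(dat, digits = 15))
```

Restoring the saved options keeps the change from leaking into the rest of the session.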



Answer 3:

Try reading the file with colClasses = "character":

read.csv("file.csv", colClasses = "character")

For details, see the read.table documentation:

http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html



Answer 4:

From the ?is.integer page:

"Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9"

> 1665535004661L > 2*10^9
[1] TRUE

You want the Rmpfr package.

library(Rmpfr)
x <- mpfr(15, precBits = 1024)


Answer 5:

Take a look at the int64 package: Bringing 64-bit data to R.



Answer 6:

You can use the numerals argument of read.csv. For example:

read.csv(x, sep = ";", numerals = "no.loss")

where x is the path to your file.

With "no.loss", any value that cannot be converted to a double without losing accuracy is left as text instead of being rounded, so long integers survive the import intact.
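
A small sketch of what that looks like, using made-up inline data instead of a file (the 20-digit id is hypothetical, and as.is = TRUE is added so the result is character rather than factor on older R versions):

```r
# A 13-digit barcode: representable exactly, so it should be read as numeric.
small <- read.csv(text = "id\n1665535004661", numerals = "no.loss")

# A 20-digit id: a double cannot hold it exactly, so with "no.loss"
# it should be kept as text instead of being rounded.
big <- read.csv(text = "id\n12345678901234567890",
                numerals = "no.loss", as.is = TRUE)

str(small)
str(big)
```

So numerals = "no.loss" only changes the outcome for values that are too long for a double; a 13-digit barcode is read as a number either way.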



Answer 7:

Since you are not performing arithmetic on this value, character is appropriate. You can use the colClasses argument to set various classes for each column, which is probably better than using all character.

test.csv:

a,b,c
1001002003003004,2,3

Read character, then integers:

x <- read.csv('test.csv', colClasses = c('character', 'integer', 'integer'))
x
                 a b c
1 1001002003003004 2 3


mode(x$a)
[1] "character"

mode(x$b)
[1] "numeric"
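
A self-contained variant of the same idea, as a sketch that reads inline text instead of a file. The second data row (with leading zeros) is a hypothetical addition to show one more reason to keep barcodes as character. The named form of colClasses sets only column a and leaves the others to be detected.

```r
csv_text <- "a,b,c\n1001002003003004,2,3\n0001002003003004,5,6"

# Positional form: one class per column, in order.
x <- read.csv(text = csv_text,
              colClasses = c("character", "integer", "integer"))

# Named form: fix only column a, let read.csv detect the rest.
y <- read.csv(text = csv_text, colClasses = c(a = "character"))

x$a          # leading zeros are preserved because a is text
class(x$b)   # class() reports "integer", while mode() reports "numeric"
class(y$b)
```

If column a were read as a number, the leading zeros in the second row would be dropped.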


Tags: r numeric