I have lately been quite interested in extremely large vectors/arrays in R. I know from reading the documentation that the size limit of an array is 2^31 - 1 elements. However, using the CRAN package "bit", it is possible to instantiate a vector of n booleans while using only about n/32 integers of storage. I was wondering whether it is possible to overcome R's internal limit by somehow building on bit vectors (or in some other way). It seems to me that it should be possible to have a bit vector with a limit of 32 times the maximum size, i.e. 32(2^31 - 1) = 2^5(2^31 - 1). I have tried a variety of methods, all unsuccessfully. Below are some examples that R can handle:
library("bit")
a <- as.bit(rep(T, 100))
a
bit length=100 occupying only 4 integers
1 2 3 4 5 6 7 8 93 94 95 96 97 98 99 100
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE .. TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
a <- as.bit(rep(T, 10^9))
Here are some examples that throw R into a panic:
b <- as.bit(rep(T, 10^10))
Error in rep(T, 10^10) : invalid 'times' argument
In addition: Warning message:
In as.bit(rep(T, 10^10)) : NAs introduced by coercion
a <- c(as.bit(rep(T, 10^9)), as.bit(rep(T, 10^9)), as.bit(rep(T, 10^9)), as.bit(rep(T, 10^9)), as.bit(rep(T, 10^9)))
Error in if (length%%.BITS) n <- length%/%.BITS + 1L else n <- length%/%.BITS :
argument is not interpretable as logical
In addition: Warning messages:
1: In sum(nold) : integer overflow - use sum(as.numeric(.))
2: In c.bit(as.bit(rep(T, 10^9)), as.bit(rep(T, 10^9)), as.bit(rep(T, :
integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
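The overflow warnings above can be reproduced in base R without the bit package. The following is only a minimal sketch of the arithmetic involved; the variable names `lens` and `capacity` are made up for illustration, and it does not show what c.bit does internally beyond what the warnings suggest.

```r
# Largest value an R integer can hold: 2^31 - 1 = 2147483647.
int_max <- .Machine$integer.max

# Five lengths of 10^9 stored as integers, as in the c() call above.
lens <- rep(1000000000L, 5)

# The integer sum does not fit in 32 bits, so R returns NA with a warning.
int_total <- suppressWarnings(sum(lens))

# Converting to double first gives the exact total (doubles are exact up to 2^53).
dbl_total <- sum(as.numeric(lens))

# Hoped-for capacity of a bit vector: 32 bits per integer times the maximum length.
capacity <- 32 * as.numeric(int_max)

print(int_total)                    # NA
print(dbl_total)                    # 5e+09
print(format(capacity, big.mark = ","))
```

So any length bookkeeping beyond 2^31 - 1 would have to be done in doubles rather than integers.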
Edit: After doing some research, I have found that the restriction on vector size comes from the restriction on integers. The maximum integer allowed by R is 2^31 - 1, and since integers are used for indexing vectors, vectors inherit the same size limit. This still doesn't really answer my question. As the first example above shows, I have a bit vector with 100 boolean items in it that takes up only 4 integers' worth of memory. Is there some way of nesting indices? For example, to see the 83rd element of the first bit vector, could we do something like a[2][19] (2*32 + 19 = 83) for bit vectors?
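To make the nested-index idea concrete, here is a rough base-R sketch of what I have in mind. All of the helper names (`locate`, `new_chunked`, `get_elem`, `set_elem`) are hypothetical and not part of the bit package. The chunks here are plain logical vectors and the chunk size is tiny so the example runs instantly. I am assuming each chunk could be replaced by a bit vector in a real version.

```r
# Hypothetical helpers: store one long logical vector as a list of fixed-size
# chunks and address it with a double-valued global index.
chunk_size <- 8  # tiny for demonstration only

# Split a 1-based global index into (chunk number, offset within chunk).
# The arithmetic is done in doubles, so it does not overflow at 2^31 - 1.
locate <- function(i, size = chunk_size) {
  i <- as.numeric(i)
  list(chunk = (i - 1) %/% size + 1, offset = (i - 1) %% size + 1)
}

# Create a chunked vector of n elements, all FALSE.
new_chunked <- function(n, size = chunk_size) {
  n_chunks <- ceiling(n / size)
  lengths <- rep(size, n_chunks)
  lengths[n_chunks] <- n - size * (n_chunks - 1)  # last chunk may be shorter
  lapply(lengths, logical)
}

# Read the element at global index i.
get_elem <- function(x, i, size = chunk_size) {
  pos <- locate(i, size)
  x[[pos$chunk]][pos$offset]
}

# Return a copy of x with the element at global index i set to value.
set_elem <- function(x, i, value, size = chunk_size) {
  pos <- locate(i, size)
  x[[pos$chunk]][pos$offset] <- value
  x
}

v <- new_chunked(20)
v <- set_elem(v, 19, TRUE)
print(get_elem(v, 19))  # TRUE
print(get_elem(v, 18))  # FALSE
```

With a chunk size of 32, `locate(83, 32)` gives chunk 3 and offset 19, which is the 1-based version of the a[2][19] example. What I don't know is whether something like this can be made to work efficiently with bit vectors.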