I expect to generate a lot of data and then load it into R. How can I estimate the size of the data.frame (and thus the memory needed) from the number of rows, the number of columns and the variable types?
Example:
If I have 10000 rows and 150 columns, of which 120 are numeric, 20 are strings and 10 are factors, what size of data frame can I expect? Will the result change depending on the data stored in the columns (as in max(nchar(column)))?
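A back-of-envelope estimate for this example can be sketched as below. The function name estimate_df_bytes and its arguments are hypothetical, and the per-cell sizes are assumptions: 8 bytes per numeric cell, 4 bytes per factor code, and one pointer per string cell (typically 8 bytes on 64-bit R, 4 on 32-bit). It ignores per-column overhead, the storage of the strings themselves and the factor levels, so it should be read as a rough lower bound.

```r
# Rough lower-bound estimate of a data.frame's size in bytes.
# Assumed per-cell costs: numeric = 8 bytes, factor code = 4 bytes,
# character = one pointer (taken from .Machine$sizeof.pointer by default).
# Not counted: vector headers, the string contents, factor levels.
estimate_df_bytes <- function(n_rows, n_numeric = 0, n_character = 0,
                              n_factor = 0,
                              pointer_bytes = .Machine$sizeof.pointer) {
  n_rows * (8 * n_numeric + pointer_bytes * n_character + 4 * n_factor)
}

# The example from the question, assuming 8-byte pointers (64-bit R)
est <- estimate_df_bytes(1e4, n_numeric = 120, n_character = 20,
                         n_factor = 10, pointer_bytes = 8)
est / 1e6  # size in megabytes (decimal)
```

Under these assumptions the example comes to 10000 * (8*120 + 8*20 + 4*10) = 11,600,000 bytes, about 11.6 MB. The string columns are the part most likely to deviate: as far as I understand, R stores each distinct string once, so their real cost depends on how many distinct values there are and how long they are.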
> m <- matrix(1,nrow=1e5,ncol=150)
> m <- as.data.frame(m)
> object.size(m)
120009920 bytes
> a=object.size(m)/(nrow(m)*ncol(m))
> a
8.00066133333333 bytes
> m[,1:150] <- sapply(m[,1:150],as.character)
> b=object.size(m)/(nrow(m)*ncol(m))
> b
4.00098133333333 bytes
> m[,1:150] <- sapply(m[,1:150],as.factor)
> c=object.size(m)/(nrow(m)*ncol(m))
> c
4.00098133333333 bytes
> m <- matrix("ajayajay",nrow=1e5,ncol=150)
> m <- as.data.frame(m)
> object.size(m)
60047120 bytes
> d=object.size(m)/(nrow(m)*ncol(m))
> d
4.00314133333333 bytes
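One caveat about the transcript above: sapply() simplifies its result to a matrix, and a matrix cannot hold factors, so the as.factor step appears to leave the columns as character. That would explain why b and c are identical. A minimal sketch that measures each type separately is below. It uses lapply() to keep the factor class, and the helper bytes_per_cell is a hypothetical name. The exact numbers depend on the R build, so no output is shown.

```r
# Measure bytes per cell for one column of each type.
n <- 1e5
num <- data.frame(x = rep(1, n))
chr <- data.frame(x = rep("ajayajay", n), stringsAsFactors = FALSE)
fct <- data.frame(x = factor(rep("ajayajay", n)))

# Hypothetical helper: total object size divided by the number of cells
bytes_per_cell <- function(df) {
  as.numeric(utils::object.size(df)) / (nrow(df) * ncol(df))
}

sapply(list(numeric = num, character = chr, factor = fct), bytes_per_cell)

# Converting every column to factor while keeping the data.frame structure:
# lapply() returns a list of factors, and m[] <- ... assigns it column by column
m <- as.data.frame(matrix(1, nrow = 10, ncol = 3))
m[] <- lapply(m, as.factor)
sapply(m, class)
```

The 4 bytes per character cell in the transcript would be consistent with 4-byte pointers (a 32-bit build); on a 64-bit build I would expect roughly 8 bytes per character cell, with factors staying near 4.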