I am expecting to generate a lot of data and then catch it in R. How can I estimate the size of the data.frame (and thus the memory needed) from the number of rows, the number of columns and the variable types?
Example.
If I have 10000 rows and 150 columns, of which 120 are numeric, 20 are strings and 10 are factors, what size of data frame can I expect? Will the result change depending on the data stored in the columns (e.g., on max(nchar(column)))?
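As a rough back-of-envelope sketch (the per-cell costs here are assumptions for a 64-bit build: 8 bytes per numeric cell, 8 bytes per character pointer, 4 bytes per factor integer code; string storage and per-object overhead are ignored):

rows <- 10000
# 120 numeric cols * 8 bytes + 20 character cols * 8 bytes + 10 factor cols * 4 bytes
bytes <- rows * (120 * 8 + 20 * 8 + 10 * 4)
bytes / 2^20   # about 11 MB, before string data and overhead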
> # 100,000 rows x 150 columns of doubles, all set to 1
> m <- matrix(1, nrow = 1e5, ncol = 150)
> m <- as.data.frame(m)
> object.size(m)
120009920 bytes
> # ~8 bytes per cell, the size of a double
> a <- object.size(m) / (nrow(m) * ncol(m))
> a
8.00066133333333 bytes
> # convert every column to character (lapply keeps a list of columns;
> # sapply would simplify the result back to a matrix)
> m[] <- lapply(m, as.character)
> b <- object.size(m) / (nrow(m) * ncol(m))
> b
4.00098133333333 bytes
> # convert every column to factor
> m[] <- lapply(m, as.factor)
> c <- object.size(m) / (nrow(m) * ncol(m))
> c
4.00098133333333 bytes
> m <- matrix("ajayajay",nrow=1e5,ncol=150)
>
> m <- as.data.frame(m)
> object.size(m)
60047120 bytes
> d=object.size(m)/(nrow(m)*ncol(m))
> d
4.00314133333333 bytes
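To see where the memory goes, you can also size each column individually; object.size() works on any R object:

> sapply(m, object.size)   # per-column sizes in bytes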
Check out the pryr package as well. It has object_size(), which may be slightly better for you. As Advanced R notes, you also need to account for the size of attributes, as well as the column types, etc.
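A minimal sketch, assuming pryr is installed:

library(pryr)
x <- data.frame(a = rnorm(1e4), b = sample(letters, 1e4, replace = TRUE))
object.size(x)   # base R's estimate
object_size(x)   # pryr's estimate, which may account for shared components better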
You could create dummy variables that store examples of the data you will be storing in the data frame.
Then use object.size() to find their size and multiply it by the number of rows and columns accordingly.

You can simulate an object and compute an estimate of the memory used to store it as an R object using object.size:
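For instance, a minimal sketch along those lines (the prototype row count, the sample values, and the V1..V150 column names are illustrative assumptions matching the question's 120/20/10 column mix):

# build a small prototype with the intended column types, then scale
# its measured size up to the target number of rows (an approximation)
proto_rows  <- 1000
target_rows <- 10000
num_cols <- replicate(120, rnorm(proto_rows), simplify = FALSE)
chr_cols <- replicate(20, sample(c("abc", "defgh"), proto_rows, replace = TRUE),
                      simplify = FALSE)
fac_cols <- replicate(10, factor(sample(letters[1:5], proto_rows, replace = TRUE)),
                      simplify = FALSE)
cols <- c(num_cols, chr_cols, fac_cols)
names(cols) <- paste0("V", seq_along(cols))
proto <- as.data.frame(cols, stringsAsFactors = FALSE)
# linear scaling is reasonable for numeric and factor columns; character
# columns also depend on how many distinct strings there are, since each
# distinct string is stored only once
as.numeric(object.size(proto)) * target_rows / proto_rows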