I have a few huge data.tables dt_1, dt_2, ..., dt_N with the same columns. I want to bind them together into a single data.table. If I use
dt <- rbind(dt_1, dt_2, ..., dt_N)
or
dt <- rbindlist(list(dt_1, dt_2, ..., dt_N))
then the memory usage is approximately double the amount needed to hold dt_1, dt_2, ..., dt_N. Is there a way to bind them without increasing the memory consumption significantly? Note that I do not need dt_1, dt_2, ..., dt_N once they are combined.
Another approach, using a temporary file to 'bind':
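A minimal sketch of the idea, assuming three tables dt_1, dt_2, dt_3 already exist (it generalises to dt_1, ..., dt_N) and that they survive a round trip through CSV:

library(data.table)

tmp <- tempfile(fileext = ".csv")
nms <- paste0("dt_", 1:3)
for (i in seq_along(nms)) {
  fwrite(get(nms[i]), tmp, append = (i > 1))  # header is written only once
  rm(list = nms[i])                           # drop each table once written
}
gc()                # release the freed memory
dt <- fread(tmp)    # read the concatenated rows back as one data.table
file.remove(tmp)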
Obviously slower than the rbind method, but if you have memory contention, this won't be slower than forcing the system to swap out memory pages. Of course, if your original objects are loaded from files in the first place, prefer concatenating the files before loading them into R, using a tool better suited to working with files (cat, awk, etc.).
You can remove your data.tables after you've bound them; the doubled memory usage is caused by the new data.table consisting of copies of the originals.
Illustration:
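A toy example (three numeric tables of a million rows each; any tables with identical columns would do):

library(data.table)

dt_1 <- data.table(x = rnorm(1e6), y = rnorm(1e6))
dt_2 <- data.table(x = rnorm(1e6), y = rnorm(1e6))
dt_3 <- data.table(x = rnorm(1e6), y = rnorm(1e6))

dt <- rbindlist(list(dt_1, dt_2, dt_3))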
Then we can look at the memory usage per object:
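One way, using base R's object.size (mget fetches the objects by name):

sapply(mget(c("dt_1", "dt_2", "dt_3", "dt")), object.size)
# dt holds its own copy of every row, so it is roughly as large
# as the three inputs combined

rm(dt_1, dt_2, dt_3)  # the originals are no longer needed after binding
gc()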
If the memory usage is so large that the separate data.tables and the combined data.table cannot coexist, we can use a for-loop and get (shocking, but IMHO this case warrants it: there are a small number of data.tables and the loop is easily readable and understandable) to create our combined data.table and delete the individual ones at the same time:
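A sketch of that loop, continuing the three-table example above (rbind still copies dt on each iteration, but only one of the original tables is alive at any time):

dt <- dt_1
rm(dt_1)
for (nm in paste0("dt_", 2:3)) {
  dt <- rbind(dt, get(nm))  # look the next table up by name and append it
  rm(list = nm)             # delete it as soon as it has been appended
}
gc()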
I guess <<- and get can help you with this.

UPDATE: <<- is not necessary.