I am using sparklyr for a project and have understood that persisting is very useful. I am using `sdf_persist` for this, with the following syntax (correct me if I am wrong):
```r
data_frame <- sdf_persist(data_frame)
```
Now I am reaching a point where I have too many RDDs stored in memory, so I need to unpersist some. However, I cannot seem to find the function to do this in sparklyr. Note that I have tried:
```r
dplyr::db_drop_table(sc, "data_frame")
dplyr::db_drop_table(sc, data_frame)
unpersist(data_frame)
sdf_unpersist(data_frame)
```
But none of those work.
Also, I am trying to avoid using `tbl_cache` (in which case it seems that `db_drop_table` works), as `sdf_persist` seems to offer more control over the storage level. It might be that I am missing the big picture of how to use persistence here, in which case I'll be happy to learn more.
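(For context, this is what I mean by control over the storage level; a minimal sketch, assuming an open connection `sc` and an existing tbl `data_frame`:)

```r
library(sparklyr)

# sdf_persist lets you pick the storage level explicitly,
# e.g. memory-only instead of the default MEMORY_AND_DISK
data_frame <- sdf_persist(data_frame, storage.level = "MEMORY_ONLY")
```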
If you don't care about granularity, then the simplest solution is to invoke `Catalog.clearCache`:
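Through sparklyr's invoke API this might look like the following (a sketch; assumes an open connection `sc`):

```r
library(sparklyr)
library(dplyr)  # for %>%

sc <- spark_connect(master = "local")

# Clears every cached table/DataFrame in the current Spark session
spark_session(sc) %>%
  invoke("catalog") %>%
  invoke("clearCache")
```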
Uncaching a specific object is much less straightforward due to sparklyr's indirection. If you check the object returned by `sdf_cache`,
you'll see that the persisted table is not exposed directly. That's because you don't get the registered table directly, but rather the result of a subquery like `SELECT * FROM ...`.
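For example, creating such a persisted object and printing it (a sketch; assumes a connection `sc` and copies the built-in `iris` data):

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

df <- copy_to(sc, iris, overwrite = TRUE) %>% sdf_persist()

# Printing shows a lazy tbl built on top of a subquery,
# not the registered table that was actually cached
print(df)
```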
This means you cannot simply call `unpersist` as you would in one of the official APIs.
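A direct call through the invoke API, for instance, acts on the DataFrame behind the subquery rather than on the cached source table (a sketch, assuming a persisted tbl `df` as above):

```r
# This unpersists the wrapping DataFrame, not the cached table
# that was registered for `df`
spark_dataframe(df) %>% invoke("unpersist")
```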
Instead, you can try to retrieve the name of the source table, for example like this:
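One possibility is to reach into the lazy query's structure (a sketch; assumes a persisted tbl `df` created with `copy_to`, and note that `df$ops$x` is an undocumented internal that may change between sparklyr/dplyr versions):

```r
# For a tbl created by copy_to(), the source table name is stored
# in the lazy-query object
src_name <- as.character(df$ops$x)
```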
and then invoke `Catalog.uncacheTable`:
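Again via the invoke API (assumes a connection `sc` and a table name `src_name` retrieved as in the previous step):

```r
spark_session(sc) %>%
  invoke("catalog") %>%
  invoke("uncacheTable", src_name)
```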
That is likely not the most robust solution, so please use it with caution.