I have created a Spark dataframe called "iris" using the code below:
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris)
Now I want to delete the Spark dataframe "iris" (not the dataframe in R). How do I do that?
This strictly depends on what you mean when you say "delete the dataframe". You have to remember that, in general, Spark data frames are not the same kind of objects as your plain local data structures. A Spark DataFrame is a description of a computation rather than a data container.
sparklyr itself depends primarily on the Spark SQL interface. When you call copy_to (or any other data import method), it:
- Registers a temporary view.
- With the default parameters (which really should be avoided), eagerly caches the table.
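If the eager caching is the part you want to avoid or undo, a minimal sketch looks like this. It assumes a local connection as in the question; memory and overwrite are arguments of sparklyr's copy_to, and tbl_uncache is sparklyr's helper for releasing a cached table, but check the behavior against your sparklyr version:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Import without eager caching: memory = FALSE skips the CACHE TABLE step,
# so only the temporary view is registered
iris_tbl <- copy_to(sc, iris, name = "iris", memory = FALSE, overwrite = TRUE)

# If a table was already cached, release the cached blocks explicitly
tbl_uncache(sc, "iris")
```

Uncaching frees the executor memory but keeps the view defined; dropping the view (below) removes the name itself from the catalog.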
This means that the natural way to delete the dataframe is to drop the temporary view, referencing it by its name, either with dplyr / dbplyr:
db_drop_table(sc, "iris")
or Spark's own methods:
sc %>% spark_session() %>% invoke("catalog") %>% invoke("dropTempView", "iris")
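Either way, you can confirm the view is actually gone by listing what the connection still exposes; src_tbls is a standard dplyr generic that works on a sparklyr connection:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, name = "iris")

src_tbls(sc)  # "iris" appears in the list

# Drop the temporary view through the Spark catalog
sc %>% spark_session() %>% invoke("catalog") %>% invoke("dropTempView", "iris")

src_tbls(sc)  # "iris" is no longer listed
```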
Please note that this will invalidate local bindings, so any attempt to access iris_tbl after calling either of the methods shown above will fail:
iris_tbl
Error: org.apache.spark.sql.AnalysisException: Table or view not found: iris; line 2 pos 5
...
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'iris' not found in database 'default'
...