How to delete a Spark DataFrame using sparklyr?

2019-09-02 06:34发布

问题:

I have created a Spark dataframe called "iris" using the below

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris)

now I want to delete the Spark dataframe "iris" (not the dataframe in R) how do I do that?

回答1:

This strictly depends on what you when say delete dataframe. You have to remember that in general, Spark data frames are not the same type of objects as you plain local data structures. Spark DataFrame is rather a description than a data container.

sparklyr itself, depends primarily on Spark SQL interface. When you call copy_to (or any other data import method, it:

  • Registers a temporary table.
  • With default parameters (which really should be avoided) it eagerly caches the tables.

This means that the natural way to delete dataframe is to drop the temporary view (referencing it by its name either with dplyr / dbplyr:

db_drop_table(sc, "iris")

or Spark's own methods:

sc %>% spark_session() %>% invoke("catalog") %>% invoke("dropTempView", "iris")

Please note that it will invalidate local bindings, so any attempt to access iris_tbl after calling any of the methods shown above will fail:

iris_tbl 
Error: org.apache.spark.sql.AnalysisException: Table or view not found: iris; line 2 pos 5
...
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'iris' not found in database 'default'
...