I want to create a CSV file. When I run the following sparklyr (Spark R) code, it gives an error.
library(sparklyr)

conf <- spark_config()  # default configuration
sc <- spark_connect(master = "local", config = conf, version = "2.2.0")

sample_tbl <- spark_read_json(sc, name = "example", path = "example.json",
                              memory = FALSE, overwrite = TRUE)

sdf_schema_viewer(sample_tbl)  # view the schema

df <- spark_dataframe(sample_tbl)

spark_write_csv(df, path = "data.csv", header = TRUE, delimiter = ",",
                charset = "UTF-8", null_value = NULL,
                options = list(), mode = NULL, partition_by = NULL)
The last line gives the following error:
Error in spark_expect_jobj_class(x, "org.apache.spark.sql.DataFrame") :
This operation is only supported on org.apache.spark.sql.DataFrame jobjs but found org.apache.spark.sql.Dataset instead.
Question
How can I resolve this error in R?
spark_dataframe returns a reference to the underlying JVM object (a jobj). In other words, it is used to expose the internal JVM representation so that you can interact with the Scala / Java API. Under Spark 2.x that object is a Dataset, not a DataFrame, which is exactly what the error message reports. It has no use here.
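For illustration only, a jobj is meant to be used with sparklyr's low-level invoke() interface; a minimal sketch, reusing sample_tbl from the question:

jdf <- spark_dataframe(sample_tbl)
invoke(jdf, "count")  # calls Dataset.count() directly on the JVM object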
When working with sdf_* or spark_* methods, you should pass tbl_spark objects. As long as sample_tbl contains only atomic types, all you need is to write the table directly, as shown below.
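A minimal sketch of that direct call, reusing the data.csv path from the question:

spark_write_csv(sample_tbl, path = "data.csv")  # pass the tbl_spark, not a jobj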
Otherwise you first have to restructure the data (by expanding or exploding complex fields) or convert nested structs to serialized objects, for example with to_json.
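A sketch of the to_json route, assuming a hypothetical nested struct column named payload (dplyr verbs on a tbl_spark are translated to Spark SQL, so to_json resolves to Spark's built-in function):

library(dplyr)

flat_tbl <- sample_tbl %>%
  mutate(payload = to_json(payload))  # payload is a hypothetical struct column

spark_write_csv(flat_tbl, path = "data_flat.csv")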