I have referred to all the links mentioned here:
1) Link-1 2) Link-2 3) Link-3 4) Link-4
The following R code uses the sparklyr package. It reads a large JSON file and displays its schema.
library(sparklyr)
library(sparklyr.nested) # provides sdf_schema_viewer()
library(dplyr)
sc <- spark_connect(master = "local", config = conf, version = "2.2.0") # connect to a local Spark instance
sample_tbl <- spark_read_json(sc, name = "example", path = "example.json",
                              memory = FALSE, overwrite = TRUE) # read the JSON file
sdf_schema_viewer(sample_tbl) # display the schema
df <- tbl(sc, "example") # reference to the registered table
It produced the following database schema:
Renaming a first-level column works.
For example:
df %>% rename(ent = entities)
But when I try to rename a second-level nested column, it is not renamed:
df %>% rename(e_hashtags = entities.hashtags)
It fails with this error:
Error in .f(.x[[i]], ...) : object 'entities.hashtags' not found
Question
How can I also rename nested columns at the third or fourth level?
Please refer to the database schema mentioned above.
Spark itself does not support renaming individual nested fields. You have to either cast or rebuild the whole structure. For simplicity, let's assume that the data looks as follows:
with this simple string representation:
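A minimal sketch of such a dataset and of how to print its schema string. The temporary file, the table name `df`, and the field values are assumptions made for illustration; only `entities.hashtags` comes from the question.

```r
library(sparklyr)

# Assumption: a local Spark installation is available, as in the question.
sc <- spark_connect(master = "local")

# Hypothetical one-record sample written to a temporary file.
path <- tempfile(fileext = ".json")
cat('{"contributors": "foo", "coordinates": "bar", "entities": {"hashtags": ["foo", "bar"], "media": "missing"}}',
    file = path)

df <- spark_read_json(sc, name = "df", path = path, overwrite = TRUE)

# Ask the underlying Spark DataFrame for its schema as a compact string.
df %>%
  spark_dataframe() %>%
  invoke("schema") %>%
  invoke("simpleString")
```

For this sample the string should read `struct<contributors:string,coordinates:string,entities:struct<hashtags:array<string>,media:string>>`.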
With cast, you have to define an expression using a matching type description:
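A sketch of the cast approach, assuming `sc` is an open connection and `df` has an `entities` column of type `struct<hashtags:array<string>,media:string>` (the `media` field and the names `expr_cast` and `df_cast` are illustrative):

```r
library(sparklyr)

# Assumptions: `sc` is an open Spark connection and `df` is a Spark table
# whose `entities` column is struct<hashtags:array<string>,media:string>.
# The target type lists every field in its original order and with its
# original type; only the first field name changes.
expr_cast <- invoke_static(
  sc, "org.apache.spark.sql.functions", "expr",
  "CAST(entities AS struct<e_hashtags:array<string>,media:string>)"
)

df_cast <- df %>%
  spark_dataframe() %>%                          # underlying Java DataFrame
  invoke("withColumn", "entities", expr_cast) %>% # replace the column
  sdf_register()                                  # back to a tbl_spark
```

The cast matches struct fields by position, so the type description has to repeat every field of the struct, not just the renamed one.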
To rebuild the structure, you have to match all components:
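A sketch of the rebuild approach under the same assumptions (`sc` is an open connection, and `entities` holds only `hashtags` and `media`; the names `expr_struct` and `df_struct` are illustrative):

```r
library(sparklyr)

# Assumptions: `sc` is an open Spark connection and `df` is a Spark table
# whose `entities` column is struct<hashtags:array<string>,media:string>.
# Every field that should survive must be listed explicitly.
expr_struct <- invoke_static(
  sc, "org.apache.spark.sql.functions", "expr",
  "struct(entities.hashtags AS e_hashtags, entities.media AS media)"
)

df_struct <- df %>%
  spark_dataframe() %>%
  invoke("withColumn", "entities", expr_struct) %>%
  sdf_register()
```

Fields at the third or fourth level can be handled the same way by nesting `struct(...)` calls, one per level, and aliasing each rebuilt inner struct back to its original name.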