I am using sparklyr to manipulate some data. Given a,
a<-tibble(id = rep(c(1,10), each = 10),
attribute1 = rep(c("This", "That", 'These', 'Those', "The", "Other", "Test", "End", "Start", 'Beginning'), 2),
value = rep(seq(10,100, by = 10),2),
average = rep(c(50,100),each = 10),
upper_bound = rep(c(80, 130), each =10),
lower_bound = rep(c(20, 70), each =10))
I would like use "gather" to manipulate the data, like this:
b<- a %>%
gather(key = type_data, value = value_data, -c(id:attribute1))
However, "gather" is not available on sparklyr. I have seen some people using sdf_pivot to mimic "gather" (eg How to use sdf_pivot() in sparklyr and concatenate strings?) but I can’t see how to use it in this case.
Does anyone have an idea?
Cheers!
You can design an equivalent using
map
/explode
:or Hive
stack
function:Both should give the same result:
and can be modified to support non-standard evaluation.
Please note that both methods require homogeneous column types.
Notes
explode
version generates following query:and optimized logical execution plan
while
stack
version generatesand
Single quoted values (i.e.
'value'
), in the generated SQL are literal strings, while backquoted values represent column reference.Here's a function to mimic
gather
in sparklyr. This would gather the given columns while keeping everything else intact, but it can easily be extended if required.Nope, no
pivot
answer here.I am also waiting for a better one.
Result