I want to reshape my dataframe from long to wide format and I loose some data that I'd like to keep. For the following example:
df <- data.frame(Par1 = unlist(strsplit("AABBCCC","")),
Par2 = unlist(strsplit("DDEEFFF","")),
ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh",",")),
Type = unlist(strsplit("pre,post,pre,post,pre,post,post",",")),
Val = c(10,20,30,40,50,60,70))
# Par1 Par2 ParD Type Val
# 1 A D foo pre 10
# 2 A D bar post 20
# 3 B E baz pre 30
# 4 B E qux post 40
# 5 C F bla pre 50
# 6 C F xyz post 60
# 7 C F meh post 70
dfw <- dcast(df,
formula = Par1 + Par2 ~ Type,
value.var = "Val",
fun.aggregate = mean)
# Par1 Par2 post pre
# 1 A D 20 10
# 2 B E 40 30
# 3 C F 65 50
this is almost what I need but I would like to have
- some field keeping data from
ParD
field (for example, as single merged string), - number of observations used for aggregation.
i.e. I would like the resulting data.frame to be as follows:
# Par1 Par2 post pre Num.pre Num.post ParD
# 1 A D 20 10 1 1 foo_bar
# 2 B E 40 30 1 1 baz_qux
# 3 C F 65 50 1 2 bla_xyz_meh
I would be grateful for any ideas. For example, I tried to solve the second task by writing in dcast: fun.aggregate=function(x) c(Val=mean(x),Num=length(x))
- but this causes an error.
Late to the party, but here's another alternative using
data.table
:[from Matthew] +1 Some minor improvements to save repeating the same
==
, and to demonstrate local variables insidej
.And if a
list
column is ok rather than apaste
, then this should be faster :I believe this base R solution is comparable with @Arun's data table solution. (Which isn't to say I would prefer it; that code is much simpler!)
Using @RicardoSaporta's timing code with 900, 2700, and 10,800, respectively: