R ddply with multiple variables

Posted 2019-01-25 22:51

Question:

Here is a simplified data frame that stands in for my real data set:

df <- data.frame(ID=rep(101:102,each=9),phase=rep(1:3,6),variable=rep(LETTERS[1:3],each=3,times=2),mm1=c(1:18),mm2=c(19:36),mm3=c(37:54))

I would like to group by ID and variable and then, within each group, subtract the phase 3 values of mm1, mm2 and mm3 from the values in every phase (phase 1 to phase 3). That would make mm1 to mm3 all -2 in phase 1, all -1 in phase 2, and all 0 in phase 3.

R throws the error "Error in Ops.data.frame(x, x[3, ]) : - only defined for equally-sized data frames" when I try:

df1 <- ddply(df, .(ID, variable), function(x) (x - x[3,]))   

Any advice would be greatly appreciated. The output should look like this:

ID phase variable mm1 mm2 mm3
101  1      A     -2  -2  -2
101  2      A     -1  -1  -1
101  3      A      0   0   0
101  1      B     -2  -2  -2
101  2      B     -1  -1  -1
101  3      B      0   0   0
101  1      C     -2  -2  -2
101  2      C     -1  -1  -1
101  3      C      0   0   0
102  1      A     -2  -2  -2
102  2      A     -1  -1  -1
102  3      A      0   0   0
102  1      B     -2  -2  -2
102  2      B     -1  -1  -1
102  3      B      0   0   0
102  1      C     -2  -2  -2
102  2      C     -1  -1  -1
102  3      C      0   0   0

Answer 1:

Okay, took me a little bit to figure out what you want, but here is a solution:

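library(plyr)

cols.to.sub <- paste0("mm", 1:3)  # the value columns to adjust
df1 <- ddply(
  df, .(ID, variable),
  function(x) {
    # subtract this group's phase 3 row from every row; the inner t() lines
    # the recycled vector up with mm1..mm3, the outer t() restores the shape
    x[cols.to.sub] <- t(t(as.matrix(x[cols.to.sub])) - unlist(x[x$phase == 3, cols.to.sub]))
    x
  }
)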

This produces (first 6 rows):

    ID phase variable mm1 mm2 mm3
1  101     1        A  -2  -2  -2
2  101     2        A  -1  -1  -1
3  101     3        A   0   0   0
4  101     1        B  -2  -2  -2
5  101     2        B  -1  -1  -1
6  101     3        B   0   0   0

Generally speaking, the best way to debug this type of issue is to put a browser() statement inside the function you are passing to ddply, so you can examine the objects at your leisure. Doing so would have revealed that:

  1. The data frame passed to your function includes the grouping columns (ID and variable) as well as the phase column, so your mm columns are not the first three (hence the need to define cols.to.sub).
  2. Even if you address that, you can't do arithmetic on data frames that have unequal dimensions (x has three rows, x[3, ] has one), so what I do here is convert to a matrix and then take advantage of vector recycling to subtract the phase 3 row from every row of the matrix. I need t (transpose) because vector recycling is column-wise: after transposing, each column holds mm1, mm2, mm3 for one phase, which lines up with the recycled vector, and the outer t restores the original shape.
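
If you would rather not step through browser() interactively, the two points above can be checked by hand on a single group. This is only a sketch in base R: it rebuilds the data frame from the question, pulls out one group (roughly what the function passed to ddply receives as x), and compares naive recycling with the transposed version. The names m, ref, wrong, right and alt are made up for illustration, and the sweep() line is an alternative way of writing the subtraction, not what the code above uses.

```r
# Rebuild the data frame from the question (base R only, plyr not needed here).
df <- data.frame(
  ID = rep(101:102, each = 9),
  phase = rep(1:3, 6),
  variable = rep(LETTERS[1:3], each = 3, times = 2),
  mm1 = 1:18, mm2 = 19:36, mm3 = 37:54
)
cols.to.sub <- paste0("mm", 1:3)

# One group, roughly what the function passed to ddply sees as `x`.
x <- df[df$ID == 101 & df$variable == "A", ]
names(x)     # ID, phase and variable come before the mm columns
dim(x)       # three rows
dim(x[3, ])  # one row, which is why `x - x[3, ]` fails

# Hypothetical helper objects for the comparison.
m   <- as.matrix(x[cols.to.sub])             # rows = phases, columns = mm1..mm3
ref <- unlist(x[x$phase == 3, cols.to.sub])  # phase 3 values of mm1, mm2, mm3

wrong <- m - ref         # ref is recycled DOWN each column, mixing the baselines
right <- t(t(m) - ref)   # transpose first so ref lines up with mm1..mm3
alt   <- sweep(m, 2, ref)  # same subtraction, column by column, via sweep()

right
```

Comparing wrong with right makes the need for the double t() visible, and alt should match right if you find sweep() easier to read.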


Tags: r plyr