R data.frame: rowSums of selected columns by group

2019-07-09 02:09发布

问题:

I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) columns of characters. I want to obtain a new data frame that keeps the position of the irrelevant columns, and adds the numeric columns to eachother by a certain grouping vector (or applies some other row-wise function to the data frame, by group). Example:

sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D"))
> sample
  cha1 num1 num2 num3 num4 cha2
1    A    1    3   11   13    C
2    B    2    4   12   14    D

with the goal to obtain

> goal
  cha1 X1 X2 cha2 
1    A  4 24    C
2    B  6 26    D

i.e. I've summed the 4 numeric columns according to the grouping vector gl(2,2,4) = (1,1,2,2) [levels: 1,2]

For a purely numeric data frame I've found the following method:

sample_num = sample[,2:5] #select numeric columns
data.frame(t(apply(sample_num,1,function(row) tapply(row, INDEX=gl(2,2,4),sum))))

I could combine this with re-inserting the character columns to give the intended result, but I'm really looking for a more elegant way. I'm particularly interested in a plyr method if there is one, as I'm trying to migrate to plyr for all my data frame manipulations. I imagine the first step would be to cast the data frame into long format, but I have no idea how to proceed from there.

One 'absolute' requirement is that I cannot do without the gl(n,k,l) method of grouping, as I need this to be applicable to a wide range of data frames and grouping factors.

EDIT: for simplicity assume that I know which columns are the relevant numeric columns. I'm not concerned with how to select them, I'm concerned with how to do my grouped sum without messing up the original data frame structure.

Thanks!

回答1:

Grpindex<-gl(2,2,4)    
goal<-cbind.data.frame(sample["cha1"],(t(rowsum(t(sample[,2:5]), paste0("X",Grpindex)))),sample["cha2"])

Output:

  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D