Combinations by group in R

Posted 2019-01-19 19:50

Question:

I have a question about combinations by group.

My mini-sample looks like this:

sample <- data.frame(
  group=c("a","a","a","a","b","b","b"),
  number=c(1,2,3,2,4,5,3)
)

If I apply combn to the 'number' column of the data frame, it gives me the following result: all combinations of the values in that column, regardless of which group each value belongs to:

         [,1] [,2]
   [1,]    1    2
   [2,]    1    3
   [3,]    1    2
   [4,]    1    4
   [5,]    1    5
   [6,]    1    3
   [7,]    2    3
   [8,]    2    2
   [9,]    2    4
  [10,]    2    5
  [11,]    2    3
  [12,]    3    2
  [13,]    3    4
  [14,]    3    5
  [15,]    3    3
  [16,]    2    4
  [17,]    2    5
  [18,]    2    3
  [19,]    4    5
  [20,]    4    3
  [21,]    5    3

The code that I used for the results above is as follows:

t(combn((sample$number), 2))

However, I would like to get the combinations within each group (i.e., "a" and "b"). The result I want should look like this:

     [,1] [,2] [,3]
[1,]   a    1    2
[2,]   a    1    3
[3,]   a    1    2
[4,]   a    2    3
[5,]   a    2    2
[6,]   a    3    2
[7,]   b    4    5
[8,]   b    4    3
[9,]   b    5    3

In addition to the combinations, I would like to get the column indicating the group.

Answer 1:

We can compute the combinations per group with data.table by grouping on 'group':

library(data.table)
# combn() returns a 2-row matrix per group; each row becomes one output column
setDT(sample)[, {i1 <- combn(number, 2)
                 list(i1[1,], i1[2,]) }, by = group]
#    group V1 V2
#1:     a  1  2
#2:     a  1  3
#3:     a  1  2
#4:     a  2  3
#5:     a  2  2
#6:     a  3  2
#7:     b  4  5
#8:     b  4  3
#9:     b  5  3

Or a compact option would be

setDT(sample)[, transpose(combn(number, 2, FUN = list)), by = group]

Or using base R:

lst <- by(sample$number, sample$group, FUN = combn, m = 2)
data.frame(group = rep(unique(as.character(sample$group)),
                       sapply(lst, ncol)), t(do.call(cbind, lst)))
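
One thing to watch in the base R version above: the group labels come from unique(sample$group), so they line up with the combinations only when the groups first appear in the same order as the grouped list returns them. The following is a minimal sketch, assuming the sample data from the question, that takes the labels from the list names instead. It swaps by() for split() plus lapply(), which is a substitution on my part rather than the answer's own code.

```r
sample <- data.frame(
  group  = c("a", "a", "a", "a", "b", "b", "b"),
  number = c(1, 2, 3, 2, 4, 5, 3)
)

# One 2 x choose(n, 2) matrix of pairs per group, named by group
lst <- lapply(split(sample$number, sample$group), combn, m = 2)

# Repeat each list name once per pair, so the labels always match their pairs
res <- data.frame(
  group = rep(names(lst), vapply(lst, ncol, integer(1))),
  t(do.call(cbind, lst))
)
res
```

The pair columns come out as X1 and X2, because data.frame() auto-names the columns of an unnamed matrix.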


Answer 2:

Here's a base R option using (1) split to create a list of data.frames per unique group-entry, (2) lapply to loop over each list element and compute the combinations using combn, (3) do.call(rbind, ...) to collect the list elements back into a single data.frame.

do.call(rbind, lapply(split(sample, sample$group), function(x) {
  data.frame(group = x$group[1], t(combn(x$number, 2)))
}))

#    group X1 X2
#a.1     a  1  2
#a.2     a  1  3
#a.3     a  1  2
#a.4     a  2  3
#a.5     a  2  2
#a.6     a  3  2
#b.1     b  4  5
#b.2     b  4  3
#b.3     b  5  3
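
A caveat the question's data does not trigger: when a group holds a single positive number n, combn(n, 2) treats it as seq_len(n) and returns pairs from 1:n instead of failing. The sketch below wraps the split/lapply/rbind approach in a helper that skips groups with fewer than two rows. The helper name pairs_by_group, its arguments, and the extra group "c" in sample2 are hypothetical and added only for illustration.

```r
# Hypothetical helper (not from the original answers): all within-group pairs,
# skipping groups that have fewer than two rows.
pairs_by_group <- function(df, group_col = "group", value_col = "number") {
  pieces <- lapply(split(df, df[[group_col]]), function(x) {
    v <- x[[value_col]]
    # Guard: combn(5, 2) would silently mean combn(1:5, 2)
    if (length(v) < 2) return(NULL)
    data.frame(group = x[[group_col]][1], t(combn(v, 2)))
  })
  # Drop the skipped groups before binding the rest together
  out <- do.call(rbind, Filter(Negate(is.null), pieces))
  rownames(out) <- NULL
  out
}

# Same data as the question, plus an assumed single-row group "c"
sample2 <- data.frame(
  group  = c("a", "a", "a", "a", "b", "b", "b", "c"),
  number = c(1, 2, 3, 2, 4, 5, 3, 5)
)
pairs_by_group(sample2)
```

Whether single-row groups should be dropped or reported is a design choice. Dropping them here keeps the output shape the same as in the examples above.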

And a dplyr option (note that do() has been superseded since dplyr 1.0.0, although it still works):

library(dplyr)
sample %>% group_by(group) %>% do(data.frame(t(combn(.$number, 2))))
#Source: local data frame [9 x 3]
#Groups: group [2]
#
#   group    X1    X2
#  (fctr) (dbl) (dbl)
#1      a     1     2
#2      a     1     3
#3      a     1     2
#4      a     2     3
#5      a     2     2
#6      a     3     2
#7      b     4     5
#8      b     4     3
#9      b     5     3