How to count unique combinations from a data table

2020-04-21 08:56发布

问题:

I have a data table with three columns. The first two are a collection of the data points (categorical data that can be either A, B, or C). The third column is a concatenation of the two data points as they belong to a set. I need to get the counts of the possible combinations and need to count "A & C" as part of the same group as "C & A" since the order of these pairs doesn't matter.

Previously, I set up a column that concatenated "CAT1 & CAT2" and then another as "CAT2 & CAT1" hoping to sum up there, but those numbers of course don't add up properly then.

library(data.table)
dt1 = data.table(CAT1 = c('a','b','c','b','a','c','c','b','a','c'),
                 CAT2 = c('a','b','c','a','b','c','a','b','c','a'))

dt1[,merged := paste(dt1$CAT1, dt1$CAT2, sep = ' & ')]

counts = data.table(table(dt1$merged))

The output table "counts" gives me all the uniques of the merged column, but I need to sum up anywhere the data points are flipped ("A & C" + "C & A"). I recognize that this could be done manually, but I have far too many actual data points to do by hand.

回答1:

dt1[,paste(sort(c(CAT1,CAT2)),collapse=" & "),by=1:nrow(dt1)][,table(V1)]


回答2:

You can also do something like this- Note- As mentioned by @chinsoon12, we can use pmin & pmax

 > setDT(dt1)[,list(Count=.N) ,paste(pmin(CAT1, CAT2), pmax(CAT1, CAT2), sep=' & ')]
   paste Count
1: a & a     1
2: b & b     2
3: c & c     2
4: a & b     2
5: a & c     3


回答3:

I'm no good with data.table, so here's my answer with a data.frame:

Just sort the two CATs before pasting, making sure they're always in the same order.

 dt1$merged<-apply(dt1,1,function(x) paste(sort(x),collapse=" & "))

I'm sure there's a faster way to do in with data.table, but I'm not sure how. A naive sort added to your code came up with an error...