My file looks like this:
Pcol Mcol
P1 M1,M2,M5,M6,M1,M2,M1,M5
P2 M1,M2,M3,M5,M1,M2,M1,M3
P3 M4,M5,M7,M6,M5,M7,M4,M7
I want to find all pairwise combinations of the Mcol elements and count how many rows each combination is present in.
Expected output:
Mcol freq
M1,M2 2
M1,M5 2
M1,M6 1
M2,M5 2
M2,M6 1
M5,M6 2
M1,M3 1
M2,M3 1
M4,M5 1
M4,M7 1
M4,M6 1
M7,M6 1
I have tried this:
x <- read.csv("file.csv", header = TRUE, stringsAsFactors = FALSE)
xx <- do.call(rbind.data.frame,
              lapply(x$Mcol, function(i) {
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))
data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))
It does not give the expected output.
I also tried this:
library(tidyverse)
df1 %>%
  separate_rows(Mcol) %>%
  group_by(Pcol) %>%
  summarise(Mcol = list(combn(Mcol, 2, FUN = toString, simplify = FALSE))) %>%
  unnest %>%
  unnest %>%
  count(Mcol)
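But this does not give the number of rows in which each combination is present. I want the frequency to be the count of rows that contain the combination, not the count of occurrences. For example, if M1 and M2 are both present in P1 and in P2, the frequency for M1,M2 should be 2.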
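To make the row-level counting concrete, here is a minimal base R sketch of what I am after. It is not a tested solution. It assumes the data is already in a data frame called df1 with columns Pcol and Mcol, that the "M1.M5" in row P1 is a typo for "M1,M5", and that pairs are written in sorted order (so "M7,M6" would appear as "M6,M7"). The names pairs_per_row and freq are just illustrative. The idea is to drop duplicates within each row before building the pairs, so each pair is counted at most once per row.

```r
# Sample data copied from the question
# (assumption: "M1.M5" in row P1 is a typo for "M1,M5").
df1 <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

# For each row: split on commas, drop duplicates, sort,
# then build every 2-element combination as a "a,b" string.
pairs_per_row <- lapply(strsplit(df1$Mcol, ",", fixed = TRUE), function(items) {
  items <- sort(unique(trimws(items)))
  if (length(items) < 2) return(character(0))  # combn() errors when n < m
  combn(items, 2, FUN = paste, collapse = ",")
})

# Each pair now occurs at most once per row, so table() counts rows.
freq <- as.data.frame(table(Mcol = unlist(pairs_per_row)),
                      responseName = "freq",
                      stringsAsFactors = FALSE)
print(freq)
```

With the sample rows above, M1,M2 comes out as 2 (P1 and P2) and M1,M6 as 1 (P1 only), which is the behaviour I want. The only change from my first attempt is the unique() call before combn().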