可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

This question already has an answer here:

pair-wise duplicate removal from dataframe [duplicate] 4 answers

The data I have is something like that:

RES1 <- c("A","B","A","A","B")
RES2 <- c("B","A","A","B","A")
VAL1 <-c(3,5,3,6,8)
VAL2 <- c(5,3,7,2,7)
dff <- data.frame(RES1,VAL1,RES2,VAL2)
dff
  RES1 VAL1 RES2 VAL2
  1    A    3    B    5 
  2    B    5    A    3
  3    A    3    A    7
  4    A    6    B    2
  5    B    8    A    7

I want to remove the lines where I already have the same res1-res2 pair. For example: A 3 interacts with B 5. That's the information I want. I do not care which pair is first. B 5 with A 3 or A 3 with B 5. What I want to get is the following dataframe:

output
  RES1 VAL1 RES2 VAL2
   1    A    3    B    5
   2    A    3    A    7
   3    A    6    B    2
   4    B    8    A    7

Then I want to do the same for another data frame such as :

RES3 <- c("B","B","B","A","B")
RES4 <- c("A","A","A","A","B")
VAL4 <- c(3,7,5,3,8)
VAL3 <- c(5,8,3,7,3)
df2 <- data.frame(RES3,VAL3,RES4,VAL4)

df2
  RES3 VAL3 RES4 VAL4
   1     B     5     A     3
   2     B     8     A     7
   3     B     3     A     5
   4     A     7     A     3
   5     B     3     B     8

At the end, I just want to keep mutual pairs (in my definition both pairs are the same, keeping one is essential : "A 5" - "B 3" is the same as "B 3" - "A 5". In other words, order does not matter.

Final output I desire should have the following pairs which are unique and which exist in BOTH dataframes:

mutualpairs
  RESA VALA RESB VALB
  A     3     B     5
  A     3     A     7
  B     8     A     7

回答1:

You can use this code:

dff[!duplicated(t(apply(cbind(paste(dff$RES1,dff$VAL1),paste(dff$RES2,dff$VAL2)),1,sort))),]

Equivalent unrolled code:

v1 <- paste(dff$RES1,dff$VAL1)
v2 <- paste(dff$RES2,dff$VAL2)
mx <- cbind(v1,v2)
mxSorted <- t(apply(mx,1,sort))
duped <- duplicated(mxSorted)
dff[!duped,]

Explanation:

1) we create two character vectors v1, v2 by concatenating columns RES1-VAL1 and RES2-VAL2 (note that paste uses a space as default separator, maybe you could use another character or string to be safer (e.g. |,@,; etc...)
Result:

> v1
[1] "A 3" "B 5" "A 3" "A 6" "B 8"
> v2
[1] "B 5" "A 3" "A 7" "B 2" "A 7"

2) we bind these two vectors to form a matrix using cbind;
Result:

     [,1]  [,2] 
[1,] "A 3" "B 5"
[2,] "B 5" "A 3"
[3,] "A 3" "A 7"
[4,] "A 6" "B 2"
[5,] "B 8" "A 7"

3) we sort the values of each row of the matrix using t(apply(mx,1,sort));
by sorting the rows, we simply make identical the rows having the same values just swapped (note that final transpose is necessary since apply function always returns results on the columns).
Result:

     [,1]  [,2] 
[1,] "A 3" "B 5"
[2,] "A 3" "B 5"
[3,] "A 3" "A 7"
[4,] "A 6" "B 2"
[5,] "A 7" "B 8"

4) calling duplicated on a matrix, we get a logical vector of length = nrow(matrix), being TRUE where a row is a duplicate of a previous row, so in our case, we get:

[1] FALSE  TRUE FALSE FALSE FALSE
# i.e. the second row is a duplicate

5) finally we use this vector to filter the rows of the data.frame, getting the final result:

  RES1 VAL1 RES2 VAL2
1    A    3    B    5
3    A    3    A    7
4    A    6    B    2
5    B    8    A    7

回答2:

Possible duplicate of Remove duplicates column combinations from a dataframe in R

Adapting the answer here:

dff[!duplicated(dff[c('RES1','RES2')]),]

How to remove duplicate rows in both using a condi

问题:

回答1:

回答2:

收藏的人(0)

How to remove duplicate rows in both using a condi

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮