This question already has an answer here:
The data I have is something like that:
RES1 <- c("A","B","A","A","B")
RES2 <- c("B","A","A","B","A")
VAL1 <-c(3,5,3,6,8)
VAL2 <- c(5,3,7,2,7)
dff <- data.frame(RES1,VAL1,RES2,VAL2)
dff
RES1 VAL1 RES2 VAL2
1 A 3 B 5
2 B 5 A 3
3 A 3 A 7
4 A 6 B 2
5 B 8 A 7
I want to remove the lines where I already have the same res1-res2 pair. For example: A 3 interacts with B 5. That's the information I want. I do not care which pair is first. B 5 with A 3 or A 3 with B 5. What I want to get is the following dataframe:
output
RES1 VAL1 RES2 VAL2
1 A 3 B 5
2 A 3 A 7
3 A 6 B 2
4 B 8 A 7
Then I want to do the same for another data frame such as :
RES3 <- c("B","B","B","A","B")
RES4 <- c("A","A","A","A","B")
VAL4 <- c(3,7,5,3,8)
VAL3 <- c(5,8,3,7,3)
df2 <- data.frame(RES3,VAL3,RES4,VAL4)
df2
RES3 VAL3 RES4 VAL4
1 B 5 A 3
2 B 8 A 7
3 B 3 A 5
4 A 7 A 3
5 B 3 B 8
At the end, I just want to keep mutual pairs (in my definition both pairs are the same, keeping one is essential : "A 5" - "B 3" is the same as "B 3" - "A 5". In other words, order does not matter.
Final output I desire should have the following pairs which are unique and which exist in BOTH dataframes:
mutualpairs
RESA VALA RESB VALB
A 3 B 5
A 3 A 7
B 8 A 7
You can use this code:
Equivalent unrolled code:
Explanation:
1) we create two character vectors
v1
,v2
by concatenating columns RES1-VAL1 and RES2-VAL2 (note thatpaste
uses a space as default separator, maybe you could use another character or string to be safer (e.g.|
,@
,;
etc...)Result:
2) we bind these two vectors to form a matrix using
cbind
;Result:
3) we sort the values of each row of the matrix using
t(apply(mx,1,sort))
;by sorting the rows, we simply make identical the rows having the same values just swapped (note that final transpose is necessary since
apply
function always returns results on the columns).Result:
4) calling
duplicated
on a matrix, we get a logical vector of length = nrow(matrix), being TRUE where a row is a duplicate of a previous row, so in our case, we get:5) finally we use this vector to filter the rows of the data.frame, getting the final result:
Possible duplicate of Remove duplicates column combinations from a dataframe in R
Adapting the answer here: