I have 2 data frames, df1 and df2.
df1 <- data.frame(c1=c("a","b","c","d"),c2=c(1,2,3,4) )
df2 <- data.frame(c1=c("c","d","e","f"),c2=c(3,4,5,6) )
> df1
c1 c2
1 a 1
2 b 2
3 c 3
4 d 4
> df2
c1 c2
1 c 3
2 d 4
3 e 5
4 f 6
I need to perform set operations on these 2 data frames. I used merge(df1,df2,all=TRUE) and merge(df1,df2,all=FALSE) to get the union and intersection of these data frames and got the required output. What is the function to get the difference (minus) of these data frames, that is, all the rows that exist in one data frame but not the other? I need the following output.
c1 c2
1 a 1
2 b 2
One issue with https://stackoverflow.com/a/16144262/2055486 is that it assumes neither data frame already has duplicated rows. The following function removes that limitation and also works with arbitrary user-defined columns in x or y.
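The function itself is not reproduced here, so the following is only a minimal sketch of the idea under stated assumptions. The name row_setdiff, its arguments, and the "\x1e" separator (the hex escape for the ASCII record separator) are illustrative choices, not the answer's original code.

```r
# Hypothetical helper: returns the rows of x that do not appear in y,
# comparing only the columns named in `by`.
row_setdiff <- function(x, y, by = intersect(names(x), names(y)), sep = "\x1e") {
  # Build one key string per row by pasting the compared columns together.
  # unname() keeps column names from being read as arguments of paste().
  key <- function(d) do.call(paste, c(unname(as.list(d[by])), sep = sep))
  # Keep rows of x whose key never occurs in y; repeated rows of x survive.
  x[!key(x) %in% key(y), , drop = FALSE]
}

df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4))
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6))
result <- row_setdiff(df1, df2)
print(result)
```

Because rows are filtered rather than de-duplicated, a row that occurs twice in x and never in y is returned twice.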
The implementation uses a similar idea to duplicated.data.frame in concatenating the columns together with a separator. duplicated.data.frame uses "\r", which can cause collisions if the entries have embedded "\r" characters. This uses the ASCII record separator "\30", which has a much lower chance of appearing in input data.
I remember coming across this exact issue quite a few months back. Managed to sift through my Evernote one-liners.
Note: This is not my solution. Credit goes to whoever wrote it (whom I can't seem to find at the moment).
If you don't worry about rownames then you can do:
Edit: A data.table solution:
or better one-liner (from v1.9.6+):
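The snippets themselves are missing here, so this is a sketch of what the two approaches might look like: a base R version built on rbind and duplicated, and a data.table anti-join on c1. Variable names are illustrative, and the data.table part assumes the package is installed (v1.9.6 or later for the on= argument).

```r
df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4))
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6))

# Base R: stack df2 on top of df1, flag rows that were already seen, then
# drop the flags that belong to df2 so the rest line up with df1's rows.
# Note: a row repeated within df1 is also flagged and therefore dropped.
dup <- duplicated(rbind(df2, df1))[-seq_len(nrow(df2))]
base_diff <- df1[!dup, ]
print(base_diff)

# data.table: anti-join, i.e. rows of df1 whose c1 has no match in df2.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt1 <- data.table::as.data.table(df1)
  dt2 <- data.table::as.data.table(df2)
  dt_diff <- dt1[!dt2, on = "c1"]
  print(dt_diff)
}
```

The base R version compares whole rows, while the anti-join shown compares only c1; on this example both give the same two rows.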
This returns all rows in df1 whose c1 value has no match in df2$c1.
You can create identifier columns, then subset:
e.g.
Then subset how you wish:
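As a sketch of these two steps, assuming indicator columns named in_df1 and in_df2 (hypothetical names chosen for illustration): tag each data frame, do a full merge, then keep the rows where the other side's indicator is NA.

```r
df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4))
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6))

# Tag each data frame with its own indicator column.
df1$in_df1 <- TRUE
df2$in_df2 <- TRUE

# Full outer join on the shared columns c1 and c2; a row missing from one
# side gets NA in that side's indicator column.
both <- merge(df1, df2, all = TRUE)

# Rows only in df1, and rows only in df2.
only_df1 <- both[is.na(both$in_df2), c("c1", "c2")]
only_df2 <- both[is.na(both$in_df1), c("c1", "c2")]
print(only_df1)
```

The same merged data frame answers both directions of the difference, which is the main reason to prefer this over a one-off filter.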
If you're not planning on using any of the actual data in d2, then you don't need merge at all:
You can check the values in both columns and subset like this (just adding another solution):
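Both ideas can be sketched in base R. This is an illustration under assumptions rather than the original snippets: the first filter matches on c1 only, and the second checks each shared column separately, which is a column-wise test and not a true row-wise comparison.

```r
df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4))
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6))

# No merge needed: keep rows of df1 whose c1 does not occur in df2$c1.
by_key <- df1[!df1$c1 %in% df2$c1, ]

# Check every shared column: keep rows of df1 for which none of the values
# occurs in the corresponding column of df2.
cols <- intersect(names(df1), names(df2))
no_match <- Reduce(`&`, lapply(cols, function(col) !df1[[col]] %in% df2[[col]]))
by_all <- df1[no_match, ]

print(by_key)
print(by_all)
```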
I think the simplest solution is with dplyr (tidyverse).
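A minimal sketch using dplyr's anti_join, assuming dplyr is installed. The by argument is spelled out here as an illustrative choice; leaving it out makes dplyr join on all shared columns and print a message saying so.

```r
library(dplyr)

df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4))
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6))

# anti_join keeps the rows of df1 that have no matching row in df2.
result <- anti_join(df1, df2, by = c("c1", "c2"))
print(result)
```

Swapping the arguments, anti_join(df2, df1, ...), gives the difference in the other direction.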