How to subset a dataframe from another dataframe w

Posted 2020-04-20 06:58

Question:

How do I subset a data frame using another data frame of a different length? For example, I have two data frames, df1 and df2. How do I subset df1 using df2's Year, Month, Day and Hour so that it becomes the Expected Outcome shown further down?

The Year, Month, Day and Hour values in rows 4, 6 and 7 of df1 match those in rows 1, 2 and 3 of df2, so only rows 4, 6 and 7 of df1 appear in the expected outcome.

df1

    V1  Year Month Day Hour Min Sec   Weight
1  1640 1999    02  05   04  00  00 1.936074
2  1519 1999    02  10   12  00  00 1.944277
3  1219 1999    02  25   16  00  00 1.947789
4  1720 1999    03  11   16  00  00 1.946841
5  1782 1999    03  18   08  00  00 1.956711
6  1523 1999    03  24   12  00  00 1.965768
7  1153 1999    04  01   16  00  00 1.981121
8  1262 1999    04  08   16  00  00 1.987066
9  1860 1999    04  15   00  00  00 1.982274
10 1624 1999    04  28   08  00  00 1.999045

df2

    V1  Year Month Day Hour Min Sec   Weight
1  3587 1999    03  11   16  00  00 2.836074
2  4675 1999    03  24   12  00  00 2.854277
3  3592 1999    04  01   16  00  00 2.917789
4  2980 1999    04  12   16  00  00 2.926841
5  2857 1999    04  18   16  00  00 2.986711

Expected Outcome

    V1  Year Month Day Hour Min Sec   Weight
4  1720 1999    03  11   16  00  00 1.946841
6  1523 1999    03  24   12  00  00 1.965768
7  1153 1999    04  01   16  00  00 1.981121

Answer 1:

You can use the semi_join function from dplyr:

library(dplyr)

semi_join(df1, df2, by = c("Year", "Month", "Day", "Hour"))

This returns only the rows of df1 that have a match in df2 on the Year, Month, Day and Hour columns. Unlike the mutating joins (such as inner_join or left_join), semi_join is a filtering join: it does not add df2's columns to the result, and it does not duplicate rows of df1 when df2 contains more than one match.
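
To try this on the data from the question, here is a minimal reproducible sketch. The column types are an assumption: the question does not say how the columns are stored, so Month, Day and Hour are built as character strings to keep the leading zeros. The join columns need compatible types in both data frames. The base R lines at the end are not part of the original answer; they are one possible alternative for readers who do not want to load dplyr.

```r
library(dplyr)

# Rebuild the question's data (types are assumed, see the note above).
df1 <- data.frame(
  V1     = c(1640, 1519, 1219, 1720, 1782, 1523, 1153, 1262, 1860, 1624),
  Year   = 1999,
  Month  = c("02", "02", "02", "03", "03", "03", "04", "04", "04", "04"),
  Day    = c("05", "10", "25", "11", "18", "24", "01", "08", "15", "28"),
  Hour   = c("04", "12", "16", "16", "08", "12", "16", "16", "00", "08"),
  Min    = "00",
  Sec    = "00",
  Weight = c(1.936074, 1.944277, 1.947789, 1.946841, 1.956711,
             1.965768, 1.981121, 1.987066, 1.982274, 1.999045),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  V1     = c(3587, 4675, 3592, 2980, 2857),
  Year   = 1999,
  Month  = c("03", "03", "04", "04", "04"),
  Day    = c("11", "24", "01", "12", "18"),
  Hour   = c("16", "12", "16", "16", "16"),
  Min    = "00",
  Sec    = "00",
  Weight = c(2.836074, 2.854277, 2.917789, 2.926841, 2.986711),
  stringsAsFactors = FALSE
)

keys <- c("Year", "Month", "Day", "Hour")

# Filtering join: keep the rows of df1 whose key combination occurs in df2.
result <- semi_join(df1, df2, by = keys)
print(result)

# Base R alternative (not from the answer): build one composite key per row
# and keep the rows of df1 whose key also appears in df2.
key1 <- do.call(paste, c(df1[keys], sep = "-"))
key2 <- do.call(paste, c(df2[keys], sep = "-"))
base_result <- df1[key1 %in% key2, ]
print(base_result)
```

Both approaches keep the three rows with V1 equal to 1720, 1523 and 1153. The base R subset keeps the original row names (4, 6, 7) as in the Expected Outcome, whereas dplyr verbs generally do not preserve row names, so the rows of the semi_join result may be renumbered.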