Subsetting a dataframe with multiple conditions

Posted 2019-09-17 10:06

Question:

Say I have a dataframe ARAP with columns called CoCd and VendorNo. I want to subset it into another dataframe called EMIU_EMIJ, keeping all rows that match any of these combinations:

CoCd="EMIJ" & VendorNo = "100010" or
CoCd="EMIU" & VendorNo = "2000001" or
CoCd="EMIU" & VendorNo = "2000006".

How do I combine & and | to select the rows where any one of these combinations is met? That is, each CoCd value must stay paired with its VendorNo value.

I tried

EMIU_EMIJ<-subset(ARAP,CoCd=="EMIJ"&VendorNo=="100010"|
CoCd=="EMIU"&VendorNo=="2000001"|
CoCd=="EMIU"&VendorNo=="2000006")

I also tried brackets

EMIU_EMIJ<-subset(ARAP, (CoCd=="EMIJ"&VendorNo=="100010")|(CoCd=="EMIU"&VendorNo=="2000001")|(CoCd=="EMIU"&VendorNo=="2000006"))

But this produced an error: "Error: unexpected symbol in: "EMIU_EMIJ"

How do I subset for any one of the three combinations mentioned above?
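
For reference, R evaluates & before |, so both attempts above express the same paired logic. "Error: unexpected symbol" is a parse error, so the cause may lie in how the lines were entered (for example, an unfinished statement just before them) rather than in the condition itself. A minimal sketch, assuming a small made-up ARAP since the real data is not shown:

```r
# Hypothetical stand-in for ARAP; the real data is not shown in the question.
ARAP <- data.frame(
  CoCd     = c("EMIJ", "EMIJ", "EMIU", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000001", "2000006", "100010"),
  stringsAsFactors = FALSE
)

# Without brackets: & is evaluated before |, so each CoCd/VendorNo pair stays together.
no_brackets <- subset(ARAP,
  CoCd == "EMIJ" & VendorNo == "100010" |
  CoCd == "EMIU" & VendorNo == "2000001" |
  CoCd == "EMIU" & VendorNo == "2000006")

# With explicit brackets: same logic, easier to read.
with_brackets <- subset(ARAP,
  (CoCd == "EMIJ" & VendorNo == "100010") |
  (CoCd == "EMIU" & VendorNo == "2000001") |
  (CoCd == "EMIU" & VendorNo == "2000006"))

# Both forms keep the same rows; the mismatched pairs (rows 2 and 5) are dropped.
identical(no_brackets, with_brackets)
```

In this sketch each line of the condition ends with |, which tells R that the expression continues on the next line.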

Answer 1:

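A simple merge with the all.y option will do. Note that all.y=TRUE also returns any requested combination that has no match in the data, as a row filled with NA; leave it out if you only want rows that actually exist.

For example, if mydf is your data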

set.seed(111)
mydf <- data.frame(id=rep(LETTERS, each=4)[1:100], replicate(3, sample(1001, 100)),Class=sample(c("Yes", "No"), 100, TRUE))
mydf$CoCd <- paste0("EMI",mydf$id)
mydf$VendorNo <- paste0(mydf$X1,mydf$X2)
mydf <- unique(mydf[,c("CoCd","VendorNo","Class","X3")])

and looks like this (first 25 rows shown)

    CoCd VendorNo Class   X3
1   EMIA   594577   Yes  727
2   EMIA   727137   Yes  921
3   EMIA   371939   Yes  123
4   EMIA   514176    No  950
5   EMIB   377818   Yes  668
6   EMIB    41713    No   85
7   EMIB    11637    No  579
8   EMIB   530266    No  212
9   EMIC   430566   Yes  241
10  EMIC    93958    No  533
11  EMIC   551197   Yes  176
12  EMIC   585686    No  565
13  EMID    67827   Yes  154
14  EMID    47894    No  469
15  EMID   155952    No  718
16  EMID   441649    No  835
17  EMIE   169541   Yes  945
18  EMIE   952871   Yes  452
19  EMIE   306441    No  358
20  EMIE   604730    No  920
21  EMIF   423407    No  868
22  EMIF   280668   Yes  658
23  EMIF   335907   Yes  830
24  EMIF   379620   Yes  841
25  EMIG   946644    No  471

and you want the combinations

combination_to_select<-data.frame(CoCd=c("EMIA","EMID","EMIF"),VendorNo=c('594577','47894','423407'),stringsAsFactors=FALSE)
combination_to_select

  CoCd VendorNo
1 EMIA   594577
2 EMID    47894
3 EMIF   423407

the following code gives you the subset

# named `result` rather than `subset` to avoid shadowing base::subset
result <- merge(mydf,combination_to_select,by=c("CoCd","VendorNo"),all.y=TRUE)
result
  CoCd VendorNo Class  X3
1 EMIA   594577   Yes 727
2 EMID    47894    No 469
3 EMIF   423407    No 868
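
Applied to the column names from the question, the same idea might look like the sketch below. The ARAP rows and the Amount column are invented for illustration, and `wanted` is a hypothetical name for the lookup table of pairs. It also contrasts the default inner join with all.y=TRUE.

```r
# Hypothetical stand-in for ARAP; Amount is an invented extra column.
ARAP <- data.frame(
  CoCd     = c("EMIJ", "EMIJ", "EMIU", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000001", "100010", "2000001"),
  Amount   = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# The three pairs from the question, one pair per row.
wanted <- data.frame(
  CoCd     = c("EMIJ", "EMIU", "EMIU"),
  VendorNo = c("100010", "2000001", "2000006"),
  stringsAsFactors = FALSE
)

# Inner join (the default): only rows of ARAP whose pair is listed in `wanted`.
EMIU_EMIJ <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"))

# all.y = TRUE: additionally keeps wanted pairs with no match, padded with NA.
with_unmatched <- merge(ARAP, wanted, by = c("CoCd", "VendorNo"), all.y = TRUE)

EMIU_EMIJ
with_unmatched
```

Here the pair EMIU/2000006 does not occur in the made-up data, so it appears only in `with_unmatched`, with NA in Amount. By default merge also sorts the result on the key columns, so row order can differ from the original dataframe.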