selecting rows in a data.frame in which a certain

Posted 2019-08-11 02:30

Question:

I have a data.frame of the type:

> head(engschools)
  RECTYPE LEA ESTAB    URN                        SCHNAME               TOWN    PCODE
1       1 919  2028 138231              Alban City School               n.a.   E1 3RR
2       1 919  4003 138582           Samuel Ryder Academy          St Albans  AL1 5AR
3       1 919  2004 138201 Hatfield Community Free School           Hatfield AL10 8ES
4       2 919  7012 117671               St Luke's School                n.a  BR3 7ET
5       1 919  2018 138561          Harpenden Free School           Redbourn  AL3 7QA
6       2 919  7023 117680                Lakeside School Welwyn Garden City  AL8 6YN

And a set of prefixes like this one:

> head(prefixes)
E
AL

I would like to select the rows of the data.frame engschools whose value in column PCODE starts with one of the prefixes in prefixes. The correct result would thus contain rows 1:3 and 5:6, but not row 4.

Answer 1:

You can try something like this:

engschools[grep(paste0("^", prefixes, collapse="|"), engschools$PCODE), ]
#   RECTYPE LEA ESTAB    URN                        SCHNAME               TOWN    PCODE
# 1       1 919  2028 138231              Alban City School               n.a.   E1 3RR
# 2       1 919  4003 138582           Samuel Ryder Academy          St Albans  AL1 5AR
# 3       1 919  2004 138201 Hatfield Community Free School           Hatfield AL10 8ES
# 5       1 919  2018 138561          Harpenden Free School           Redbourn  AL3 7QA
# 6       2 919  7023 117680                Lakeside School Welwyn Garden City  AL8 6YN

Here, we have used:

  • paste0 to create our search pattern (in this case, "^E|^AL"); the "^" anchors each prefix to the start of the string.
  • grep to identify the row indexes that match the provided pattern.
  • Basic [ indexing to extract the relevant rows.
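
The three steps can be sketched as a self-contained example. The data frame below is a hypothetical reconstruction of the six rows shown in the question, reduced to three columns, and the names pattern, hits, selected, keep and selected2 are introduced only for illustration. The last part is an alternative, not part of the original answer: it assumes R 3.3.0 or later for startsWith and avoids regular expressions, which may matter if a prefix ever contains a regex metacharacter.

```r
# Hypothetical reconstruction of the rows shown in the question
# (only three of the original columns are kept).
engschools <- data.frame(
  URN     = c(138231, 138582, 138201, 117671, 138561, 117680),
  SCHNAME = c("Alban City School", "Samuel Ryder Academy",
              "Hatfield Community Free School", "St Luke's School",
              "Harpenden Free School", "Lakeside School"),
  PCODE   = c("E1 3RR", "AL1 5AR", "AL10 8ES", "BR3 7ET", "AL3 7QA", "AL8 6YN"),
  stringsAsFactors = FALSE
)
prefixes <- c("E", "AL")

# Step 1: prepend "^" to every prefix and join them with "|",
# giving one anchored alternation: "^E|^AL".
pattern <- paste0("^", prefixes, collapse = "|")

# Step 2: grep() returns the integer indexes of the matching elements.
hits <- grep(pattern, engschools$PCODE)

# Step 3: use those indexes to pick rows; the empty column slot keeps all columns.
selected <- engschools[hits, ]

# Alternative sketch without regular expressions (assumes R >= 3.3.0):
# test each prefix with startsWith() and combine the logical vectors with OR.
keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(engschools$PCODE, p)))
selected2 <- engschools[keep, ]
```

If a logical vector is more convenient than indexes, for example to combine the match with other conditions, grepl(pattern, engschools$PCODE) can be used in place of grep.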