Subset based on first three numbers

Posted 2019-07-04 11:54

Question:

I have a very large data set of variables and I need to subset based on the first three numbers of the zip code. I'm not sure how to do this and would appreciate any help you can provide.

How would I subset this example dput to remove all the zip codes that start with 721? Note that I can't simply use a greater-than (>) comparison, since there are zip codes larger than 721. Thanks!

dput:

data <- structure(list(state = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("AR", 
  "IL", "MO"), class = "factor"), zip = c(72003L, 72042L, 72073L, 
  72166L, 72038L, 72055L, 72160L, 72026L, 72048L, 72140L, 72003L, 
  72042L, 72073L, 72166L, 72038L, 72055L, 72160L, 72026L, 72048L, 
  72140L)), .Names = c("state", "zip"), row.names = c(NA, 20L), class = "data.frame")

Data:

   state   zip
1     AR 72003
2     AR 72042
3     AR 72073
4     AR 72166
5     AR 72038
6     AR 72055
7     AR 72160
8     AR 72026
9     AR 72048
10    AR 72140
11    AR 72003
12    AR 72042
13    AR 72073
14    AR 72166
15    AR 72038
16    AR 72055
17    AR 72160
18    AR 72026
19    AR 72048
20    AR 72140

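Answer 1:

You can try substr, which coerces the integer zip to character and returns its first three characters (the comparison with 721 is then done as a string comparison against "721"):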

data[substr(data$zip, 1,3)!=721,]
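
Because zip is stored as an integer here, the same filter can also be written without substr. The following is only a minimal sketch, assuming every zip code has exactly five digits (so integer division by 100 leaves the first three); the small `zips` data frame and the names `keep_int` and `keep_chr` are made up for illustration and are not part of the question's data.

```r
# Hypothetical sample data, not the asker's full data set
zips <- data.frame(state = "AR",
                   zip = c(72003L, 72166L, 72038L, 72160L, 72140L, 72201L))

# Integer division drops the last two digits: 72166 %/% 100 is 721
keep_int <- zips[zips$zip %/% 100 != 721, ]

# startsWith() (base R >= 3.3.0) needs a character vector
keep_chr <- zips[!startsWith(as.character(zips$zip), "721"), ]

# Check that both approaches keep the same rows
identical(keep_int, keep_chr)
```

The integer-division form avoids the character conversion, which may matter on a very large data set, but it relies on the five-digit assumption stated above.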

Or using data.table (note that setDT converts data to a data.table by reference, so data itself is modified):

library(data.table)
setDT(data)[substr(zip,1,3)!=721]

Or with dplyr

library(dplyr)
data %>%
  filter(substr(zip, 1, 3) != 721)

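Or using extract from tidyr (filter and select come from dplyr, so it has to be loaded as well)

library(tidyr)
library(dplyr)
# Copy the first three characters of zip into a temporary column zip1 (remove = FALSE keeps zip)
extract(data, zip, 'zip1', '(...).*', remove = FALSE) %>%
  filter(zip1 != 721) %>%
  select(-zip1)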
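
One caveat applies to all of the substr-based variants above: they assume the stored value has five digits. A zip code with a leading zero loses it when stored as an integer (02101 becomes 2101), so the first three characters would be wrong. A possible workaround, sketched here with made-up values and hypothetical names (`zip_chr`, `prefix`), is to pad back to five digits with sprintf before taking the prefix:

```r
# Hypothetical zip codes; 2101L stands for "02101" whose leading zero was lost
zip <- c(2101L, 72166L, 72003L)

# Pad back to five digits, then take the first three characters
zip_chr <- sprintf("%05d", zip)
prefix <- substr(zip_chr, 1, 3)

# Keep everything that does not start with 721
zip[prefix != "721"]
```

This does not matter for the Arkansas codes in the question, but it may for a data set that also covers states whose zip codes begin with 0.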


Tags: r, subset