Extracting characters from entries in a vector in

2019-02-26 04:51发布

问题:

There are functions in Excel called left, right, and mid, where you can extract part of the entry from a cell. For example, =left(A1, 3), would return the 3 left most characters in cell A1, and =mid(A1, 3, 4) would start with the the third character in cell A1 and give you characters number 3 - 6. Are there similar functions in R or similarly straightforward ways to do this?

As a simplified sample problem I would like to take a vector

sample<-c("TRIBAL","TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")

and create 3 new vectors that contain the first 3 characters in each entry, the middle 2 characters in each entry, and the last 4 characters in each entry.

A slightly more complicated question that Excel doesn't have a function for (that I know of) would be how to create a new vector with the 1st, 3rd, and 5th characters from each entry.

回答1:

You are looking for the function substr or its close relative substring:

The leading characters are straight-forward:

substr(sample, 1, 3)
[1] "TRI" "TRI" "RHO" "EUG" "BYR"

So is extracting some characters at a defined position:

substr(sample, 2, 3)
[1] "RI" "RI" "HO" "UG" "YR"

To get the trailing characters, you have two options:

substr(sample, nchar(sample)-3, nchar(sample))
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"

substring(sample, nchar(sample)-3)
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"

And your final "complicated" question:

characters <- function(x, pos){
  sapply(x, function(x)
    paste(sapply(pos, function(i)substr(x, i, i)), collapse=""))
}
characters(sample, c(1,3,5))
TRIBAL TRISTO RHOSTO EUGFRI BYRRAT 
 "TIA"  "TIT"  "ROT"  "EGR"  "BRA" 


标签: r extract