Question:
I could solve this using loops, but I am trying to think in vectors so my code will be more R-esque.
I have a list of names. The format is firstname_lastname. I want to get out of this list a separate list with only the first names. I can't seem to get my mind around how to do this. Here's some example data:
t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- strsplit(t,"_")
which looks like this:
> tsplit
[[1]]
[1] "bob" "smith"
[[2]]
[1] "mary" "jane"
[[3]]
[1] "jose" "chung"
[[4]]
[1] "michael" "marx"
[[5]]
[1] "charlie" "ivan"
I could get out what I want using loops like this:
for (i in 1:length(tsplit)){
if (i==1) {t_out <- tsplit[[i]][1]} else{t_out <- append(t_out, tsplit[[i]][1])}
}
which would give me this:
t_out
[1] "bob" "mary" "jose" "michael" "charlie"
So how can I do this without loops?
Answer 1:
You can use a function from the apply family, such as sapply:
t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
f <- function(s) strsplit(s, "_")[[1]][1]
sapply(t, f)
bob_smith mary_jane jose_chung michael_marx charlie_ivan
"bob" "mary" "jose" "michael" "charlie"
See: A brief introduction to “apply” in R
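Note that the result above is a named vector: each first name is labelled with its input string. If a plain vector like the question's t_out is wanted, one option is the USE.NAMES argument of sapply. A minimal sketch, where full_names and first_name are names picked for illustration only:

```r
full_names <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# Split one string on "_" and keep the first piece.
# fixed = TRUE treats "_" as a literal string rather than a regular expression.
first_name <- function(s) strsplit(s, "_", fixed = TRUE)[[1]][1]

# USE.NAMES = FALSE stops sapply from labelling each result with its input string
first_names <- sapply(full_names, first_name, USE.NAMES = FALSE)
first_names
```

Wrapping the original call in unname() should give the same unnamed vector.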
Answer 2:
And one more approach:
t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
pieces <- strsplit(t,"_")
sapply(pieces, "[", 1)
In words, the last line extracts the first element of each component of the list and then simplifies it into a vector.
How does this work? Well, you need to realise that an alternative way of writing x[1] is "["(x, 1), i.e. there is a function called [ that does subsetting. The sapply call calls this function once for each element of the original list, passing in two arguments: the list element and 1.
The advantage of this approach over the others is that you can extract multiple elements from the list without having to recompute the splits. For example, the last name would be sapply(pieces, "[", 2). Once you get used to this idiom, it's pretty easy to read.
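To make the "[ is a function" point concrete, here is a small sketch. The two-name input and the variable names are just for illustration. It also uses vapply, which behaves like sapply here but additionally checks that each result is a single string:

```r
pieces <- strsplit(c("bob_smith", "mary_jane"), "_")
x <- pieces[[1]]

# Ordinary subsetting and the function-call form return the same thing
same_result <- identical(x[1], `[`(x, 1))
same_result

# Split once, then reuse the pieces for both first and last names.
# character(1) declares that every call must return exactly one string.
first <- vapply(pieces, `[`, character(1), 1)
last  <- vapply(pieces, `[`, character(1), 2)
first
last
```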
Answer 3:
How about:
tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
fnames <- gsub("(_.*)$", "", tlist)
# _.* matches the underscore followed by a string of characters
# the $ anchors the search at the end of the input string
# so, underscore followed by a string of characters followed by the end of the input string
for the RegEx approach?
Answer 4:
What about:
t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
sub("_.*", "", t)
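It may help to see how this pattern behaves on input that does not follow firstname_lastname exactly. The sample names below are made up for illustration:

```r
full_names <- c("bob_smith", "mary_jane_doe", "cher")

# "_.*" matches from the first underscore to the end of the string,
# so everything after the first name is dropped.
# Strings without an underscore are returned unchanged.
first_names <- sub("_.*", "", full_names)
first_names

# ".*_" is greedy, so it matches up to the last underscore,
# leaving only the text after it.
last_names <- sub(".*_", "", full_names)
last_names
```

Because the pattern runs to the end of the string, there is at most one match per string, which is why sub and gsub give the same result here.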
Answer 5:
I doubt this is the most elegant solution, but it beats looping:
t.df <- data.frame(tsplit)
t.df[1, ]
Converting lists to data frames is about the only way I can get them to do what I want. I'm looking forward to reading answers by people who actually understand how to handle lists.
Answer 6:
You almost had it. It really is just a matter of
- using one of the *apply functions to loop over your existing list; I often start with lapply and sometimes switch to sapply
- adding an anonymous function that operates on one of the list elements at a time
- you already knew it was strsplit(string, splitterm) and that you need the odd [[1]][1] to pick off the first term of the answer
- just putting it all together, starting with a preferred variable name (as we stay clear of t or c and friends)
which gives
> tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
> fnames <- sapply(tlist, function(x) strsplit(x, "_")[[1]][1])
> fnames
bob_smith mary_jane jose_chung michael_marx charlie_ivan
"bob" "mary" "jose" "michael" "charlie"
>
Answer 7:
You could use unlist():
> tsplit <- unlist(strsplit(t,"_"))
> tsplit
[1] "bob" "smith" "mary" "jane" "jose" "chung" "michael"
[8] "marx" "charlie" "ivan"
> t_out <- tsplit[seq(1, length(tsplit), by = 2)]
> t_out
[1] "bob" "mary" "jose" "michael" "charlie"
There might be a better way to pull out only the odd-indexed entries, but in any case you won't have a loop.
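One possible alternative for picking the odd-indexed entries is logical indexing, which R recycles along the vector. A sketch, assuming every name splits into exactly two pieces (variable names are illustrative):

```r
full_names <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")
flat <- unlist(strsplit(full_names, "_", fixed = TRUE))

# c(TRUE, FALSE) is recycled to the length of flat,
# keeping positions 1, 3, 5, ... and dropping 2, 4, 6, ...
first_names <- flat[c(TRUE, FALSE)]
first_names

# Flipping the pattern keeps the even positions instead
last_names <- flat[c(FALSE, TRUE)]
last_names
```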
Answer 8:
And one other approach, based on brentonk's unlist example...
tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- unlist(strsplit(tlist,"_"))
fnames <- tsplit[seq_along(tsplit) %% 2 == 1]
Answer 9:
I would use the following unlist()-based method:
> t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
> tsplit <- strsplit(t,"_")
>
> x <- matrix(unlist(tsplit), 2)
> x[1,]
[1] "bob" "mary" "jose" "michael" "charlie"
The big advantage of this method is that it solves the equivalent problem for surnames at the same time:
> x[2,]
[1] "smith" "jane" "chung" "marx" "ivan"
The downside is that you'll need to be certain that all of the names conform to the firstname_lastname structure; if any don't then this method will break.
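One way to guard against that downside is to check the number of pieces per name before building the matrix. This is only a sketch: the input (including the non-conforming "cher") and the variable names are invented for illustration, and lengths() needs R 3.2.0 or later.

```r
full_names <- c("bob_smith", "mary_jane", "cher")
pieces <- strsplit(full_names, "_", fixed = TRUE)

# lengths() gives the number of pieces per name; exactly 2 means firstname_lastname
conforms <- lengths(pieces) == 2
conforms

# Build the 2-row matrix only from the names that conform
name_matrix <- matrix(unlist(pieces[conforms]), nrow = 2)
name_matrix[1, ]  # first names
name_matrix[2, ]  # last names
```

Whether to drop, flag, or stop on non-conforming names depends on the data; dropping them is just the simplest choice for this example.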
Answer 10:
From the original tsplit list object given at the beginning, this command will do:
unlist(lapply(tsplit,function(x) x[1]))
It extracts the first element of each list element, then turns the resulting list into a vector. Unlisting first to a matrix and then extracting the first column is also OK, but then you depend on all list elements having the same length. Here is the output:
> tsplit
[[1]]
[1] "bob" "smith"
[[2]]
[1] "mary" "jane"
[[3]]
[1] "jose" "chung"
[[4]]
[1] "michael" "marx"
[[5]]
[1] "charlie" "ivan"
> lapply(tsplit,function(x) x[1])
[[1]]
[1] "bob"
[[2]]
[1] "mary"
[[3]]
[1] "jose"
[[4]]
[1] "michael"
[[5]]
[1] "charlie"
> unlist(lapply(tsplit,function(x) x[1]))
[1] "bob" "mary" "jose" "michael" "charlie"
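To illustrate the point about list elements of different lengths, here is a small sketch with made-up names that split into one, two, and three pieces. Taking x[1] per element does not care how many pieces follow:

```r
ragged <- strsplit(c("bob_smith", "cher", "mary_jane_doe"), "_", fixed = TRUE)

# Each element has a different length, but x[1] is always the first piece
first_names <- unlist(lapply(ragged, function(x) x[1]))
first_names
```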