Why does sapply return a matrix that I need to tra

Posted 2019-02-16 19:53

I would appreciate insight into why this happens and how I might do this more elegantly.

When I use sapply, I would like it to return a 3x2 matrix, but it returns a 2x3 matrix. Why is this? And why is it difficult to attach this to another data frame?

a <- data.frame(id=c('a','b','c'), var1 = c(1,2,3), var2 = c(3,2,1))
out <- sapply(a$id, function(x) out = a[x, c('var1', 'var2')])
# out is 2x3, but I would like it to be 3x2
# I then want to append t(out) (out as a 3x2 matrix) to b, a 3x1 data frame
b <- data.frame(var3=c(0,0,0))

When I try to attach these,

b[,c('col2','col3')] <- t(out)

I get this warning:

Warning message:
In `[<-.data.frame`(`*tmp*`, , c("col2", "col3"), value = list(1,  :
  provided 6 variables to replace 2 variables

Although the following appears to give the desired result:

rownames(out) <- c('col1', 'col2')
b <- cbind(b, t(out))

I cannot operate on the variables:

b$var1/b$var2

returns

Error in b$var1/b$var2 : non-numeric argument to binary operator

Thanks!

3 answers
你好瞎i
Post #2 · 2019-02-16 20:26

To expand on DWin's answer: it would help to look at the structure of your out object. It explains why b$var1/b$var2 doesn't do what you expect.

> out <- sapply(a$id, function(x) out = a[x, c('var1', 'var2')])
> str(out)  # this isn't a data.frame or a matrix...
List of 6
 $ : num 1
 $ : num 3
 $ : num 2
 $ : num 2
 $ : num 3
 $ : num 1
 - attr(*, "dim")= int [1:2] 2 3
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "var1" "var2"
  ..$ : NULL

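The apply family of functions is designed to work on vectors and arrays, so you need to take care when using them with data.frames (which are lists of column vectors). Here each call returns a one-row data.frame, so sapply simplifies the results into a matrix whose cells are list elements rather than plain numbers, and arithmetic on those cells fails. You can use the fact that data.frames are lists to your advantage with lapply.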

> out <- lapply(a$id, function(x) a[x, c('var1', 'var2')])  # list of data.frames
> out <- do.call(rbind, out) # data.frame
> b <- cbind(b,out)
> str(b)
'data.frame':   3 obs. of  3 variables:
 $ var3: num  0 0 0
 $ var1: num  1 2 3
 $ var2: num  3 2 1
> b$var1/b$var2
[1] 0.3333333 1.0000000 3.0000000
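
As a rough sketch of the point above, the following contrasts the list-matrix that sapply builds from one-row data.frames with the numeric matrix you get when each result is first flattened with unlist(). It indexes rows by position (seq_len(nrow(a))) rather than by a$id; that is an assumption made here because in R 4.0 and later a$id is a character vector by default, and character row indices are matched against row names rather than positions. The names out_list, out_num, b2 and b3 are illustrative only.

```r
# Sample data from the question.
a <- data.frame(id = c("a", "b", "c"), var1 = c(1, 2, 3), var2 = c(3, 2, 1))
b <- data.frame(var3 = c(0, 0, 0))

# Each call returns a one-row data.frame (a list of length 2),
# so sapply simplifies to a 2x3 matrix whose cells are list elements.
out_list <- sapply(seq_len(nrow(a)), function(i) a[i, c("var1", "var2")])
is.list(out_list)     # the cells are list elements
is.numeric(out_list)  # so the matrix is not numeric

# Flattening each row to a named numeric vector yields a numeric 2x3 matrix.
out_num <- sapply(seq_len(nrow(a)), function(i) unlist(a[i, c("var1", "var2")]))
b2 <- cbind(b, t(out_num))  # t() gives one row per input, columns var1 and var2
b2$var1 / b2$var2           # arithmetic now works

# If the goal is only to copy the two columns, no apply call is needed at all.
b3 <- cbind(b, a[, c("var1", "var2")])
```

When the per-row function does nothing but select columns, the last line is probably the simplest option; the sapply variants matter only when each row needs real computation.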
对你真心纯属浪费
Post #3 · 2019-02-16 20:27

First a bit of R notation. If you look at the code for sapply, you will find the answer to your question. The sapply function checks whether the results all have the same length, and if so, it first "unlist()"s them and then passes that flat sequence of values as the data argument to array(). Since array() (like matrix()) fills its values in column-major order by default, each result becomes one column. The lists get turned on their side. If you don't like that, you can define a new function tsapply that returns the transposed values:

> tsapply <- function(...) t(sapply(...))
> out <- tsapply(a$id, function(x) out = a[x, c('var1', 'var2')])
> out
     var1 var2
[1,] 1    3   
[2,] 2    2   
[3,] 3    1 

... a 3 x 2 matrix.
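
To make the column-major point concrete, here is a small sketch using plain numeric vectors rather than the data frame from the question. The names cols, flat, m and rows are made up for illustration.

```r
# Each call returns a length-2 vector, so sapply binds the three results as columns.
cols <- sapply(1:3, function(i) c(var1 = i, var2 = 4 - i))
dim(cols)  # 2 rows (one per returned element), 3 columns (one per input)

# The same layout appears when the flattened values fill a matrix column by column,
# which is the default for matrix() and array().
flat <- unlist(lapply(1:3, function(i) c(i, 4 - i)))
m <- matrix(flat, nrow = 2)
all(m == cols)

# Filling by row instead gives one row per input, with no call to t().
rows <- matrix(flat, ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("var1", "var2")))
all(rows == t(cols))
```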

手持菜刀,她持情操
Post #4 · 2019-02-16 20:36

Have a look at ddply from the plyr package:

a <- data.frame(id=c('a','b','c'), var1 = c(1,2,3), var2 = c(3,2,1))

library(plyr)
ddply(a, "id", function(x){
    out <- cbind(O1 = rnorm(nrow(x), x$var1), O2 = runif(nrow(x)))
    out
})
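
For readers who would rather not depend on plyr, roughly the same split-apply-combine pattern can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is an approximation of the idea, not a drop-in replacement for ddply, and the ratio column is a hypothetical per-group computation chosen only so that the result is deterministic.

```r
a <- data.frame(id = c("a", "b", "c"), var1 = c(1, 2, 3), var2 = c(3, 2, 1))

# Split into one data.frame per id, compute the per-group columns,
# then stack the pieces back into a single data.frame.
pieces <- lapply(split(a, a$id), function(x) {
  data.frame(id = x$id, ratio = x$var1 / x$var2)  # `ratio` is a hypothetical output
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the group names that rbind uses as row names
res
```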