I would appreciate insight into why this happens and how I might do this more elegantly.
When I use sapply, I would like it to return a 3x2 matrix, but it returns a 2x3 matrix. Why is this? And why is it difficult to attach this to another data frame?
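# id needs to be a factor (the default before R 4.0.0) for a[x, ...] below to select rows by position
a <- data.frame(id=c('a','b','c'), var1 = c(1,2,3), var2 = c(3,2,1), stringsAsFactors = TRUE)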
out <- sapply(a$id, function(x) out = a[x, c('var1', 'var2')])
#out is 2x3, but I would like it to be 3x2
#I then want to append t(out) (out as a 3x2 matrix) to b, a 3x1 data frame
b <- data.frame(var3=c(0,0,0))
When I try to attach these,
b[,c('col2','col3')] <- t(out)
The warning that I get is:
Warning message:
In `[<-.data.frame`(`*tmp*`, , c("col2", "col3"), value = list(1, :
provided 6 variables to replace 2 variables
although the following appears to give the desired result:
rownames(out) <- c('col1', 'col2')
b <- cbind(b, t(out))
I cannot operate on the variables:
b$var1/b$var2
returns
Error in b$var1/b$var2 : non-numeric argument to binary operator
Thanks!
To expand on DWin's answer: it helps to look at the structure of your out object. It explains why b$var1/b$var2 doesn't do what you expect.
> out <- sapply(a$id, function(x) out = a[x, c('var1', 'var2')])
> str(out) # a list with a dim attribute, not a numeric matrix or a data.frame
List of 6
$ : num 1
$ : num 3
$ : num 2
$ : num 2
$ : num 3
$ : num 1
- attr(*, "dim")= int [1:2] 2 3
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "var1" "var2"
..$ : NULL
The apply family of functions is designed to work on vectors and arrays, so you need to take care when using them with data.frames (which are usually lists of vectors). You can use the fact that data.frames are lists to your advantage with lapply.
> out <- lapply(a$id, function(x) a[x, c('var1', 'var2')]) # list of data.frames
> out <- do.call(rbind, out) # data.frame
> b <- cbind(b,out)
> str(b)
'data.frame': 3 obs. of 3 variables:
$ var3: num 0 0 0
$ var1: num 1 2 3
$ var2: num 3 2 1
> b$var1/b$var2
[1] 0.3333333 1.0000000 3.0000000
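A minimal sketch of why the division fails and one way around it, assuming the same a and b as in the question. It iterates over row numbers instead of a$id so that it does not depend on whether id is a factor. The names rows, num and ratio are introduced here purely for illustration.

```r
a <- data.frame(id = c("a", "b", "c"), var1 = c(1, 2, 3), var2 = c(3, 2, 1))
b <- data.frame(var3 = c(0, 0, 0))

# Each call returns a one-row data.frame, i.e. a list of length 2,
# so sapply() simplifies to a 2 x 3 matrix whose cells are list elements.
rows <- seq_len(nrow(a))
out <- sapply(rows, function(i) a[i, c("var1", "var2")])
is.list(out)     # the cells are list elements, not plain numbers
dim(out)         # 2 rows (var1, var2) by 3 columns (one per call)

# Flatten the cells into a numeric vector and rebuild a numeric matrix.
num <- matrix(unlist(out), nrow = nrow(out), dimnames = dimnames(out))
is.numeric(num)

# The transposed numeric matrix binds cleanly and supports arithmetic.
b <- cbind(b, t(num))
ratio <- b$var1 / b$var2
ratio
```

Because the columns of b are now plain numeric vectors rather than list columns, the division behaves as it does in the lapply version above.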
First a bit of R notation. If you look at the code for sapply, you will find the answer to your question. The sapply function checks whether the results all have the same length, and if so, it first unlist()s them and then passes the flattened sequence as the data argument to array(). Since array() (like matrix()) by default fills its values in column-major order, that is what you get: each result becomes one column, so the lists get turned on their side. If you don't like that, you can define a new function tsapply that returns the transposed values:
> tsapply <- function(...) t(sapply(...))
> out <- tsapply(a$id, function(x) out = a[x, c('var1', 'var2')])
> out
var1 var2
[1,] 1 3
[2,] 2 2
[3,] 3 1
... a 3 x 2 matrix.
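The column-major fill described above can be seen in isolation with a small sketch. The list vals stands in for the per-call results and is made up for illustration. The vapply() variant is an alternative I am suggesting, not something taken from sapply's source; it returns a plain numeric matrix that can then be transposed.

```r
a <- data.frame(id = c("a", "b", "c"), var1 = c(1, 2, 3), var2 = c(3, 2, 1))

# Three results of length 2, flattened and handed to array():
vals <- list(c(var1 = 1, var2 = 3), c(var1 = 2, var2 = 2), c(var1 = 3, var2 = 1))
flat <- unlist(vals)
m <- array(flat, dim = c(2, 3))   # filled column by column
m                                 # each result occupies one column
t(m)                              # transposing puts each result in a row

# Alternative sketch: vapply() with a numeric template gives a numeric matrix.
res <- vapply(seq_len(nrow(a)),
              function(i) unlist(a[i, c("var1", "var2")]),
              numeric(2))
t(res)                            # 3 x 2, one row per row of a
```

Declaring the expected shape with numeric(2) also makes vapply() fail loudly if a call returns something unexpected, instead of silently falling back to a list.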
Have a look at ddply from the plyr package:
a <- data.frame(id=c('a','b','c'), var1 = c(1,2,3), var2 = c(3,2,1))
library(plyr)
ddply(a, "id", function(x) {
  out <- cbind(O1 = rnorm(nrow(x), x$var1), O2 = runif(nrow(x)))
  out
})
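If plyr is not available, the same split-apply-combine pattern can be sketched in base R. This is only a rough equivalent under my own assumptions: the per-group function computes a deterministic ratio instead of the random draws above, and the names pieces, res and combined are made up for illustration.

```r
a <- data.frame(id = c("a", "b", "c"), var1 = c(1, 2, 3), var2 = c(3, 2, 1))

# Split the data.frame into one piece per id.
pieces <- split(a, a$id)

# Apply a function to each piece; each call returns a small data.frame.
res <- lapply(pieces, function(x) {
  data.frame(id = x$id, ratio = x$var1 / x$var2)
})

# Combine the pieces back into a single data.frame.
combined <- do.call(rbind, res)
rownames(combined) <- NULL
combined
```

Because every piece comes back as a data.frame, the combined result keeps numeric columns and avoids the list-matrix problem that sapply runs into.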