This is probably simple to solve. I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two columns, I and J, which are integers that index the row and column of mat. I would like to add the values from mat to the rows of dat.
Here is my conceptual fail:
> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737
(I am using R 2.13.1 on Win32.) Digging a bit deeper, I see that I'm misusing matrix indexing: it appears that I'm only getting a sub-matrix of mat, and not a one-dimensional vector of values as I expected, i.e.:
> str(mat[dat$I[1:100], dat$J[1:100]])
int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...
I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 ... instead. What is the correct way to index a 2D matrix using row and column indices to get the values?
Here's a one-liner using apply's row-based operations:

Almost. It needs to be offered to "[" as a two-column matrix:
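A minimal sketch of the two-column-matrix form, using small made-up stand-ins for mat and dat (the values are illustrative only, not the real data):

```r
# Small stand-ins for the real objects (illustrative values only)
mat <- matrix(1:12, nrow = 3, ncol = 4)
dat <- data.frame(I = c(1L, 3L, 2L), J = c(1L, 2L, 4L))

# A two-column matrix of (row, column) pairs selects one element per pair,
# rather than the full sub-matrix that mat[dat$I, dat$J] returns
dat$matval <- mat[cbind(dat$I, dat$J)]
```

Each row of cbind(dat$I, dat$J) is treated as one (row, column) pair, so the result has one value per row of dat.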
There is a caveat: although this also works for data frames, they are first coerced to a matrix, and if any column is non-numeric, the entire matrix takes the "lowest common denominator" class.
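A small sketch of that coercion, assuming the caveat refers to indexing a data frame itself with a two-column matrix. The data frame df and the index idx are hypothetical names made up for the illustration:

```r
# Hypothetical data frame with one numeric and one character column
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# (row, column) pairs that only touch the numeric column a
idx <- cbind(c(1L, 3L), c(1L, 1L))

# The data frame is converted with as.matrix() before indexing, so the
# character column forces every value to character
picked <- df[idx]
is.character(picked)
```

With an all-numeric data frame the same indexing keeps a numeric result.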
Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:
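The original code for this step is not shown, so here is a minimal sketch of the manual 1-D approach on small made-up data. It relies on R storing matrices column by column; the variable lin is a name introduced for the illustration:

```r
# Small stand-ins for the real objects (illustrative values only)
mat <- matrix(1:12, nrow = 3, ncol = 4)
dat <- data.frame(I = c(1L, 3L, 2L), J = c(1L, 2L, 4L))

# Matrices are stored column-major, so element (i, j) sits at
# position i + (j - 1) * nrow(mat) of the underlying vector
lin <- dat$I + (dat$J - 1L) * nrow(mat)
dat$matval <- mat[lin]
```

The result should match what the two-column-matrix form returns for the same indices.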
The dat$I + (dat$J-1L)*nrow(m) part turns the 2-D indices into 1-D ones (m being the matrix). The 1L is the way to specify an integer instead of a double value, which avoids some coercions.

I also tried gsk3's apply-based solution. It's almost 500x slower, though:
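The timing code and gsk3's one-liner are not reproduced here, so the following is only a sketch of how such a comparison might be set up. The apply call is an assumed reconstruction, the data are random stand-ins, and n is deliberately smaller than the 120425 rows in the question. No timings are claimed; they depend on the machine and R version.

```r
set.seed(1)
# Random stand-in with the same shape as mat in the question
mat <- matrix(sample.int(20L, 500L * 335L, replace = TRUE), nrow = 500L)
n <- 10000L  # assumption: reduced row count to keep the run short
dat <- data.frame(I = sample.int(500L, n, replace = TRUE),
                  J = sample.int(335L, n, replace = TRUE))

# Assumed apply-based form: visit each (I, J) row and look up one element
by_apply <- unname(apply(dat[, c("I", "J")], 1,
                         function(ij) mat[ij[1], ij[2]]))
# Two-column matrix indexing
by_matrix <- mat[cbind(dat$I, dat$J)]
# Manual 1-D (column-major) indexing
by_linear <- mat[dat$I + (dat$J - 1L) * nrow(mat)]

# All three approaches should return the same values
stopifnot(all(by_apply == by_matrix), identical(by_matrix, by_linear))

# Rough timings for each approach
system.time(apply(dat[, c("I", "J")], 1, function(ij) mat[ij[1], ij[2]]))
system.time(mat[cbind(dat$I, dat$J)])
system.time(mat[dat$I + (dat$J - 1L) * nrow(mat)])
```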