Consider the following example in R:
x1 <- rnorm(100000)
x2 <- rnorm(100000)
g <- cbind(x1, x2, x1^2, x2^2)
gg <- t(g) %*% g
gginv <- solve(gg)
bigmatrix <- outer(x1, x2, "<=")  ## a 100,000 x 100,000 matrix
Gw <- t(g) %*% bigmatrix
beta <- gginv %*% Gw
w1 <- bigmatrix - g %*% beta
If I try to run this on my computer, it throws a memory error (because bigmatrix is too big).
Do you know how I can achieve the same result without running into this problem?
This is a least squares problem with 100,000 responses. Your bigmatrix is the response (matrix), beta is the coefficient (matrix), while w1 is the residual (matrix). bigmatrix and w1 are each 100,000 x 100,000; formed explicitly and stored as doubles, each will cost 100,000 * 100,000 * 8 bytes, about 74.5 GB. This is far too large.
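As a quick sanity check of that figure (assuming 8-byte doubles, which is what the matrix products will coerce the logical entries to):

1e5 * 1e5 * 8 / 1024^3
## [1] 74.50581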
As estimation for each response is independent, there is really no need to form bigmatrix in one go and store it all in RAM. We can form it tile by tile and use an iterative procedure: form a tile, use it, then discard it. For example, the sketch below considers tiles of dimension 100,000 x 2,000, each costing 100,000 * 2,000 * 8 bytes, about 1.5 GB. With such an iterative procedure, memory usage is effectively kept under control.
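Here is a minimal sketch of that tiled loop. The tile width, the while-loop bookkeeping, and the use of crossprod are my own illustrative choices, not a reference implementation; it reuses x1, x2, g, and gginv from above and fills beta 2,000 columns at a time:

n <- length(x2)
tile.size <- 2000                         ## one 100,000 x 2,000 tile of doubles ~ 1.5 GB
beta <- matrix(0, ncol(g), n)             ## 4 x 100,000 coefficient matrix
start <- 1
while (start <= n) {
  end <- min(start + tile.size - 1, n)
  tile <- outer(x1, x2[start:end], "<=")             ## form one tile of bigmatrix
  beta[, start:end] <- gginv %*% crossprod(g, tile)  ## crossprod(g, tile) == t(g) %*% tile
  start <- end + 1                                   ## the old tile is discarded next pass
}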
Note: don't try to compute the full residual matrix w1, as it will also cost 74.5 GB. If you need the residuals in later work, break them into tiles as well and process them one by one, as in the sketch below. And don't worry about the loop here: the computation inside each iteration is costly enough to amortize the looping overhead.
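A hedged sketch of that tiled residual processing: each pass re-forms a tile, computes the corresponding columns of w1, consumes them, and discards the tile. What you do with each residual tile depends on your later work; reducing it to a per-response residual sum of squares here is purely illustrative. It reuses n, tile.size, g, and beta from the previous sketch:

rss <- numeric(n)                         ## e.g. residual sum of squares per response
start <- 1
while (start <= n) {
  end <- min(start + tile.size - 1, n)
  tile <- outer(x1, x2[start:end], "<=")
  resid.tile <- tile - g %*% beta[, start:end]  ## columns start:end of w1
  rss[start:end] <- colSums(resid.tile ^ 2)     ## consume the tile, then let it go
  start <- end + 1
}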