vectorizing an R-loop with backward dependency

2019-07-12 16:02发布

I have a random vector vec and want make a new vector L without using a loop. New element of L depends on old elements of L and vec.

set.seed(0)

vec <- rnorm(20,0)
i = 2; 
N <- length(vec) -1
L <- numeric(N-1)
constant <- 0.6

while (i < N){  

  L[i] = vec[i + 1] - vec[i] - constant * L[i - 1]

  i <- i + 1          
}

L
#  [1]  0.0000000  1.6560326 -1.0509895 -0.2271942 -1.8182750  1.7023480 -0.3875622  0.5214906  2.0975262 -2.8995756  0.1771427
# [12] -0.4549334  1.1311555 -0.6884468  0.3007724  0.4832709 -1.4341071  2.1880687

4条回答
Viruses.
2楼-- · 2019-07-12 16:26

You want

L[1] = 0
L[i] = -constant * L[i - 1] + (vec[i + 1] - vec[i]),  i = 2, 3, ..., 

Let dv <- diff(vec), the 2nd line becomes

L[i] = -constant * L[i - 1] + dv[i],  i = 2, 3, ...

an AR1 process with lag-1 auto-correlation -constant and innovation dv[-1]. AR1 process can be efficiently generated by filter with "recursive" method.

dv <- diff(vec)
L <- c(0, filter(dv[-1], -constant, "recursive"))

# [1]  0.0000000  1.6560326 -1.0509895 -0.2271942 -1.8182750  1.7023480
# [7] -0.3875622  0.5214906  2.0975262 -2.8995756  0.1771427 -0.4549334
#[13]  1.1311555 -0.6884468  0.3007724  0.4832709 -1.4341071  2.1880687
#[19] -2.9860629

I guess you mean while (i <= N) in your question. If you do want i < N, then you have to get rid of the last element above. Which can be done by

dv <- diff(vec)
L <- c(0, filter(dv[2:(length(dv) - 1)], -constant, "recursive"))

hours later...

I was brought to attention by Rui Barradas's benchmark. For short vec, any method is fast enough. For long vec, filter is definitely faster, but practically suffers from coercion as filter expects and returns a "ts" (time series) object. It is better to call its workhorse C routine straightaway:

AR1_FILTER <- function (x, filter, full = TRUE) {
  n <- length(x)
  AR1 <- .Call(stats:::C_rfilter, as.double(x), as.double(filter), double(n + 1L))
  if (!full) AR1 <- AR1[-1L]
  AR1
  }

dv <- diff(vec)
L <- AR1_FILTER(dv[-1], -constant)
#L <- AR1_FILTER(dv[2:(length(dv) - 1)], -constant)

I am not interested in comparing AR1_FILTER with R-level loop. I will just compare it with filter.

library(microbenchmark)
v <- runif(100000)
microbenchmark("R" = c(0, filter(v, -0.6, "recursive")),
               "C" = AR1_FILTER(v, -0.6))

Unit: milliseconds
 expr      min       lq     mean   median       uq       max neval
    R 6.803945 7.987209 11.08361 8.074241 9.131967 54.672610   100
    C 2.586143 2.606998  2.76218 2.644068 2.660831  3.845041   100
查看更多
别忘想泡老子
3楼-- · 2019-07-12 16:35
library(dplyr)


L2 <- c(0,lead(vec) - vec - constant * lag(L))
L2 <- L2[!is.na(L2)]
L2

 [1]  0.00000000  1.09605531 -0.62765133  1.81529867 -2.10535596  3.10864280 -4.36975556  1.41375965
 [9] -1.08809820  2.16767510 -1.82140234  1.14748512 -0.89245650  0.03962074 -0.10930073  1.48162072
[17] -1.63074832  2.21593009


all.equal(L,L2)
[1] TRUE
查看更多
Summer. ? 凉城
4楼-- · 2019-07-12 16:36

When you have to compute values based on previous values the general purpose answer is no, there is no way around a loop.

In your case I would use a for loop. It's simpler.

M <- numeric(N - 1)
for(i in seq_len(N)[-N])
  M[i] = vec[i + 1] - vec[i] - constant*M[i - 1]

identical(L, M)
#[1] TRUE

Note the use of seq_len, not 2:(N - 1).

Edit.

I have timed the solutions by myself and by user 李哲源. The results are clearly favorable to my solution.

f1 <- function(vec, constant = 0.6){
  N <- length(vec) - 1
  M <- numeric(N - 1)
  for(i in seq_len(N)[-c(1, N)]){
    M[i] = vec[i + 1] - vec[i] - constant*M[i - 1]
  }
  M
}

f2 <- function(vec, constant = 0.6){
  dv <- diff(vec)
  c(0, c(stats::filter(dv[2:(length(dv) - 1)], -constant, "recursive")) )
}

L1 <- f1(vec)
L2 <- f2(vec)

identical(L, L1)
identical(L, L2)

microbenchmark::microbenchmark(
  loop = f1(vec),
  filter = f2(vec)
)

On my PC the ratio of the medians gives my code 11 times faster.

查看更多
不美不萌又怎样
5楼-- · 2019-07-12 16:37

I was thinking about using Rcpp for this, but one of the answer mentioned rfilter built internally in R, so I had a check:

/* recursive filtering */
SEXP rfilter(SEXP x, SEXP filter, SEXP out)
{
   if (TYPEOF(x) != REALSXP || TYPEOF(filter) != REALSXP
       || TYPEOF(out) != REALSXP) error("invalid input");
    R_xlen_t nx = XLENGTH(x), nf = XLENGTH(filter);
    double sum, tmp, *r = REAL(out), *rx = REAL(x), *rf = REAL(filter);

    for(R_xlen_t i = 0; i < nx; i++) {
    sum = rx[i];
    for (R_xlen_t j = 0; j < nf; j++) {
        tmp = r[nf + i - j - 1];
        if(my_isok(tmp)) sum += tmp * rf[j];
        else { r[nf + i] = NA_REAL; goto bad3; }
    }
    r[nf + i] = sum;
    bad3:
    continue;
    }
    return out;
}

This function is already pretty look and I don't think I could write an Rcpp one to beat it with great improvement. I did a benchmark with this rfilter and the f1 function in the accepted answer:

f1 <- function(vec, constant = 0.6){
  N <- length(vec) - 1
  M <- numeric(N - 1)
  for(i in seq_len(N)[-c(1, N)]){
    M[i] = vec[i + 1] - vec[i] - constant*M[i - 1]
  }
  M
}

AR1_FILTER <- function (x, filter, full = TRUE) {
  n <- length(x)
  AR1 <- .Call(stats:::C_rfilter, as.double(x), as.double(filter), double(n + 1L))
  if (!full) AR1 <- AR1[-1L]
  AR1
  }

f2 <- function (vec, constant) {
  dv <- diff(vec)
  AR1_FILTER(dv[2:(length(dv) - 1)], -constant)
  }

library(microbenchmark)

Bench <- function (n) {
  vec <- runif(n)
  microbenchmark("R" = f1(vec, 0.6), "C" = f2(vec, 0.6))
  }

For short vectors with length 100, I got

Bench(100)

Unit: microseconds
 expr    min      lq     mean median      uq     max neval
    R 68.098 69.8585 79.05593 72.456 74.6210 244.148   100
    C 66.423 68.5925 73.18702 69.793 71.1745 150.029   100

For large vectors with length 10000, I got

Bench(10000)

Unit: microseconds
 expr      min        lq     mean    median       uq      max neval
    R 6168.742 6699.9170 6870.277 6786.0415 6997.992 8921.279   100
    C  876.934  904.6175 1192.000  931.9345 1034.273 2962.006   100

Yeah, there is no way that R is going to beat a compiled language.

查看更多
登录 后发表回答