可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm implementing an algorithm that involves lots of adding and removing things from sets. In R, this is slow because as far as I know, adding or removing things from a vector is slow, since the entire vector has to be re-allocated. Is there a way do do it more efficiently?

Edit: My current solution is to use a boolean vector of the same length as the list of things that can be in the set, and using that as a membership table.

回答1:

Chapter 2 of The R Inferno has some interesting comments on this, including perdiodic growing objects to reduce memory fragmentation and allocation overhead.

If you know what the ultimate size of the set is, then the method you suggest is probably the best - ie subset from the whole universe using an approprate membership vector. Difficult to know whats best without seeing exactly what you are trying to do though.

回答2:

If you can, initializing a vector so that it has length equal to its maximum length during the algorithm may help.

e.g.

vec <- rep(NA,10)
vec[1:3] <- 1:3
vec[4:5] <- 4:5
vec[6:10] <- 6:10

rather than

vec <- 1:3
vec <- c(vec,4:5)
vec <- c(vec,6:10)

compare

> system.time({vec <- rep(NA,10^4); for (i in 1:(10^4)) vec[i] <- i })
   user  system elapsed 
  0.043   0.001   0.044

> system.time({vec <- NULL; for (i in 1:(10^4)) vec <- c(vec,i) })
   user  system elapsed 
  0.249   0.089   0.335

回答3:

It's hard to say what you want. Maybe you really want stack commands like push and pop. The following isn't that. But it's a fast solution.

Allocate a vector large enough to hold all of your items of the type you need. Set every value to NA. Adding items is simple. Removing items is setting them to NA again. Using the vector is just na.omit(myVec)

myVec <- numeric (maxLength)  # a vector of maximum length

is.na(myVec) <- 1:maxLength   # set every item in myVec to NA

myVec[c(2,6,20)] <- 5         # add some values

na.omit(myVec)

#This will also work if you can initialize all of your values to something that you know you won't need.

回答4:

Yes, there are more efficient ways.

It comes down to how you're using the data; your use case. Are you taking out the data in in the same order you put it in, or in reversed order, or in random order, or in a sorted order?

For FIFO, for a fixed size array use a circular buffer, or for a fully dynamic size, use a deque (pronounced deck). (This is probably what you want.)

For FILO, use a stack.

For grabbing the data at random, consider using a 1 column matrix you never resize. Resizing is slow.

If you need an ordered set (eg c(3,2,5) -> c(2,3,5)), look into a tree or a heap.