[removed] Efficiently move items in and out of a f

Posted 2019-04-07 11:56

If I have an array that I want to be of fixed size N for the purpose of caching the most recent of N items, then once limit N is reached, I'll have to get rid of the oldest item while adding the newest item.

Note: I don't care if the newest item is at the beginning or end of the array, just as long as the items get removed in the order that they are added.

The obvious ways are either:

  • push() and shift() (so that cache[0] contains the oldest item), or
  • unshift() and pop() (so that cache[0] contains the newest item)

Basic idea:

var cache = [], limit = 10000;

function cacheItem( item ) {

    // In case we want to do anything with the oldest item
    // before it's gone forever.
    var oldest = [];

    cache.push( item );

    // Use WHILE instead of just IF in case the cache
    // was altered by more than one item at some point.
    // Use > (not >=) so the cache keeps exactly `limit` items.
    while ( cache.length > limit ) {
        oldest.push( cache.shift() );
    }

    return oldest;
}

However, I've read about memory issues with shift and unshift since they alter the beginning of the array and move everything else around, but unfortunately, one of those methods has to be used to do it this way!

Qs:

  1. Are there other ways to do this that would be better performance-wise?
  2. If the two ways I already mentioned are the best, are there specific advantages/disadvantages I need to be aware of?


Conclusion

After doing some more research into data structures (I've never programmed in other languages, so if it's not native to JavaScript, I likely haven't heard of it!) and doing a bunch of benchmarking in multiple browsers with both small and large arrays as well as small and large numbers of reads / writes, here's what I found:

  • The 'circular buffer' method proposed by Bergi is hands-down THE best as far as performance goes (for reasons explained in the answer and comments), and hence it has been accepted as the answer. However, it's not as intuitive, and it makes it difficult to write your own 'extra' functions (since you always have to take the offset into account). If you're going to use this method, I recommend an already-created one like this circular buffer on GitHub.
  • The 'pop/unshift' method is much more intuitive, and performs fairly well, except at the most extreme numbers.
  • The 'copyWithin' method is, sadly, terrible for performance (tested in multiple browsers), quickly creating unacceptable latency. It also has no IE support. It's such a simple method! I wish it worked better.
  • The 'linked list' method, proposed in the comments by Felix Kling, is actually a really good option. I initially disregarded it because it seemed like a lot of extra stuff I didn't need, but to my surprise....

What I actually needed was a Least Recently Used (LRU) Map (which employs a doubly-linked list). Now, since I didn't specify my additional requirements in my original question, I'm still marking Bergi's answer as the best answer to that specific question. However, since I needed to know if a value already existed in my cache, and if so, mark it as the newest item in the cache, the additional logic I had to add to my circular buffer's add() method (primarily indexOf()) made it not much more efficient than the 'pop/unshift' method. HOWEVER, the performance of the LRUMap in these situations blew both of the other two out of the water!
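
A minimal sketch of the LRU idea described above, assuming a runtime with the ES2015 Map. This is not the LRUMap library referred to above: the class name LruCache and its methods are hypothetical, and it leans on Map's insertion-order iteration instead of a hand-written doubly-linked list. Deleting and re-inserting a key marks it as the newest entry, and the first key in iteration order is the oldest.

```javascript
// Hypothetical LRU cache built on Map's insertion-order guarantee.
class LruCache {
    constructor(limit) {
        this.limit = limit;
        this.map = new Map();
    }

    // Returns true if the key is cached (does not change its recency).
    has(key) {
        return this.map.has(key);
    }

    // Adds or refreshes a key; returns the evicted key, or undefined.
    add(key, value) {
        // Deleting first moves an existing key to the "newest" end.
        this.map.delete(key);
        this.map.set(key, value);

        if (this.map.size > this.limit) {
            // The first key in iteration order is the least recently used.
            const oldestKey = this.map.keys().next().value;
            this.map.delete(oldestKey);
            return oldestKey;
        }
        return undefined;
    }

    // Keys from oldest to newest.
    keys() {
        return Array.from(this.map.keys());
    }
}

const lru = new LruCache(3);
lru.add("a", 1);
lru.add("b", 2);
lru.add("c", 3);
lru.add("a", 1);                  // "a" becomes the newest entry
const evicted = lru.add("d", 4);  // evicts "b", the oldest
```

Both the existence check and the "mark as newest" step are constant-time here, which is where the indexOf()-based versions lose out.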

So to summarize:

  1. Linked List -- most options while still maintaining great performance
  2. Circular Buffer -- best performance for just adding and getting
  3. Pop / Unshift -- most intuitive and simplest
  4. copyWithin -- terrible performance currently, no reason to use

3 answers
乱世女痞
#2 · 2019-04-07 12:26

You need to splice the existing item and put it in the front using unshift (as the newest item). If the item doesn't already exist in your cache, then you can unshift and pop.

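function cacheItem( item )
{
    var index = cache.indexOf( item );
    if ( index !== -1 ) {
        cache.splice( index, 1 );   // already cached: remove it from its old position
    } else if ( cache.length >= limit ) {
        cache.pop();                // cache is full: drop the oldest (last) item
    }
    cache.unshift( item );          // the newest item always goes to the front
}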

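indexOf compares with strict equality, so this works as-is when item is a String or Number. If item is an object, it is only found when the very same object reference is already in the cache; to match objects by their content you'll need to write your own lookup using findIndex.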
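
A sketch of what such a findIndex-based lookup might look like. It assumes every cached object carries a unique `id` property, which is not stated in the question; the names objectCache, objectLimit and cacheObject are likewise hypothetical.

```javascript
var objectCache = [], objectLimit = 3; // hypothetical names and limit

// Assumes every item carries a unique `id` property (an assumption).
function cacheObject( item ) {
    var index = objectCache.findIndex( function ( cached ) {
        return cached.id === item.id;
    } );

    if ( index !== -1 ) {
        objectCache.splice( index, 1 );   // already cached: remove old position
    } else if ( objectCache.length >= objectLimit ) {
        objectCache.pop();                // full: drop the oldest (last) item
    }
    objectCache.unshift( item );          // newest item goes to the front
}

cacheObject( { id: 1 } );
cacheObject( { id: 2 } );
cacheObject( { id: 3 } );
cacheObject( { id: 1 } );   // moves id 1 back to the front
cacheObject( { id: 4 } );   // cache is full, so id 2 (the oldest) is dropped
```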

一纸荒年 Trace。
#3 · 2019-04-07 12:33

You could use Array#copyWithin.

The copyWithin() method shallow copies part of an array to another location in the same array and returns it, without modifying its size.

Description

The copyWithin works like C and C++'s memmove, and is a high-performance method to shift the data of an Array. This especially applies to the TypedArray method of the same name. The sequence is copied and pasted as one operation; pasted sequence will have the copied values even when the copy and paste region overlap.

The copyWithin function is intentionally generic, it does not require that its this value be an Array object.

The copyWithin method is a mutable method. It does not alter the length of this, but will change its content and create new properties if necessary.

var array = [0, 1, 2, 3, 4, 5];

array.copyWithin(0, 1);

console.log(array); // [1, 2, 3, 4, 5, 5]
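
Applied to the fixed-size cache from the question, the idea could be sketched as follows: shift every element one slot to the left, then write the new item into the last slot. The names cacheCW, limitCW and cacheItemCW are hypothetical, and the array is assumed to be preallocated at its full size. Note that each insert still touches every element.

```javascript
var limitCW = 4;                       // hypothetical small limit
var cacheCW = new Array( limitCW ).fill( null );

function cacheItemCW( item ) {
    var oldest = cacheCW[0];
    cacheCW.copyWithin( 0, 1 );        // shift every element one slot left
    cacheCW[limitCW - 1] = item;       // newest item goes in the last slot
    return oldest;                     // null until the cache has filled up
}

for ( var i = 1; i <= 5; i++ ) {
    cacheItemCW( i );
}
```

After the loop the cache holds the four most recent items, oldest first.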

趁早两清
#4 · 2019-04-07 12:37

If I have an array that caches the most recent of N items, once limit N is reached, I'll have to get rid of the oldest while adding the newest.

You are not looking to copy stuff around within the array, which would take O(n) steps every time.

Instead, this is the perfect use case for a ring buffer. Just keep an offset to the "start" and "end" of the list, then access your buffer with that offset and modulo its length.

var cache = new Array(10000);
cache.offset = 0;

function cacheItem(item) {
    cache[cache.offset++] = item;
    cache.offset %= cache.length;
}
function cacheGet(i) { // backwards, 0 is most recent
    return cache[(cache.offset - 1 - i + cache.length) % cache.length];
}
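
The question also wanted a chance to inspect the oldest item before it is gone. A hedged variant of the ring buffer above that hands back the overwritten item might look like this; the class name RingCache and the `count` field are hypothetical additions, not part of the code above.

```javascript
class RingCache {
    constructor(limit) {
        this.items = new Array(limit);
        this.offset = 0;   // index of the next slot to write
        this.count = 0;    // number of slots filled so far
    }

    // Stores item; returns the overwritten (oldest) item, or undefined.
    add(item) {
        const evicted = this.count === this.items.length
            ? this.items[this.offset]
            : undefined;
        this.items[this.offset] = item;
        this.offset = (this.offset + 1) % this.items.length;
        if (this.count < this.items.length) this.count++;
        return evicted;
    }

    // get(0) is the most recent item, get(1) the one before it, and so on.
    get(i) {
        if (i < 0 || i >= this.count) return undefined;
        const n = this.items.length;
        return this.items[(this.offset - 1 - i + n) % n];
    }
}

const ring = new RingCache(3);
ring.add("a");
ring.add("b");
ring.add("c");
const overwritten = ring.add("d");   // buffer is full, so "a" is overwritten
```

Nothing is moved on insert: the slot at `offset` always holds the oldest item once the buffer is full, so adding is a single overwrite.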