If I have an array that I want to be of fixed size N for the purpose of caching the most recent of N items, then once limit N is reached, I'll have to get rid of the oldest item while adding the newest item.
Note: I don't care if the newest item is at the beginning or end of the array, just as long as the items get removed in the order that they are added.
The obvious ways are either:
push()
andshift()
(so thatcache[0]
contains the oldest item), orunshift()
andpop()
(so thatcache[0]
contains the newest item)
Basic idea:
var cache = [], limit = 10000;
function cacheItem( item ) {
// In case we want to do anything with the oldest item
// before it's gone forever.
var oldest = [];
cache.push( item );
// Use WHILE and >= instead of just IF in case the cache
// was altered by more than one item at some point.
while ( cache.length >= limit ) {
oldest.push( cache.shift() );
}
return oldest;
}
However, I've read about memory issues with shift
and unshift
since they alter the beginning of the array and move everything else around, but unfortunately, one of those methods has to be used to do it this way!
Qs:
- Are there other ways to do this that would be better performance-wise?
- If the two ways I already mentioned are the best, are there specific advantages/disadvantages I need to be aware of?
Conclusion
After doing some more research into data structures (I've never programmed in other languages, so if it's not native to Javascript, I likely haven't heard of it!) and doing a bunch of benchmarking in multiple browsers with both small and large arrays as well as small and large numbers of reads / writes, here's what I found:
- The 'circular buffer' method proposed by Bergi is hands-down THE best as far performance (for reasons explained in the answer and comments), and hence it has been accepted as the answer. However, it's not as intuitive, and makes it difficult to write your own 'extra' functions (since you always have to take
offset
into account). If you're going to use this method, I recommend an already-created one like this circular buffer on GitHub. - The 'pop/unpush' method is much more intuitive, and performs fairly well, accept at the most extreme numbers.
- The 'copyWithin' method is, sadly, terrible for performance (tested in multiple browsers), quickly creating unacceptable latency. It also has no IE support. It's such a simple method! I wish it worked better.
- The 'linked list' method, proposed in the comments by Felix Kling, is actually a really good option. I initially disregarded it because it seemed like a lot of extra stuff I didn't need, but to my surprise....
What I actually needed was a Least Recently Used (LRU) Map (which employs a doubly-linked list). Now, since I didn't specify my additional requirements in my original question, I'm still marking Bergi's answer as the best answer to that specific question. However, since I needed to know if a value already existed in my cache, and if so, mark it as the newest item in the cache, the additional logic I had to add to my circular buffer's add()
method (primarily indexOf()
) made it not much more efficient than the 'pop/unpush' method. HOWEVER, the performance of the LRUMap in these situations blew both of the other two out of the water!
So to summarize:
- Linked List -- most options while still maintaining great performance
- Circular Buffer -- best performance for just adding and getting
- Pop / Unpush -- most intuitive and simplest
copyWithin
-- terrible performance currently, no reason to use
You need to
splice
the existing item and put it in the front usingunshift
(as the newest item). If the item doesn't already exist in your cache, then you canunshift
andpop
.item needs to be a
String
orNumber
, or otherwise you'll need to write your own implementation ofindexOf
usingfindIndex
to locate and object (if item is an object).You could use
Array#copyWithin
.You are not looking to copy stuff around within the array, which would take
O(n)
steps every time.Instead, this is the perfect use case for a ring buffer. Just keep an offset to the "start" and "end" of the list, then access your buffer with that offset and modulo its length.