We have a C++ application whose performance we are trying to improve. We identified that data retrieval takes a lot of time, and we want to cache the data. We can't store all the data in memory, as it is huge, so we want to store up to 1000 items in memory. These items can be indexed by a long
key. However, when the cache size goes over 1000, we want to remove the item that was not accessed for the longest time, since we assume some sort of "locality of reference", that is, we assume that items in the cache that were recently accessed will probably be accessed again.
Can you suggest a way to implement it?
My initial implementation was to have a map<long, CacheEntry> to store the cache, and to add an accessStamp member to CacheEntry which is set to an increasing counter whenever an entry is created or accessed. When the cache is full and a new entry is needed, the code scans the entire cache map, finds the entry with the lowest accessStamp, and removes it.
The problem with this is that once the cache is full, every insertion requires a full scan of the cache.
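For reference, the initial design described above might look roughly like the following sketch (the value type, the CacheEntry layout, and the get interface are illustrative assumptions, not the original code):

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative sketch of the initial accessStamp design described above.
struct CacheEntry {
    std::string value;
    unsigned long accessStamp;  // bumped on every create/access
};

class ScanCache {
public:
    explicit ScanCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached value for key; on a miss, caches `loaded`.
    const std::string& get(long key, const std::string& loaded) {
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            it->second.accessStamp = ++counter_;  // mark as recently used
            return it->second.value;
        }
        if (cache_.size() >= capacity_) evictOldest();  // the O(n) scan
        it = cache_.emplace(key, CacheEntry{loaded, ++counter_}).first;
        return it->second.value;
    }

    bool contains(long key) const { return cache_.count(key) != 0; }

private:
    void evictOldest() {
        // Full scan for the lowest accessStamp -- the bottleneck in question.
        auto oldest = cache_.begin();
        for (auto it = cache_.begin(); it != cache_.end(); ++it)
            if (it->second.accessStamp < oldest->second.accessStamp)
                oldest = it;
        cache_.erase(oldest);
    }

    std::size_t capacity_;
    unsigned long counter_ = 0;
    std::map<long, CacheEntry> cache_;
};
```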
Another idea was to hold a list of CacheEntries in addition to the cache map, and on each access move the accessed entry to the top of the list; but the problem was how to quickly find that entry in the list.
Can you suggest a better approach?
Thanks
splintor
Another option might be to use boost::multi_index. It is designed to separate the index from the data, thereby allowing multiple indexes on the same data.
I am not sure this would really be faster than scanning through 1000 items. It might use more memory than it is worth, or slow down search and/or insert/remove.
Scanning a map of 1000 elements will take very little time, and the scan will only be performed when the item is not in the cache which, if your locality of reference ideas are correct, should be a small proportion of the time. Of course, if your ideas are wrong, the cache is probably a waste of time anyway.
Create a std::priority_queue<map<long, CacheEntry>::iterator> with a comparer for the access stamp. For an insert, first pop the last item off the queue and erase it from the map. Then insert the new item into the map, and finally push its iterator onto the queue.
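A sketch of that scheme might look like the following (CacheEntry and the insert interface are illustrative). One caveat worth noting: a std::priority_queue cannot re-order entries after they are pushed, so this only yields true LRU behavior if accessStamp is not modified after the iterator goes onto the queue; updating stamps on later accesses would leave the heap stale. The sketch also assumes each key is inserted only once.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct CacheEntry {
    std::string value;
    unsigned long accessStamp;
};

using Map = std::map<long, CacheEntry>;
using MapIter = Map::iterator;

// Comparer on the access stamp: the iterator with the LOWEST stamp ends
// up on top of the queue, so top()/pop() yield the eviction candidate.
struct ByStamp {
    bool operator()(const MapIter& a, const MapIter& b) const {
        return a->second.accessStamp > b->second.accessStamp;
    }
};

class QueueCache {
public:
    explicit QueueCache(std::size_t capacity) : capacity_(capacity) {}

    // Assumes distinct keys for brevity; std::map iterators stay valid
    // across unrelated inserts/erases, so queued iterators remain usable.
    void insert(long key, const std::string& value) {
        if (map_.size() >= capacity_) {
            // Pop the lowest-stamped item off the queue, erase it from the map.
            MapIter victim = queue_.top();
            queue_.pop();
            map_.erase(victim);
        }
        // Then insert the new item and push its iterator onto the queue.
        auto ins = map_.emplace(key, CacheEntry{value, ++counter_});
        queue_.push(ins.first);
    }

    bool contains(long key) const { return map_.count(key) != 0; }

private:
    std::size_t capacity_;
    unsigned long counter_ = 0;
    Map map_;
    std::priority_queue<MapIter, std::vector<MapIter>, ByStamp> queue_;
};
```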
Update: I got it now...
This should be reasonably fast. Warning, some pseudo-code ahead.
The inserts and erases may be a little expensive, but may also not be too bad given the locality (few cache misses). Anyway, if they become a big problem, one could change to std::list.
Have your map<long, CacheEntry>, but instead of having an access timestamp in CacheEntry, put in two links to other CacheEntry objects to make the entries form a doubly-linked list. Whenever an entry is accessed, move it to the head of the list (this is a constant-time operation). This way you will both find the cache entry easily, since it's accessed from the map, and be able to remove the least-recently-used entry, since it's at the end of the list (my preference is to make doubly-linked lists circular, so a pointer to the head suffices to get fast access to the tail as well). Also remember to put the key that you used in the map into the CacheEntry, so that you can delete the entry from the map when it gets evicted from the cache.

As a simpler alternative, you could create a map that grows indefinitely and clears itself out every 10 minutes or so (adjust the time for the expected traffic).
You could also log some very interesting stats this way.
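The map-plus-doubly-linked-list scheme above can be sketched as follows, using std::list in place of hand-rolled links (the list node stores the key alongside the value, playing the role of the key stored in CacheEntry; the value type and the get interface are illustrative):

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached value for key; on a miss, caches `loaded`,
    // evicting the least-recently-used entry if the cache is full.
    const std::string& get(long key, const std::string& loaded) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Constant-time move of the accessed node to the head of the list.
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (order_.size() >= capacity_) {
            // The tail of the list is the least-recently-used entry; the key
            // stored in the node lets us erase it from the map as well.
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, loaded);
        index_[key] = order_.begin();
        return order_.front().second;
    }

    bool contains(long key) const { return index_.count(key) != 0; }

private:
    using Node = std::pair<long, std::string>;  // key + value

    std::size_t capacity_;
    std::list<Node> order_;  // most recently used at the front
    std::unordered_map<long, std::list<Node>::iterator> index_;
};
```

Both the lookup and the recency update are O(1): the map finds the node, and std::list::splice relinks it to the front without copying or invalidating iterators.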