C++ design: How to cache most recently used

Posted 2020-06-04 01:16

We have a C++ application for which we are trying to improve performance. We identified that data retrieval takes a lot of time, and want to cache data. We can't store all the data in memory as it is huge. We want to store up to 1000 items in memory. These items can be indexed by a long key. However, when the cache size goes over 1000, we want to remove the item that has not been accessed for the longest time, as we assume some sort of "locality of reference", that is, we assume that items in the cache that were recently accessed will probably be accessed again.

Can you suggest a way to implement it?

My initial implementation was to have a map<long, CacheEntry> to store the cache, and add an accessStamp member to CacheEntry which will be set to an increasing counter whenever an entry is created or accessed. When the cache is full and a new entry is needed, the code will scan the entire cache map and find the entry with the lowest accessStamp, and remove it. The problem with this is that once the cache is full, every insertion requires a full scan of the cache.
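For concreteness, a rough sketch of what this initial approach looks like (CacheEntry's payload and the class/method names here are placeholders, not the original code):

    #include <cstddef>
    #include <cstdint>
    #include <map>

    struct CacheEntry {
        // ... cached payload ...
        std::uint64_t accessStamp = 0;   // bumped whenever the entry is created or accessed
    };

    class Cache {
        std::map<long, CacheEntry> entries_;
        std::uint64_t counter_ = 0;
        static constexpr std::size_t kMaxSize = 1000;

    public:
        CacheEntry* find(long key) {
            auto it = entries_.find(key);
            if (it == entries_.end()) return nullptr;
            it->second.accessStamp = ++counter_;   // mark as most recently used
            return &it->second;
        }

        void insert(long key, const CacheEntry& e) {
            if (entries_.size() >= kMaxSize) {
                // the problematic part: an O(n) scan to find the oldest entry
                auto victim = entries_.begin();
                for (auto it = entries_.begin(); it != entries_.end(); ++it)
                    if (it->second.accessStamp < victim->second.accessStamp)
                        victim = it;
                entries_.erase(victim);
            }
            CacheEntry& slot = entries_[key];
            slot = e;
            slot.accessStamp = ++counter_;
        }
    };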

Another idea was to hold a list of CacheEntries in addition to the cache map, and on each access move the accessed entry to the top of the list, but the problem was how to quickly find that entry in the list.

Can you suggest a better approach?

Thanks
splintor

Tags: c++ caching
10 answers
做个烂人 · 2020-06-04 01:51

I believe this is a good candidate for treaps. The priority would be the access time (virtual or otherwise), in ascending order (oldest at the root), with the long key as the tree key.

There's also the second-chance algorithm, which is good for caches. Although you lose search ability, that won't have a big impact if you only have 1000 items.

The naïve method would be to have a map associated with a priority queue, wrapped in a class. You use the map to search and the queue to remove (first remove from the queue, grabbing the item, and then remove by key from the map).
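A hypothetical sketch of this map + priority-queue idea. One detail the answer leaves open is how to re-prioritise an entry when it is accessed again; the trick assumed here is lazy invalidation: push a fresh (stamp, key) pair on every access and skip stale queue entries at eviction time. The int value type and all names are placeholders.

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <queue>
    #include <utility>
    #include <vector>

    class PQCache {
        struct Slot { int value; std::uint64_t stamp; };
        std::map<long, Slot> byKey_;
        // min-heap ordered by access stamp, oldest entry on top
        std::priority_queue<std::pair<std::uint64_t, long>,
                            std::vector<std::pair<std::uint64_t, long>>,
                            std::greater<std::pair<std::uint64_t, long>>> byAge_;
        std::uint64_t counter_ = 0;
        static constexpr std::size_t kMaxSize = 1000;

        void touch(long key, Slot& s) {
            s.stamp = ++counter_;
            byAge_.push({s.stamp, key});   // the older queue entry for this key becomes stale
        }

    public:
        int* find(long key) {
            auto it = byKey_.find(key);
            if (it == byKey_.end()) return nullptr;
            touch(key, it->second);
            return &it->second.value;
        }

        void insert(long key, int value) {
            auto it = byKey_.find(key);
            if (it != byKey_.end()) { it->second.value = value; touch(key, it->second); return; }
            while (byKey_.size() >= kMaxSize) {
                auto oldest = byAge_.top(); byAge_.pop();
                auto victim = byKey_.find(oldest.second);
                // evict only if this queue entry is still the item's current stamp
                if (victim != byKey_.end() && victim->second.stamp == oldest.first)
                    byKey_.erase(victim);
            }
            Slot& s = byKey_[key];
            s.value = value;
            touch(key, s);
        }
    };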

够拽才男人 · 2020-06-04 01:52

My approach needs a hash table to look up stored objects quickly and a linked list to maintain the order in which they were last used.

When an object is requested: 1) try to find the object in the hash table; 2) if it is found (the hash-table value holds a pointer to the object's node in the linked list), move that node to the top of the linked list; 3) if it is not found, remove the last object from the linked list and remove its data from the hash table as well, then put the new object into the hash table and at the top of the linked list.

For example, let's say we have a cache with room for only 3 objects.

The request sequence is 1 3 2 1 4.

1) Hash-table : [1] Linked-list : [1]

2) Hash-table : [1, 3] Linked-list : [3, 1]

3) Hash-table : [1,2,3] Linked-list : [2,3,1]

4) Hash-table : [1,2,3] Linked-list : [1,2,3]

5) Hash-table : [1,2,4] Linked-list : [4,1,2] => 3 out
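A minimal sketch of this hash-table + linked-list scheme (the int value type and all names are placeholders). std::list::splice relocates a node without invalidating iterators, so the hash table can safely store list iterators as its "pointers" into the list:

    #include <cstddef>
    #include <list>
    #include <unordered_map>
    #include <utility>

    class LruCache {
        std::list<std::pair<long, int>> order_;   // front = most recent, back = least recent
        std::unordered_map<long, std::list<std::pair<long, int>>::iterator> index_;
        std::size_t capacity_;

    public:
        explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

        int* find(long key) {
            auto it = index_.find(key);
            if (it == index_.end()) return nullptr;
            order_.splice(order_.begin(), order_, it->second);   // move node to the front, O(1)
            return &it->second->second;
        }

        void insert(long key, int value) {
            if (int* existing = find(key)) { *existing = value; return; }
            if (order_.size() >= capacity_) {
                index_.erase(order_.back().first);   // drop the least recently used entry
                order_.pop_back();
            }
            order_.emplace_front(key, value);
            index_[key] = order_.begin();
        }
    };

With a capacity of 3, the request sequence 1 3 2 1 4 above would end with key 3 evicted, exactly as in the walkthrough.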

forever°为你锁心 · 2020-06-04 01:54

I agree with Neil: scanning 1000 elements takes no time at all.

But if you want to do it anyway, you could use the additional list you propose. To avoid scanning the whole list each time, instead of storing just the CacheEntry in your map, store the CacheEntry together with a pointer to the list element that corresponds to this entry.
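A purely illustrative layout for this suggestion; CacheEntry stands in for the asker's type, and the other names are made up:

    #include <list>
    #include <map>
    #include <utility>

    struct CacheEntry { /* cached payload */ };

    struct Cache {
        std::list<long> recency;   // keys only, most recently used at the front
        std::map<long, std::pair<CacheEntry, std::list<long>::iterator>> entries;

        void touch(long key) {
            // move the accessed entry's list node to the front in O(1), no scan needed
            auto it = entries.find(key);
            if (it != entries.end())
                recency.splice(recency.begin(), recency, it->second.second);
        }
    };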

聊天终结者 · 2020-06-04 01:55

An alternative implementation that might make the 'aging' of the elements easier, but at the cost of lower search performance, would be to keep your CacheEntry elements in a std::list (or a std::list of std::pair<long, CacheEntry> so the key travels with the entry). The newest element gets added at the front of the list, so entries 'migrate' towards the end of the list as they age. When you check whether an element is already present in the cache, you scan the list (admittedly an O(n) operation, as opposed to O(log n) in a map). If you find it, you remove it from its current location and re-insert it at the front of the list. If the list grows beyond 1000 elements, you remove the required number of elements from the end to trim it back below 1000 elements.
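A rough sketch of this list-only variant (the int value type and all names are placeholders):

    #include <cstddef>
    #include <list>
    #include <utility>

    class ListCache {
        std::list<std::pair<long, int>> items_;   // newest at the front, oldest at the back
        static constexpr std::size_t kMaxSize = 1000;

    public:
        int* find(long key) {
            for (auto it = items_.begin(); it != items_.end(); ++it) {
                if (it->first == key) {
                    items_.splice(items_.begin(), items_, it);   // re-insert at the front
                    return &items_.front().second;
                }
            }
            return nullptr;   // O(n) scan, as noted above
        }

        void insert(long key, int value) {
            if (int* existing = find(key)) { *existing = value; return; }
            items_.emplace_front(key, value);
            while (items_.size() > kMaxSize)
                items_.pop_back();   // trim the oldest entries off the end
        }
    };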
