Least Frequently Used (LFU) is a type of cache algorithm used to manage memory within a computer. The standard characteristic of this method is that the system keeps track of the number of times a block is referenced in memory. When the cache is full and requires more room, the system purges the item with the lowest reference frequency.
What would be the best way to implement a least-frequently-used (LFU) cache of objects, say in Java?
I've already implemented one using LinkedHashMap (by maintaining a count of how many times each object is accessed), but I'm curious whether any of the newer concurrent collections would be better candidates.
Consider this case: suppose the cache is full and we need to make space for a new entry. Say two objects in the cache have each been accessed only once. Which one do we remove if we know that another object (not yet in the cache) has been accessed more than once?
Thanks!
I think the LFU data structure must combine a priority queue (for maintaining fast access to the LFU item) and a hash map (for providing fast access to any item by its key). I would suggest the following node definition for each object stored in the cache: you need key for referring to an item, numAccesses as the key for the priority queue, and currentPos to be able to quickly find the item's position in the priority queue by its key. Now you organize a hash map (key (Integer) -> node (Node<T>)) to quickly access items, and a min-heap-based priority queue using the number of accesses as the priority. Now you can perform all the operations quickly (access, add a new item, update the number of accesses, remove the LFU item). You need to write each operation carefully, so that it keeps all the nodes consistent (their number of accesses, their position in the priority queue, and their existence in the hash map). Note that heap updates cost O(log n) rather than constant time, which is still fast enough for a cache.

Many implementations I have seen have runtime complexity
O(log(n)). This means that when the cache size is n, the time needed to insert/remove an element into/from the cache is logarithmic. Such implementations usually use a min heap to maintain the usage frequencies of the elements. The root of the heap contains the element with the lowest frequency and can be accessed in O(1) time. But to maintain the heap property we have to move an element inside the heap every time it is used (and its frequency is incremented) to place it into its proper position, or when we have to insert a new element into the cache (and so put it into the heap).

But the runtime complexity can be reduced to O(1) when we maintain a hashmap (Java) or unordered_map (C++) with the element as the key. Additionally we need two sorts of lists, a frequency list and elements lists. The elements lists contain elements that have the same frequency, and the frequency list contains the elements lists.

Here in the example we see a frequency list that has 4 elements lists. The elements list 1 contains the elements (a, c, m), the elements list 3 contains the elements (k, l, n), etc. Now, when we use, say, element y, we have to increment its frequency and put it into the next list. Because the elements list with frequency 6 then becomes empty, we delete it. We place the element y at the beginning of the elements list 7. When we have to remove elements from the cache later, we start from the end of the list (first z, then x and then y). Now, when we use element n, we have to increment its frequency and put it into a new list with frequency 4.

I hope the idea is clear. I provide now my C++ implementation of the LFU cache, and will add a Java implementation later. The class has just 2 public methods, void set(key k, value v) and bool get(key k, value &v). In the get method, the value to retrieve is set via the reference when the element is found, and in this case the method returns true. When the element is not found, the method returns false. Here are examples of usage:
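The original C++ listing and its usage examples are not reproduced above. As an illustration of the frequency-list scheme just described (a sketch, not the answer's original code; all names are illustrative), here is a minimal Java version using a hash map for the values, a hash map for the per-key counts, and a LinkedHashSet per frequency:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Sketch of the O(1) LFU design: key -> value, key -> frequency, and a
// map from each frequency to the insertion-ordered set of keys at that
// frequency. minFreq tracks the lowest non-empty frequency list.
class LFUCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();
    private final Map<Integer, LinkedHashSet<K>> lists = new HashMap<>();
    private int minFreq = 0;

    LFUCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        if (!values.containsKey(key)) return null;
        touch(key);
        return values.get(key);
    }

    void set(K key, V value) {
        if (capacity <= 0) return;
        if (values.containsKey(key)) {
            values.put(key, value);
            touch(key);
            return;
        }
        if (values.size() >= capacity) {
            // evict from the lowest-frequency list; the LinkedHashSet
            // iterates oldest-first, matching "remove from the end"
            K evict = lists.get(minFreq).iterator().next();
            lists.get(minFreq).remove(evict);
            values.remove(evict);
            counts.remove(evict);
        }
        values.put(key, value);
        counts.put(key, 1);
        lists.computeIfAbsent(1, f -> new LinkedHashSet<>()).add(key);
        minFreq = 1;
    }

    // move a key from its current frequency list into the next one
    private void touch(K key) {
        int freq = counts.get(key);
        counts.put(key, freq + 1);
        lists.get(freq).remove(key);
        if (lists.get(freq).isEmpty()) {
            lists.remove(freq);
            if (minFreq == freq) minFreq = freq + 1;
        }
        lists.computeIfAbsent(freq + 1, f -> new LinkedHashSet<>()).add(key);
    }
}
```

Every operation touches only hash maps and set heads, so access, insert and eviction are all O(1), as claimed.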
How about a priority queue? You can keep the elements sorted there with keys representing the frequency, and just update an object's position in the queue after visiting it. You can update only from time to time to optimize performance (at the cost of some precision).
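As a sketch of this suggestion (names are illustrative): java.util.PriorityQueue has no decrease-key operation, so "updating the position" means removing and re-inserting the entry, which makes each access O(n) in the queue size; acceptable for small caches.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// LFU cache backed by a PriorityQueue ordered by hit count, plus a
// HashMap index for O(1) key lookup. Eviction polls the queue head.
class PQLfuCache<K, V> {
    private static class Entry<K, V> {
        final K key; V value; long hits;
        Entry(K key, V value) { this.key = key; this.value = value; this.hits = 1; }
    }
    private final int capacity;
    private final Map<K, Entry<K, V>> index = new HashMap<>();
    private final PriorityQueue<Entry<K, V>> queue =
        new PriorityQueue<>(Comparator.<Entry<K, V>>comparingLong(e -> e.hits));

    PQLfuCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        Entry<K, V> e = index.get(key);
        if (e == null) return null;
        queue.remove(e);   // re-position the entry after this visit
        e.hits++;
        queue.add(e);
        return e.value;
    }

    void put(K key, V value) {
        Entry<K, V> e = index.get(key);
        if (e != null) { e.value = value; return; }
        if (index.size() >= capacity) {
            Entry<K, V> lfu = queue.poll();   // lowest hit count
            index.remove(lfu.key);
        }
        e = new Entry<>(key, value);
        index.put(key, e);
        queue.add(e);
    }
}
```

This is essentially the heap-plus-hashmap design from the first answer, with the heap provided by the standard library instead of being maintained by hand.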
You might benefit from the LFU implementation of ActiveMQ: LFUCache
They have provided some good functionality.
In my opinion, the best way to implement such a cache would be to include a new variable 'latestTS' for each object. TS stands for timestamp.
```java
// A static method that returns the current date and time
// as milliseconds since January 1st, 1970
long latestTS = System.currentTimeMillis();
```
ConcurrentLinkedHashMap is not part of the standard Java concurrent collections (ref: the java.util.concurrent API). However, you can try using ConcurrentHashMap together with a doubly linked list.
About the case to be considered: in such a case, as I have said, you can declare a latestTS variable and, based upon its value, remove one entry and add the new object. (Don't forget to update the frequency and latestTS of the newly added object.)
As you have mentioned, you can use LinkedHashMap, as it gives element access in O(1) and also gives you order traversal. Please find below the code for an LFU cache. (PS: the code below answers the question in the title, i.e. "How to implement LFU cache?")
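The answer's original listing is not reproduced above. As a hedged sketch of the approach it describes (LinkedHashMap storage, a hit counter plus the latestTS tie-breaker from earlier in the answer; class and field names are illustrative, not the answer's code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LFU cache on top of LinkedHashMap: each slot tracks a hit count and a
// latestTS. Access is O(1); eviction scans for the victim in O(n),
// preferring the fewest hits and breaking ties by the oldest latestTS.
class SimpleLfuCache<K, V> {
    private static class Slot<V> {
        V value;
        long hits = 1;
        long latestTS = System.currentTimeMillis(); // last-access timestamp
    }
    private final int capacity;
    private final Map<K, Slot<V>> map = new LinkedHashMap<>();

    SimpleLfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        Slot<V> s = map.get(key);
        if (s == null) return null;
        s.hits++;
        s.latestTS = System.currentTimeMillis();
        return s.value;
    }

    public void put(K key, V value) {
        Slot<V> s = map.get(key);
        if (s == null) {
            if (map.size() >= capacity) evict();
            s = new Slot<>();
            map.put(key, s);
        } else {
            s.hits++;
            s.latestTS = System.currentTimeMillis();
        }
        s.value = value;
    }

    // remove the entry with the fewest hits; break ties by oldest latestTS
    private void evict() {
        K victim = null;
        long bestHits = Long.MAX_VALUE, bestTS = Long.MAX_VALUE;
        for (Map.Entry<K, Slot<V>> e : map.entrySet()) {
            Slot<V> s = e.getValue();
            if (s.hits < bestHits || (s.hits == bestHits && s.latestTS < bestTS)) {
                bestHits = s.hits;
                bestTS = s.latestTS;
                victim = e.getKey();
            }
        }
        map.remove(victim);
    }
}
```

The linear-scan eviction keeps the code short; for large caches you would replace it with one of the frequency-list or heap structures from the earlier answers.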