How to implement a Least Frequently Used (LFU) cac

2019-03-09 03:50发布

Least Frequently Used (LFU) is a type of cache algorithm used to manage memory within a computer. The standard characteristics of this method involve the system keeping track of the number of times a block is referenced in memory. When the cache is full and requires more room the system will purge the item with the lowest reference frequency.

What would be the best way to implement a most-recently-used cache of objects, say in Java?

I've already implemented one using LinkedHashMap(by maintaining the no. of times objects are accessed) But I'm curious if any of the new concurrent collections would be better candidates.

Consider this case : Suppose cache is full and we need to make space for another one. Say two objects are noted in cache which are accessed for one time only. Which one to remove if we come to know that other(which is not in cache)object is being accessed for more than once ?

Thanks!

5条回答
啃猪蹄的小仙女
2楼-- · 2019-03-09 04:16

I think, the LFU data structure must combine priority queue (for maintaining fast access to lfu item) and hash map (for providing fast access to any item by its key); I would suggest the following node definition for each object stored in cache:

class Node<T> {
   // access key
   private int key;
   // counter of accesses
   private int numAccesses;
   // current position in pq
   private int currentPos;
   // item itself
   private T item;
   //getters, setters, constructors go here
}

You need key for referring to an item. You need numAccesses as a key for priority queue. You need currentPos to be able to quickly find a pq position of item by key. Now you organize hash map (key(Integer) -> node(Node<T>)) to quickly access items and min heap-based priority queue using number of accesses as priority. Now you can very quickly perform all operations (access, add new item, update number of acceses, remove lfu). You need to write each operation carefully, so that it maintains all the nodes consistent (their number of accesses, their position in pq and there existence in hash map). All operations will work with constant average time complexity which is what you expect from cache.

查看更多
老娘就宠你
3楼-- · 2019-03-09 04:20

Many implementations I have seen have runtime complexity O(log(n)). This means, when the cache size is n, the time needed to insert/remove an element into/from chache is logarithmic. Such implementations use usually a min heap to maintain usage frequencies of elements. The root of the heap contains the element with lowest frequency, and can be accessed in O(1) time. But to maintain the heap property we have to move an element, every time it is used (and frequency is incremented) inside of the heap, to place it into proper position, or when we have to insert new element into the cache (and so put it into the heap). But the runtime complexity can be reduced to O(1), when we maintain a hashmap (Java) or unordered_map (C++) with the element as key. Additinally we need two sorts of lists, frequency list and elements lists. The elements lists contain elements that have same frequency, and the frequency list contain the element lists.

  frequency list
  1   3   6   7
  a   k   y   x
  c   l       z
  m   n

Here in the example we see the frequency list that has 4 elements (4 elements lists). The element list 1 contains elements (a,c,m), the elements list 3 contains elements (k, l, n) etc. Now, when we use say element y, we have to increment its frequency and put it in the next list. Because the elements list with frequency 6 becomes empty, we delete it. The result is:

  frequency list
  1   3   7
  a   k   y
  c   l   x
  m   n   z

We place the element y in the begin of the elements list 7. When we have to remove elements from the list later, we will start from the end (first z, then x and then y). Now, when we use element n, we have to increment its frequency and put it into the new list, with frequencies 4:

  frequency list
  1   3   4  7
  a   k   n  y
  c   l      x
  m          z

I hope the idea is clear. I provide now my C++ implementation of the LFU cache, and will add later a Java implementation. The class has just 2 public methods, void set(key k, value v) and bool get(key k, value &v). In the get method the value to retrieve will be set per reference when the element is found, in this case the method returns true. When the element is not found, the method returns false.

#include<unordered_map>
#include<list>

using namespace std;

typedef unsigned uint;

template<typename K, typename V = K>
struct Entry
{
    K key;
    V value;
};


template<typename K, typename V = K>
class LFUCache
{

typedef  typename list<typename Entry<K, V>> ElementList;
typedef typename list <pair <uint, ElementList>> FrequencyList;

private:
    unordered_map <K, pair<typename FrequencyList::iterator, typename ElementList::iterator>> cacheMap;
    FrequencyList elements;
    uint maxSize;
    uint curSize;

    void incrementFrequency(pair<typename FrequencyList::iterator, typename ElementList::iterator> p) {
        if (p.first == prev(elements.end())) {
            //frequency list contains single list with some frequency, create new list with incremented frequency (p.first->first + 1)
            elements.push_back({ p.first->first + 1, { {p.second->key, p.second->value} } });
            // erase and insert the key with new iterator pair
            cacheMap[p.second->key] = { prev(elements.end()), prev(elements.end())->second.begin() };
        }
        else {
            // there exist element(s) with higher frequency
            auto pos = next(p.first);
            if (p.first->first + 1 == pos->first)
                // same frequency in the next list, add the element in the begin
                pos->second.push_front({ p.second->key, p.second->value });
            else
                // insert new list before next list
                pos = elements.insert(pos, { p.first->first + 1 , {{p.second->key, p.second->value}} });
            // update cachMap iterators
            cacheMap[p.second->key] = { pos, pos->second.begin() };
        }
        // if element list with old frequency contained this singe element, erase the list from frequency list
        if (p.first->second.size() == 1)
            elements.erase(p.first);
        else
            // erase only the element with updated frequency from the old list
            p.first->second.erase(p.second);
    }

    void eraseOldElement() {
        if (elements.size() > 0) {
            auto key = prev(elements.begin()->second.end())->key;
            if (elements.begin()->second.size() < 2)
                elements.erase(elements.begin());
            else
                elements.begin()->second.erase(prev(elements.begin()->second.end()));
            cacheMap.erase(key);
            curSize--;
        }
    }

public:
    LFUCache(uint size) {
        if (size > 0)
            maxSize = size;
        else
            maxSize = 10;
        curSize = 0;
    }
    void set(K key, V value) {
        auto entry = cacheMap.find(key);
        if (entry == cacheMap.end()) {
            if (curSize == maxSize)
                eraseOldElement();
            if (elements.begin() == elements.end()) {
                elements.push_front({ 1, { {key, value} } });
            }
            else if (elements.begin()->first == 1) {
                elements.begin()->second.push_front({ key,value });
            }
            else {
                elements.push_front({ 1, { {key, value} } });
            }
            cacheMap.insert({ key, {elements.begin(), elements.begin()->second.begin()} });
            curSize++;
        }
        else {
            entry->second.second->value = value;
            incrementFrequency(entry->second);
        }
    }

    bool get(K key, V &value) {
        auto entry = cacheMap.find(key);
        if (entry == cacheMap.end())
            return false;
        value = entry->second.second->value;
        incrementFrequency(entry->second);
        return true;
    }
};

Here are examples of usage:

    int main()
    {
        LFUCache<int>cache(3); // cache of size 3
        cache.set(1, 1);
        cache.set(2, 2);
        cache.set(3, 3);
        cache.set(2, 4); 

        rc = cache.get(1, r);

        assert(rc);
        assert(r == 1);
        // evict old element, in this case 3
        cache.set(4, 5);
        rc = cache.get(3, r);
        assert(!rc);
        rc = cache.get(4, r);
        assert(rc);
        assert(r == 5);

        LFUCache<int, string>cache2(2);
        cache2.set(1, "one");
        cache2.set(2, "two");
        string val;
        rc = cache2.get(1, val);
       if (rc)
          assert(val == "one");
       else
          assert(false);

       cache2.set(3, "three"); // evict 2
       rc = cache2.get(2, val);
       assert(rc == false);
       rc = cache2.get(3, val);
       assert(rc);
       assert(val == "three");

}
查看更多
仙女界的扛把子
4楼-- · 2019-03-09 04:21

How about a priority queue? You can keep elements sorted there with keys representing the frequency. Just update the object position in the queue after visiting it. You can update just from time to time for optimizing the performance (but reducing precision).

查看更多
再贱就再见
5楼-- · 2019-03-09 04:25

You might benefit from the LFU implementation of ActiveMQ: LFUCache

They have provided some good functionality.

查看更多
Melony?
6楼-- · 2019-03-09 04:26
  1. According to me, the best way to implement a most-recently-used cache of objects would be to include a new variable as 'latestTS' for each object. TS stands for timestamp.

    // A static method that returns the current date and time as milliseconds since January 1st 1970 long latestTS = System.currentTimeMillis();

  2. ConcurrentLinkedHashMap is not yet implemented in Concurrent Java Collections. (Ref: Java Concurrent Collection API). However, you can try and use ConcurrentHashMap and DoublyLinkedList

  3. About the case to be considered: in such case, as I have said that you can declare latestTS variable, based upon the value of latestTS variable, you can remove an entry and add the new object. (Don't forget to update frequency and latestTS of the new object added)

As you have mentioned, you can use LinkedHashMap as it gives element access in O(1) and also, you get the order traversal. Please, find the below code for LFU Cache: (PS: The below code is the answer for the question in the title i.e. "How to implement LFU cache")

import java.util.LinkedHashMap;
import java.util.Map;

public class LFUCache {

    class CacheEntry
    {
        private String data;
        private int frequency;

        // default constructor
        private CacheEntry()
        {}

        public String getData() {
            return data;
        }
        public void setData(String data) {
            this.data = data;
        }

        public int getFrequency() {
            return frequency;
        }
        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }       

    }

    private static int initialCapacity = 10;

    private static LinkedHashMap<Integer, CacheEntry> cacheMap = new LinkedHashMap<Integer, CacheEntry>();
    /* LinkedHashMap is used because it has features of both HashMap and LinkedList. 
     * Thus, we can get an entry in O(1) and also, we can iterate over it easily.
     * */

    public LFUCache(int initialCapacity)
    {
        this.initialCapacity = initialCapacity;
    }

    public void addCacheEntry(int key, String data)
    {
        if(!isFull())
        {
            CacheEntry temp = new CacheEntry();
            temp.setData(data);
            temp.setFrequency(0);

            cacheMap.put(key, temp);
        }
        else
        {
            int entryKeyToBeRemoved = getLFUKey();
            cacheMap.remove(entryKeyToBeRemoved);

            CacheEntry temp = new CacheEntry();
            temp.setData(data);
            temp.setFrequency(0);

            cacheMap.put(key, temp);
        }
    }

    public int getLFUKey()
    {
        int key = 0;
        int minFreq = Integer.MAX_VALUE;

        for(Map.Entry<Integer, CacheEntry> entry : cacheMap.entrySet())
        {
            if(minFreq > entry.getValue().frequency)
            {
                key = entry.getKey();
                minFreq = entry.getValue().frequency;
            }           
        }

        return key;
    }

    public String getCacheEntry(int key)
    {
        if(cacheMap.containsKey(key))  // cache hit
        {
            CacheEntry temp = cacheMap.get(key);
            temp.frequency++;
            cacheMap.put(key, temp);
            return temp.data;
        }
        return null; // cache miss
    }

    public static boolean isFull()
    {
        if(cacheMap.size() == initialCapacity)
            return true;

        return false;
    }
}
查看更多
登录 后发表回答