Please don't say EHCache or OSCache, etc. Assume for purposes of this question that I want to implement my own using just the SDK (learning by doing). Given that the cache will be used in a multithreaded environment, which datastructures would you use? I've already implemented one using LinkedHashMap and Collections#synchronizedMap, but I'm curious if any of the new concurrent collections would be better candidates.
UPDATE: I was just reading through Yegge's latest when I found this nugget:
If you need constant-time access and want to maintain the insertion order, you can't do better than a LinkedHashMap, a truly wonderful data structure. The only way it could possibly be more wonderful is if there were a concurrent version. But alas.
I was thinking almost exactly the same thing before I went with the LinkedHashMap
+ Collections#synchronizedMap
implementation I mentioned above. Nice to know I hadn't just overlooked something.
Based on the answers so far, it sounds like my best bet for a highly concurrent LRU would be to extend ConcurrentHashMap using some of the same logic that LinkedHashMap
uses.
Or use this Apache Commons data structure:
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/LRUMap.html
I would consider using java.util.concurrent.PriorityBlockingQueue, with priority determined by a "numberOfUses" counter in each element. I would be very, very careful to get all my synchronisation correct, as the "numberOfUses" counter implies that the element can't be immutable.
The element object would be a wrapper for the objects in the cache:
Here is my implementation for LRU. I have used PriorityQueue, which basically works as FIFO and not threadsafe. Used Comparator based on the page time creation and based on the performs the ordering of the pages for the least recently used time.
Pages for consideration : 2, 1, 0, 2, 8, 2, 4
Page added into cache is : 2
Page added into cache is : 1
Page added into cache is : 0
Page: 2 already exisit in cache. Last accessed time updated
Page Fault, PAGE: 1, Replaced with PAGE: 8
Page added into cache is : 8
Page: 2 already exisit in cache. Last accessed time updated
Page Fault, PAGE: 0, Replaced with PAGE: 4
Page added into cache is : 4
OUTPUT
LRUCache Pages
-------------
PageName: 8, PageCreationTime: 1365957019974
PageName: 2, PageCreationTime: 1365957020074
PageName: 4, PageCreationTime: 1365957020174
enter code here
This is the LRU cache I use, which encapsulates a LinkedHashMap and handles concurrency with a simple synchronize lock guarding the juicy spots. It "touches" elements as they are used so that they become the "freshest" element again, so that it is actually LRU. I also had the requirement of my elements having a minimum lifespan, which you can also think of as "maximum idle time" permitted, then you're up for eviction.
However, I agree with Hank's conclusion and accepted answer -- if I were starting this again today, I'd check out Guava's
CacheBuilder
.There are two open source implementations.
Apache Solr has ConcurrentLRUCache: https://lucene.apache.org/solr/3_6_1/org/apache/solr/util/ConcurrentLRUCache.html
There's an open source project for a ConcurrentLinkedHashMap: http://code.google.com/p/concurrentlinkedhashmap/
Well for a cache you will generally be looking up some piece of data via a proxy object, (a URL, String....) so interface-wise you are going to want a map. but to kick things out you want a queue like structure. Internally I would maintain two data structures, a Priority-Queue and a HashMap. heres an implementation that should be able to do everything in O(1) time.
Here's a class I whipped up pretty quick:
Here's how it works. Keys are stored in a linked list with the oldest keys in the front of the list (new keys go to the back) so when you need to 'eject' something you just pop it off the front of the queue and then use the key to remove the value from the map. When an item gets referenced you grab the ValueHolder from the map and then use the queuelocation variable to remove the key from its current location in the queue and then put it at the back of the queue (its now the most recently used). Adding things is pretty much the same.
I'm sure theres a ton of errors here and I haven't implemented any synchronization. but this class will provide O(1) adding to the cache, O(1) removal of old items, and O(1) retrieval of cache items. Even a trivial synchronization (just synchronize every public method) would still have little lock contention due to the run time. If anyone has any clever synchronization tricks I would be very interested. Also, I'm sure there are some additional optimizations that you could implement using the maxsize variable with respect to the map.