I am looking for a simple in-memory (and in-process) cache for short-term caching of query data (where short-term means beyond a single request/response cycle, i.e. up to the session boundary). EhCache would probably work, but it looks as if it might not offer one thing that I need: a limit not on the number of objects cached, but an (approximate) limit on the amount of memory consumed by the cached data.
I understand that it is hard to figure out the exact memory usage of a given object without serialization (which I want to avoid in the general case, since its slowness would defeat the purpose for my uses), and I am fine with having to provide the size estimate myself.
So: is there a simple open source Java cache that allows for defining the "weight" of cached objects, to limit the amount of data cached?
EDIT (Nov 2010): For what it's worth, there is a new project called Java CacheMate that tries to tackle this issue, along with some other improvement ideas (multi-level in-memory, in-process caching).
I agree with Paul that this is often solved by using a soft reference cache, though it may evict entries earlier than you would prefer. A usually acceptable solution is to use a normal cache that evicts into a soft-reference cache and recovers entries from it on a miss, if possible. This victim-caching approach works pretty well, giving you a guaranteed lower bound plus extra retention when free memory is available.
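A minimal sketch of that victim-caching idea, assuming a plain LinkedHashMap for the LRU side (all names here are mine, for illustration only):

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a strongly-referenced LRU map demotes evicted entries into a
// softly-referenced "victim" map; a miss on the primary map tries to
// recover the value from the victims before giving up.
public class VictimCache<K, V> {
    private final Map<K, SoftReference<V>> victims = new HashMap<K, SoftReference<V>>();
    private final LinkedHashMap<K, V> primary;

    public VictimCache(final int capacity) {
        // true = access order, so the eldest entry is the least recently used
        primary = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    // Demote instead of discarding; the GC decides when it dies.
                    victims.put(eldest.getKey(), new SoftReference<V>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            SoftReference<V> ref = victims.remove(key);
            value = (ref == null) ? null : ref.get();
            if (value != null) {
                primary.put(key, value); // recover on a miss
            }
        }
        return value;
    }

    public synchronized void put(K key, V value) {
        victims.remove(key);
        primary.put(key, value);
    }
}
```

For a production version you would also want to purge victim entries whose values the GC has already cleared (e.g. via a ReferenceQueue, as a later answer here mentions), so the victim map doesn't accumulate dead keys.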
The memory size can be determined with the SizeOf utility (http://sourceforge.net/projects/sizeof), which requires enabling its Java agent; beyond that, usage is pretty simple. I've only used this for debugging purposes, and I'd recommend benchmarking the overhead before adopting it for normal usage.
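Usage looks roughly like this; note that the package and method names below (net.sourceforge.sizeof.SizeOf with sizeOf/deepSizeOf) are from memory of the project's docs, so verify them against the version you download:

```java
import net.sourceforge.sizeof.SizeOf;

// Run with the instrumentation agent enabled, e.g.:
//   java -javaagent:SizeOf.jar SizeOfExample
public class SizeOfExample {
    public static void main(String[] args) {
        String[] row = { "some", "cached", "query", "data" };
        System.out.println(SizeOf.sizeOf(row));     // shallow size of the array object
        System.out.println(SizeOf.deepSizeOf(row)); // deep size, following references
    }
}
```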
In my caching library, I am planning on adding the ability to plug in an evaluator once the core algorithm is implemented. This way you could store a collection as the value, but bound the cache by the sum of all the collection sizes. I have seen unbounded collections as cache values cause OutOfMemoryErrors, so having that control is quite handy (see the sketch below).
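The evaluator idea could look roughly like this hypothetical interface (the names here are illustrative, not the library's actual API):

```java
import java.util.Collection;

// Hypothetical evaluator: maps a cached value to a logical weight.
interface Weigher<V> {
    int weightOf(V value);
}

// Example: weigh each entry by its collection size, so the cache can be
// bounded by the sum of all collection sizes rather than the entry count.
final class CollectionWeigher<E> implements Weigher<Collection<E>> {
    public int weightOf(Collection<E> value) {
        return value.size();
    }
}
```

The cache would then evict once the sum of the weights exceeds a maximum, instead of evicting by entry count.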
If you really need this, and I'd advise not to, we could enhance my current implementation to support this. You can email me, ben.manes-at-gmail.com.
If you cannot make any estimates, write a cache eviction policy that flushes based on JVM heap usage (polled via java.lang.Runtime) or is triggered by a finalize() call from an orphaned object (at GC time).
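A rough sketch of the heap-polling half of that idea, with an illustrative 80% threshold:

```java
// Sketch: poll heap usage via java.lang.Runtime to drive an eviction policy.
public final class HeapMonitor {
    private static final double PRESSURE_THRESHOLD = 0.8; // illustrative value

    // True if more than ~80% of the maximum heap is currently in use.
    public static boolean heapUnderPressure() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return used > (long) (rt.maxMemory() * PRESSURE_THRESHOLD);
    }
}
```

A cache could call heapUnderPressure() before each put and flush some entries when it returns true; keep in mind that dropping references only lowers the reported usage after a GC actually runs.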
The thing that does this job is java.lang.ref.SoftReference. Typically, you extend the SoftReference class so that the subclass holds the key.
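A minimal version of such a subclass might look like this (names are illustrative):

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

// Sketch: a SoftReference that remembers its key, so that when the GC
// clears the referent, the stale map entry can be removed by that key.
class KeyedSoftReference<K, V> extends SoftReference<V> {
    final K key;

    KeyedSoftReference(K key, V value, ReferenceQueue<? super V> queue) {
        super(value, queue);
        this.key = key;
    }
}
```

After the GC clears a value, the reference shows up on the queue; you poll it, cast back to KeyedSoftReference, and call map.remove(ref.key).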
EhCache 2.5 now offers a solution that can cap a cache based on its memory size. For more details, check out the EhCache 2.5 documentation.
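If I recall the 2.5 API correctly, the byte-based cap can be configured programmatically along these lines (verify the method names against your EhCache version):

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.MemoryUnit;

public class EhcacheByteSizing {
    public static void main(String[] args) {
        // Cap the cache at 64 MB of heap rather than at an entry count.
        CacheConfiguration config = new CacheConfiguration()
                .name("queryCache")
                .maxBytesLocalHeap(64, MemoryUnit.MEGABYTES);
        CacheManager manager = CacheManager.getInstance();
        manager.addCache(new Cache(config));
    }
}
```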
How about using a simple LinkedHashMap with its LRU (access-order) mode enabled, and putting all data into it wrapped in a SoftReference, e.g. cache.put(key, new SoftReference(value))?
This would limit your cache to the amount of available memory without killing the rest of your program, because Java clears soft references when memory runs short... not all of them at once... usually the least recently used first. If you add a reference queue to your implementation, you can also remove the stale entries (key present, value cleared) from the map.
This would free you from calculating the size of the entries and keeping track of the sum.
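Putting the pieces together, a sketch might look like this (an access-ordered LinkedHashMap, soft-referenced values, and a ReferenceQueue for the purge; all names are illustrative):

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an LRU LinkedHashMap holding SoftReferences, with a ReferenceQueue
// used to purge entries whose values the GC has already reclaimed.
public class SoftLruCache<K, V> {

    // A soft reference that remembers its key for later removal from the map.
    private static class KeyedRef<K, V> extends SoftReference<V> {
        final K key;
        KeyedRef(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final ReferenceQueue<V> queue = new ReferenceQueue<V>();
    private final Map<K, KeyedRef<K, V>> map;

    public SoftLruCache(final int maxEntries) {
        // true = access order, so the eldest entry is the least recently used
        map = new LinkedHashMap<K, KeyedRef<K, V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, KeyedRef<K, V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized void put(K key, V value) {
        purgeStaleEntries();
        map.put(key, new KeyedRef<K, V>(key, value, queue));
    }

    public synchronized V get(K key) {
        purgeStaleEntries();
        KeyedRef<K, V> ref = map.get(key);
        return (ref == null) ? null : ref.get();
    }

    // Remove map entries whose soft values were already cleared by the GC.
    @SuppressWarnings("unchecked")
    private void purgeStaleEntries() {
        KeyedRef<K, V> ref;
        while ((ref = (KeyedRef<K, V>) queue.poll()) != null) {
            // Only remove if the mapping still points at this exact reference
            // (the key may have been re-put with a fresh value in the meantime).
            if (map.get(ref.key) == ref) {
                map.remove(ref.key);
            }
        }
    }
}
```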
As well as guessing the memory usage of an object, a reasonable algorithm would also need to guess the cost of recreating it. A reasonable guess is that the recreation cost is roughly proportional to the memory size. So the two factors cancel each other out, and you need neither. A simple algorithm is probably going to work out better.