I need a very basic key-value store for Java. I started with a HashMap, but HashMap seems to be somewhat space-inefficient (I'm storing ~20 million records, and it seems to require ~6 GB of RAM).
The map is Map<Integer,String>, so I'm considering using GNU Trove's TIntObjectHashMap<byte[]> and storing each value as an ASCII byte array rather than a String.
As an alternative to that, is there a key-value store that only requires adding jar files, does not hold the entire map in RAM at once, and is still reasonably fast?
BabuDB
License: New BSD license, Language: Java
JDBM2
License: Apache License 2.0, Language: Java
Banana DB
License: Apache License 2.0, Language: Java
I've tried BabuDB and JDBM2 and they work fine. BabuDB is a little bit more difficult to set up, but potentially delivers higher performance than JDBM2.
These are all databases that persist data on disk. There are also solutions for holding a large map in memory (ehcache, hazelcast, ...).
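To illustrate what "persist data on disk" buys you compared to a HashMap, here is a toy sketch using only the JDK: a hypothetical AppendOnlyStore (not the API or design of BabuDB, JDBM2 or Banana DB) that appends each record to a file and keeps only a key-to-file-offset index in RAM. It assumes a record layout of [int key][int length][UTF-8 bytes] chosen for this example, and it has no crash recovery, compaction or caching, all of which the real libraries handle.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical append-only key-value file: values live on disk, only key -> offset is in RAM.
 * Record layout (an assumption of this sketch): [int key][int length][UTF-8 bytes].
 */
class AppendOnlyStore implements AutoCloseable {
    private final RandomAccessFile file;
    // Still a boxed HashMap; real stores keep the index on disk too (B-trees, hash files).
    private final Map<Integer, Long> offsets = new HashMap<>();

    AppendOnlyStore(Path path) throws IOException {
        file = new RandomAccessFile(path.toFile(), "rw");
        // Rebuild the index by scanning existing records; later records shadow earlier ones.
        long pos = 0;
        while (pos < file.length()) {
            file.seek(pos);
            int key = file.readInt();
            int len = file.readInt();
            offsets.put(key, pos);
            pos += 8 + len;
        }
    }

    void put(int key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long pos = file.length();
        file.seek(pos);               // always append; old versions stay in the file
        file.writeInt(key);
        file.writeInt(bytes.length);
        file.write(bytes);
        offsets.put(key, pos);
    }

    String get(int key) throws IOException {
        Long pos = offsets.get(key);
        if (pos == null) return null;
        file.seek(pos + 4);           // skip the stored key
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Reopening the same file rebuilds the index, so values survive JVM restarts. Writes go straight to an unbuffered RandomAccessFile, so this is slow; it only shows the idea of trading RAM for disk.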
Just wanted to point out some more open source options that have become available since this question was first asked.
Apache 2, BTree, Apache Directory Project JDBM replacement effort:
http://directory.apache.org/mavibot/
MPL2/EPL1, RTree, MVStore, H2 Storage Engine:
http://www.h2database.com/html/mvstore.html
Apache 2, Xodus Environments, JetBrains YouTrack and Hub storage engine:
https://github.com/JetBrains/xodus
Use Berkeley DB.
Because it keeps the data on disk and caches only part of it in memory, this should cut your memory use substantially while keeping access fast, without adding much complexity to your application. Enjoy!
Consider Koloboke Collections, which according to various tests is up to 2 times faster than Trove when configured to consume the same memory as Trove. Alternatively, when configured to be as fast as Trove, it consumes considerably less memory.
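The "same memory" versus "same speed" configurations come down, broadly, to how full an open-addressing hash table is allowed to get (its load factor): a sparser table means shorter probe sequences but more empty slots. A ballpark sketch for the question's 20 million keys, assuming one int[] of keys plus one array of 4-byte value references (32-bit JVM) and ignoring the values themselves; it is not Trove's or Koloboke's exact sizing policy, which may round capacities and keep extra bookkeeping.

```java
/**
 * Ballpark size of the slot arrays of an open-addressing int-key hash table
 * for a given load factor (a sketch; not any library's exact sizing policy).
 */
class LoadFactorEstimate {
    static final int INT_KEY = 4;   // one int per slot in the key array
    static final int REFERENCE = 4; // one value reference per slot (32-bit JVM assumed)

    /** Smallest slot count that keeps occupancy at or below the load factor. */
    static long slots(long entries, double loadFactor) {
        return (long) Math.ceil(entries / loadFactor);
    }

    /** Bytes used by the key array plus the value-reference array (values excluded). */
    static long slotArrayBytes(long entries, double loadFactor) {
        return slots(entries, loadFactor) * (INT_KEY + REFERENCE);
    }

    public static void main(String[] args) {
        long entries = 20_000_000L; // record count from the question
        for (double f : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf("load factor %.2f: %d slots, ~%d MB of slot arrays%n",
                    f, slots(entries, f), slotArrayBytes(entries, f) / 1_000_000);
        }
    }
}
```

At load factor 0.5 this gives 40,000,000 slots (about 320 MB of slot arrays), versus 26,666,667 slots (about 213 MB) at 0.75; the sparser table probes fewer slots per lookup on average, which is the speed side of the trade-off.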
If you want to persist the map between JVM runs with a very quick bootstrap, you might also be interested in Chronicle-Map, which stores Strings in UTF-8 by default (so you don't need to bother with String <-> byte[] conversions as with Koloboke/Trove). Chronicle-Map is ultra fast for a persisted key-value store, but a bit slower than Koloboke and even Trove.
This doesn't entirely make sense, because a TIntObjectHashMap is not a Map. However, the approach is sound. The best answer is to try it out.
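The approach from the question (primitive int keys instead of boxed Integers, ASCII byte[] values instead of Strings) can be sketched with the JDK alone. The IntBytesMap below is hypothetical, not Trove's API or implementation: an open-addressing table with linear probing, kept at most half full. It assumes values are plain ASCII, as the question says (String.getBytes with US_ASCII replaces other characters with '?'), and it supports only put, get and size; removal, iteration and null values are left out.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical int -> ASCII byte[] hash map: no Integer boxing, 1 byte per character. */
class IntBytesMap {
    private int[] keys;
    private byte[][] values; // a null value marks an empty slot
    private int size;

    IntBytesMap(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) capacity <<= 1; // power of two, load factor <= 0.5
        keys = new int[capacity];
        values = new byte[capacity][];
    }

    private static int slot(int key, int mask) {
        int h = key * 0x9E3779B9;      // multiplicative hash spreads sequential ids
        return (h ^ (h >>> 16)) & mask;
    }

    void put(int key, String value) {
        if (value == null) throw new NullPointerException("null values are not supported");
        if ((size + 1) * 2 > keys.length) resize();
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != null) {    // linear probing; an empty slot always exists
            if (keys[i] == key) {
                values[i] = bytes;     // overwrite existing key
                return;
            }
            i = (i + 1) & mask;
        }
        keys[i] = key;
        values[i] = bytes;
        size++;
    }

    String get(int key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != null) {
            if (keys[i] == key) return new String(values[i], StandardCharsets.US_ASCII);
            i = (i + 1) & mask;
        }
        return null;
    }

    int size() {
        return size;
    }

    private void resize() {
        int[] oldKeys = keys;
        byte[][] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new byte[oldKeys.length * 2][];
        int mask = keys.length - 1;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldValues[j] == null) continue;
            int i = slot(oldKeys[j], mask);
            while (values[i] != null) i = (i + 1) & mask;
            keys[i] = oldKeys[j];
            values[i] = oldValues[j];
        }
    }
}
```

For example, new IntBytesMap(20_000_000) sizes the table once up front, so loading 20 million records never triggers a resize; map.put(42, "hello") followed by map.get(42) returns "hello".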
But here are some rough estimates (assuming a 32-bit JVM):
HashMap keys would need to be Integer instances. They will occupy ~18 bytes per instance + 4 bytes per reference, ~22 bytes in total.
Trove keys would be 4-byte int values.
String values would be 20 bytes + 12 bytes + 2 * number of "characters".
Byte array values would be 12 bytes + 1 * number of "characters".
I haven't examined the details of the respective hash table internal data structures.
That probably amounts to around 50% memory saving, though it depends critically on the average length of the value "strings". (The longer they are, the more they will dominate the space usage.)
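The arithmetic can be made concrete with a small calculator. It uses only the rough 32-bit figures listed above (about 18 + 4 bytes per boxed Integer key, 20 + 12 + 2n bytes per String, 4 bytes per int key, 12 + n bytes per byte[]) and, like the text, ignores hash-table internals, so absolute numbers are underestimates; the ratio is the point. The sample average lengths of 10, 50 and 200 characters are illustrative assumptions.

```java
/**
 * Rough per-entry heap estimate using the answer's 32-bit JVM figures.
 * Hash-table internals are ignored, so absolute numbers are underestimates.
 */
class MemoryEstimate {
    static final int INTEGER_OBJECT = 18; // boxed Integer key (approximate)
    static final int REFERENCE = 4;       // 32-bit object reference
    static final int INT_KEY = 4;         // primitive int key, as in Trove
    static final int STRING_OBJECT = 20;  // the String object itself
    static final int ARRAY_HEADER = 12;   // header of the backing char[] or a byte[]

    /** HashMap<Integer,String>: boxed key plus a String over a 2-bytes-per-char array. */
    static long hashMapEntry(int avgChars) {
        return INTEGER_OBJECT + REFERENCE + STRING_OBJECT + ARRAY_HEADER + 2L * avgChars;
    }

    /** TIntObjectHashMap<byte[]>: primitive key plus a 1-byte-per-char ASCII array. */
    static long troveEntry(int avgChars) {
        return INT_KEY + ARRAY_HEADER + (long) avgChars;
    }

    public static void main(String[] args) {
        long records = 20_000_000L; // record count from the question
        for (int n : new int[] {10, 50, 200}) {
            long h = hashMapEntry(n);
            long t = troveEntry(n);
            System.out.printf("avg %d chars: %d vs %d bytes/entry (%.0f%% saving), ~%d MB vs ~%d MB%n",
                    n, h, t, 100.0 * (h - t) / h, h * records / 1_000_000, t * records / 1_000_000);
        }
    }
}
```

With these figures the saving is about 65% at 10 characters (74 vs 26 bytes per entry) and shrinks toward 50% as values grow (454 vs 216 bytes at 200 characters), because the 2-bytes-versus-1-byte-per-character term dominates. On Java 9 and later, compact strings already store ASCII-only Strings with 1 byte per character, which narrows the gap further.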
FWIW, Trove publish their own benchmarks here. They don't look very convincing, but you should be able to dig out their benchmark code and modify it to better match your use-case.
http://www.mapdb.org/ is what you are looking for. It's a rocking fast persistent implementation of java.util.Map.
Features
Concurrent
MapDB has record-level locking and a state-of-the-art concurrent engine. Its performance scales nearly linearly with the number of cores. Data can be written by multiple parallel threads.
Fast
MapDB has outstanding performance rivaled only by native DBs. It is the result of more than a decade of optimizations and rewrites.
ACID
MapDB optionally supports ACID transactions with full MVCC isolation. MapDB uses a write-ahead log or an append-only store for great write durability.
Flexible
MapDB can be used everywhere from an in-memory cache to a multi-terabyte database. It also has a number of options to trade durability for write performance. This makes it very easy to configure MapDB to fit your needs exactly.
Hackable
MapDB is component based, most features (instance cache, async writes, compression) are just class wrappers. It is very easy to introduce new functionality or component into MapDB.
SQL Like
MapDB was developed as a faster alternative to SQL engines. It has a number of features that make the transition from a relational database easier: secondary indexes/collections, auto-incrementing sequential IDs, joins, triggers, composite keys…
Low disk-space usage
MapDB has a number of features (serialization, delta key packing…) to minimize the disk space used by its store. It also has very fast compression and custom serializers. We take disk usage seriously and do not waste a single byte.