Question:
I need a disk-backed Map structure to use in a Java app. It must meet the following criteria:
- Capable of storing millions of records (even billions)
- Fast lookup - the majority of operations on the Map will simply be to check whether a key already exists. This and the first criterion above are the most important. There should be an effective in-memory caching mechanism for frequently used keys.
- Persistent, but does not need to be transactional and can live with some failure, i.e. I am happy to sync with disk periodically.
- Capable of storing simple primitive types - but I don't need to store serialised objects.
- It does not need to be distributed, i.e. will run all on one machine.
- Simple to set up & free to use.
- No relational queries required
Record keys will be strings or longs. As described above, reads will be much more frequent than writes, and the majority of reads will simply check whether a key exists (i.e. they will not need to read the key's associated data). Each record will be updated only once, and records are not deleted.
I currently use BDB JE but am seeking other options.
Update
I have since improved query performance on my existing BDB setup by reducing the dependency on secondary keys. Some queries required a join on two secondary keys; by combining them into a composite key I removed a level of indirection in the lookup, which sped things up nicely.
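The composite-key idea can be sketched with the JDK alone. This is only an illustration, not BDB JE code: the `HashMap` stands in for the index, the `compositeKey` helper and the sample values are hypothetical, and the `|` separator is assumed never to occur in either part.

```java
import java.util.HashMap;
import java.util.Map;

public class CompositeKeyDemo {
    // Assumed separator: it must never appear inside either key part.
    private static final char SEP = '|';

    // Combine two attributes into one key so a single lookup replaces a join.
    static String compositeKey(String first, String second) {
        return first + SEP + second;
    }

    public static void main(String[] args) {
        // Stand-in for a primary index keyed by the composite key.
        Map<String, Long> index = new HashMap<>();
        index.put(compositeKey("alice", "2009-06-01"), 42L);

        // One existence check instead of intersecting two secondary indexes.
        System.out.println(index.containsKey(compositeKey("alice", "2009-06-01"))); // true
        System.out.println(index.containsKey(compositeKey("alice", "2009-06-02"))); // false
    }
}
```

The trade-off is that a lookup now needs both parts of the key up front; a query on only one of them would still need a separate index.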
Answer 1:
I'd likely use a local database, such as BDB JE or HSQLDB. May I ask what is wrong with this approach? You must have some reason to be looking for alternatives.
In response to comments:
As the problem is performance, and I guess you are already using JDBC to handle this, it might be worth trying HSQLDB and reading the chapter on Memory and Disk Use.
Answer 2:
JDBM3 does exactly what you are looking for. It is a library of disk-backed maps with a really simple API and high performance.
UPDATE
This project has now evolved into MapDB http://www.mapdb.org
Answer 3:
You may want to look into OrientDB.
Answer 4:
You can try Java Chronicles from http://openhft.net/products/chronicle-map/
Chronicle Map is a high-performance, off-heap, in-memory, persisted key-value data store. It works like a standard Java Map.
Answer 5:
As of today I would use either MapDB (file-based/backed, sync or async) or Hazelcast. With the latter you will have to implement your own persistence, e.g. backed by an RDBMS, by implementing a Java interface. OpenHFT Chronicle might be another option. I am not sure how persistence works there since I have never used it, but they claim to have it. OpenHFT is completely off-heap and allows partial updates of objects (of primitives) without (de-)serialization, which might be a performance benefit.
NOTE: If you need your map to be disk-based because of memory issues, the easiest option is MapDB. Hazelcast could be used as a cache (distributed or not), which allows you to evict elements from the heap after a given time or size. OpenHFT is off-heap and could be considered if you only need persistence across JVM restarts.
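To make size-based eviction concrete, here is a minimal sketch using only the JDK. It is not Hazelcast's or MapDB's API: `LruKeyCache` is a hypothetical class built on `LinkedHashMap`, meant as the kind of small in-memory layer that could sit in front of any disk-backed store to keep frequently used keys on the heap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: evicts the least recently used entry once full.
public class LruKeyCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruKeyCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruKeyCache<Long, Boolean> cache = new LruKeyCache<>(2);
        cache.put(1L, true);
        cache.put(2L, true);
        cache.get(1L);       // touch key 1, so key 2 becomes the eldest
        cache.put(3L, true); // cache is over capacity, key 2 is evicted
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```

On a cache miss the caller would fall through to the disk-backed map and put the result into the cache. Note that `LinkedHashMap` is not thread-safe, so concurrent use would need external synchronization.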
Answer 6:
I've found Tokyo Cabinet to be a simple persistent Hash/Map that is fast to set up and use.
This abbreviated example, taken from the docs, shows how simple it is to save and retrieve data from a persistent map:
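import tokyocabinet.HDB;

// create the object
HDB hdb = new HDB();
// open the database for writing, creating the file if it does not exist
hdb.open("casket.tch", HDB.OWRITER | HDB.OCREAT);
// add an item
hdb.put("foo", "hop");
// retrieve the item
String value = hdb.get("foo");
// close the database
hdb.close();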
Answer 7:
SQLite does this. I wrote a wrapper for using it from Java: http://zentus.com/sqlitejdbc
As I mentioned in a comment, I have successfully used SQLite with gigabytes of data and tables of hundreds of millions of rows. If you think out the indexing properly, it's very fast.
The only pain is the JDBC interface. Compared to a simple HashMap, it is clunky. I often end up writing a JDBC wrapper for the specific project, which can add up to a lot of boilerplate code.
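As a rough idea of what such a wrapper might look like, here is a minimal sketch built on plain `java.sql`. It is not the code from the answer: the class name `SqliteKeyStore`, the table name `kv`, and its two columns are all hypothetical, and it assumes long keys and values as described in the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical Map-like facade over one SQLite table; all names are illustrative.
public class SqliteKeyStore implements AutoCloseable {
    public static final String CREATE_SQL =
        "CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v INTEGER)";
    public static final String EXISTS_SQL = "SELECT 1 FROM kv WHERE k = ?";
    public static final String INSERT_SQL =
        "INSERT OR IGNORE INTO kv (k, v) VALUES (?, ?)";

    private final PreparedStatement exists;
    private final PreparedStatement insert;

    public SqliteKeyStore(Connection conn) throws SQLException {
        // The primary key gives SQLite an index for fast existence checks.
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_SQL);
        }
        // Prepare once and reuse, since lookups dominate the workload.
        exists = conn.prepareStatement(EXISTS_SQL);
        insert = conn.prepareStatement(INSERT_SQL);
    }

    // True if a row with this key exists; the value itself is never read.
    public boolean containsKey(long key) throws SQLException {
        exists.setLong(1, key);
        try (ResultSet rs = exists.executeQuery()) {
            return rs.next();
        }
    }

    // Insert only if the key is new (records are written once, never deleted).
    public void putIfAbsent(long key, long value) throws SQLException {
        insert.setLong(1, key);
        insert.setLong(2, value);
        insert.executeUpdate();
    }

    @Override
    public void close() throws SQLException {
        exists.close();
        insert.close();
    }
}
```

With a SQLite JDBC driver on the classpath, a connection would typically be obtained with something like `DriverManager.getConnection("jdbc:sqlite:keys.db")` and passed to the constructor; the file name here is an assumption. Wrapping bulk inserts in a single transaction is usually worthwhile, since committing every insert separately tends to be slow.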
Answer 8:
JBoss (tree) Cache is a great option. You can use it standalone from JBoss. Very robust, performant, and flexible.
Answer 9:
I think Hibernate Shards may easily fulfill all your requirements.