I need a way to do key-value lookups across (potentially) hundreds of GB of data. Ideally something based on a distributed hashtable that works nicely with Java. It should be fault-tolerant and open source.
The store should be persistent, but would ideally cache data in memory to speed things up.
It should be able to support concurrent reads and writes from multiple machines (reads will be 100X more common, though). Basically, the purpose is to do a quick initial lookup of user metadata for a web service.
Can anyone recommend anything?
You should probably specify whether it needs to be persistent, in-memory, etc. You could try: http://www.danga.com/memcached/
Try the distributed Map structure from Redisson; it is based on the Redis server. Using a Redis cluster configuration, you can split data across up to 1,000 servers.
Usage example (a minimal sketch; the node addresses and map contents are placeholders for your own cluster):
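```java
import org.redisson.Redisson;
import org.redisson.api.RMap;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class UserMetadataLookup {
    public static void main(String[] args) {
        // Point Redisson at the Redis cluster nodes (placeholder addresses).
        Config config = new Config();
        config.useClusterServers()
              .addNodeAddress("redis://10.0.0.1:6379", "redis://10.0.0.2:6379");

        RedissonClient redisson = Redisson.create(config);

        // A distributed map; entries are sharded across the cluster by key.
        RMap<String, String> userMetadata = redisson.getMap("userMetadata");
        userMetadata.put("user:42", "{\"plan\":\"pro\",\"region\":\"eu\"}");
        String metadata = userMetadata.get("user:42");
        System.out.println(metadata);

        redisson.shutdown();
    }
}
```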
Open Chord is a Java implementation of Chord, a distributed hash table protocol that should fit your needs perfectly.
DNS has the capability to do this. I don't know how large each one of your records is (tons of small records adding up to hundreds of GB?), but it may work.
Depending on the use case, Terracotta may be just what you need.
OpenChord sounds promising, but I'd also consider BDB or any other non-SQL hashtable. Making it distributed can be dead easy (at least if the number of storage nodes is (almost) constant): just hash the key on the client to get the appropriate server, as in the sketch below.
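A minimal sketch of that client-side routing, assuming a fixed list of node addresses (the class, server names, and ports here are all hypothetical):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side router: hash the key, pick a server.
public class ShardedLookup {
    private final List<String> servers;

    public ShardedLookup(List<String> servers) {
        this.servers = servers;
    }

    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    // Caveat: adding or removing a node remaps most keys, which is why this
    // only works well while the set of storage nodes stays (almost) constant.
    public String serverFor(String key) {
        return servers.get(Math.floorMod(key.hashCode(), servers.size()));
    }

    public static void main(String[] args) {
        ShardedLookup router = new ShardedLookup(
                Arrays.asList("bdb-node-1:9000", "bdb-node-2:9000", "bdb-node-3:9000"));
        // Every client computes the same mapping, so no coordinator is needed.
        System.out.println(router.serverFor("user:42"));
    }
}
```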