I have been looking for cloud computing/storage solutions for a long time (inspired by Google's Bigtable), but I can't find an easy-to-use, business-ready one.
I'm searching for a simple, fault-tolerant, distributed key=>value DB like Amazon's SimpleDB.
I've seen things like:
- The CouchDB Project: a simple, distributed, fault-tolerant database. But it only understands JSON; there are no XML connectors, etc.
- Eucalyptus: nice Amazon EC2-compatible interfaces, open standards and XML. But is it less distributed and less fault-tolerant? There are also a lot of open tickets about Xen/VMware issues.
- CloudStore/KosmosFS: a nice distributed, fault-tolerant file system, but it's hard to configure. Are there any Java connectors?
- Apache Hadoop: a nice system that can do much more than just store data. It uses its own Hadoop Distributed File System (HDFS) and has been tested on clusters of 2,000 nodes.
- Amazon SimpleDB: I can't find an open-source alternative! It's a nice but expensive system for huge amounts of data, and you're locked in to Amazon.
Are there other, better solutions out there? Which one is the best choice? Which one has the fewest single points of failure (SPOFs)?
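To make "fault-tolerant, distributed, no single point of failure" concrete, here is a minimal in-process sketch. It is not how any particular product listed above works; it assumes a fixed set of named nodes, simulated failures, and a simple hash ring where each key is copied to several nodes, so a read still succeeds when one of them is down. Real systems add virtual nodes, re-replication and conflict handling, all omitted here. The class and method names are illustrative only.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a point on the hash ring (first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ReplicatedStore:
    """Toy in-process key=>value store; every key is copied to `replicas` nodes."""

    def __init__(self, node_names, replicas=3):
        self.replicas = replicas
        self.nodes = {name: {} for name in node_names}  # node name -> local data
        self.down = set()                               # simulated failed nodes
        self.ring = sorted((ring_position(name), name) for name in node_names)

    def owners(self, key):
        """Walk clockwise from the key's position and take the next N nodes."""
        start = bisect.bisect(self.ring, (ring_position(key), ""))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def put(self, key, value):
        """Write to every live owner; fail only if all owners are down."""
        written = 0
        for name in self.owners(key):
            if name not in self.down:
                self.nodes[name][key] = value
                written += 1
        if written == 0:
            raise RuntimeError(f"no live replica for {key!r}")

    def get(self, key):
        """Read from the first live owner that has the key."""
        for name in self.owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


store = ReplicatedStore(["node-a", "node-b", "node-c", "node-d"], replicas=3)
store.put("user:42", "alice")
store.down.add(store.owners("user:42")[0])  # simulate the key's first owner failing
print(store.get("user:42"))  # → alice
```

With three replicas per key, any two of a key's nodes can fail before that key becomes unreadable, which is the property to look for when comparing candidates on SPOFs.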
How about memcached?
The High Scalability blog covers this issue; if there's an open-source solution for what you're after, it'll surely be there.
Other projects include:
Another good list: Anti-RDBMS: A list of distributed key-value stores
You might want to take a look at this (using MySQL as a key-value store):
http://bret.appspot.com/entry/how-friendfeed-uses-mysql
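The linked post describes storing schema-less entities as serialized blobs in a single MySQL table and keeping separate index tables only for the properties you query on. A minimal sketch of that idea, assuming Python's built-in `sqlite3` as a stand-in for MySQL and JSON as the serialization format (the post describes its own format); the table, column and function names here are illustrative, not FriendFeed's exact schema.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (
        added_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
        id       TEXT NOT NULL UNIQUE,               -- the entity's key
        body     TEXT NOT NULL                       -- serialized, schema-less data
    );
    -- One small table per property you want to look entities up by.
    CREATE TABLE index_user_id (
        user_id   TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        PRIMARY KEY (user_id, entity_id)
    );
""")


def put_entity(entity):
    """Store a dict as a blob (assigns an id if missing) and update the index."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    with conn:  # one transaction for the blob and its index row
        conn.execute("INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
                     (entity_id, json.dumps(entity)))
        if "user_id" in entity:
            conn.execute("INSERT OR IGNORE INTO index_user_id VALUES (?, ?)",
                         (entity["user_id"], entity_id))
    return entity_id


def get_entity(entity_id):
    """Plain key=>value lookup; returns None if the key is unknown."""
    row = conn.execute("SELECT body FROM entities WHERE id = ?",
                       (entity_id,)).fetchone()
    return json.loads(row[0]) if row else None


def find_by_user(user_id):
    """Query via the index table, then load the blobs in insertion order."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i JOIN entities e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.added_id", (user_id,)).fetchall()
    entities = [json.loads(body) for (body,) in rows]
    # Index rows can go stale if an entity's user_id changes, so re-check on read.
    return [e for e in entities if e.get("user_id") == user_id]


eid = put_entity({"user_id": "bret", "title": "hello", "tags": ["mysql"]})
print(get_entity(eid)["title"])  # → hello
```

Adding a new property needs no `ALTER TABLE`; only properties you want to search on get an index table.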
I use Google's Google Base API: it's XML-based, free, documented, cloud-based, and has connectors for many languages. I think it will fit the bill if you want free hosting, too.
If you want to host your own servers, Tokyo Cabinet is your answer. It's key=>value based, uses flat files, and is the fastest database out there right now. It's very bare-bones compared to, say, Oracle, but incredibly good at storing and accessing data: about 1 million records per second, with about 10 bytes of overhead per record (depending on the storage engine). As for being business-ready, Tokyo Cabinet is at the heart of Mixi, the Japanese equivalent of Facebook plus MySpace, with several million heavy users, so it's very battle-proven.
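Tokyo Cabinet's own API isn't reproduced here. To illustrate what "key=>value based, uses flat files" means in practice, here is a small sketch using Python's standard-library `dbm` module as a stand-in; it is a different and much simpler DBM-style store, not Tokyo Cabinet, but the model is the same: byte-string keys and values that persist in a file on disk. The file name and keys are made up for the example.

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store")

# "c": open for read/write, creating the database file(s) if they don't exist.
with dbm.open(path, "c") as db:
    db["user:1"] = "alice"  # str keys/values are stored as UTF-8 bytes
    db["user:2"] = "bob"

# Reopen read-only: the data survived because it lives in the file, not in memory.
with dbm.open(path, "r") as db:
    print(db["user:1"].decode())  # → alice
    print(len(db))                # → 2
```

There is no schema and no query language: you get put, get, delete and iteration over keys, which is exactly what makes this class of store fast.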
A good compilation of storage tools for your question:
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
Instead of looking for something inspired by Google's Bigtable, why not just use Bigtable directly? You could write a front end on Google App Engine.
Wikipedia says that Yahoo both contributes to Hadoop and uses it in production (see the article linked from Wikipedia). So I'd say it counts as business-proven, although I'm not sure whether it counts as a K/V database.
Not on your list is FriendFeed's approach of using MySQL as a simple, schema-less key/value store.
It's hard for me to understand your priorities. CouchDB is simple, fault-tolerant, and distributed, but somehow you exclude it because it doesn't support XML. Are XML and Java connectors an unstated requirement?
(Anyway, CouchDB should in fact be excluded because it's young, its API isn't stable, and it's not a key-value store.)