I have been looking for cloud computing / storage solutions for a long time (inspired by Google's Bigtable), but I can't find an easy-to-use, business-ready solution.
I'm searching for a simple, fault-tolerant, distributed Key=>Value DB like SimpleDB from Amazon.
I've seen things like:
- The CouchDB Project : Simple and distributed, fault-tolerant Database. But it understands only JSON. No XML connectors etc.
- Eucalyptus : Nice Amazon EC2 interfaces. Open Standards & XML. But less distributed and less fault-tolerant? There are also a lot of open tickets with XEN/VMWare issues.
- Cloudstore / Kosmosfs : Nice distributed, fault-tolerant fs. But it's hard to configure. Are there any Java connectors?
- Apache Hadoop : Nice system that can do much more than just store data. Uses its own Hadoop Distributed File System and has been tested on clusters with 2000 nodes.
- Amazon SimpleDB : Can't find an open-source alternative! It's a nice but expensive system for huge amounts of data. And you're locked in to Amazon.
Are there other, better solutions out there? Which one is the best to choose? Which one has the fewest SPOFs (Single Points of Failure)?
Use CouchDB.
If you want something like Bigtable, you can't go past HBase or Hypertable - they're both open-source Bigtable clones. One thing to consider, though, is if your requirements really are 'big enough' for Bigtable. It scales up to thousands of tablet servers, and as such, has quite a bit of infrastructure under it to enable that (for example, handling the expectation of regular node failures).
If you don't anticipate growing to, at the very least, tens of tablet servers, you might want to consider one of the proposed alternatives: You can't beat Berkeley DB for simplicity, or MySQL for ubiquity. If all you need is a key/value datastore, you can put a simple 'dict' wrapper around your database interface, and switch out your backend if you outgrow one.
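To make the 'dict' wrapper idea concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: SqliteStore and record_visit are hypothetical names, and SQLite (from the standard library) stands in for whatever backend you actually pick, such as Berkeley DB or MySQL. Values are stored as text to keep it short.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteStore(MutableMapping):
    """Dict-like key/value store backed by one SQLite table (hypothetical example)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # "with" commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Fetch all keys first so callers can modify the store while iterating.
        rows = self._conn.execute("SELECT k FROM kv ORDER BY k").fetchall()
        return iter(k for (k,) in rows)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


def record_visit(store, user):
    """Application code only sees the mapping interface, never the backend."""
    store[user] = str(int(store.get(user, "0")) + 1)
    return store[user]
```

Because record_visit only uses the mapping interface, it works the same with a plain {} during development and with SqliteStore("visits.db") later. Moving to a distributed store would then mean writing one more class with the same five methods, not touching the application code.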
Cloudera is a company that commercializes Apache Hadoop, with some value-add of course, like productization, configuration, training & support services.
You might want to look at Hypertable, which is modeled after Google's Bigtable.
MongoDB is another option which is very similar to CouchDB, but it uses ad hoc queries (written as JSON-like documents, with capabilities similar to simple SQL queries) instead of map/reduce views in JavaScript. It also supports indexes, query profiling, replication and storage of binary data.
It has a huge amount of documentation, which might be overwhelming at first, so I would suggest starting with the Developer's tour.
Tokyo Cabinet has also received some attention, as it supports table schemas, key/value pairs and hash tables. Its network server, Tokyo Tyrant, uses Lua as an embedded scripting platform and can use HTTP as its communication protocol. Here is a great demonstration.