NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include:
- Cassandra (tabular, written in Java, used by Cisco, WebEx, Digg, Facebook, IBM, Mahalo, Rackspace, Reddit and Twitter)
- CouchDB (document, written in Erlang, used by BBC and Engine Yard)
- Dynomite (key-value, written in Erlang, used by Powerset)
- HBase (key-value, written in Java, used by Bing)
- Hypertable (tabular, written in C++, used by Baidu)
- Kai (key-value, written in Erlang)
- MemcacheDB (key-value, written in C, used by Reddit)
- MongoDB (document, written in C++, used by Electronic Arts, Github, NY Times and Sourceforge)
- Neo4j (graph, written in Java, used by some Swedish universities)
- Project Voldemort (key-value, written in Java, used by LinkedIn)
- Redis (key-value, written in C, used by Craigslist, Engine Yard and Github)
- Riak (key-value, written in Erlang, used by Comcast and Mochi Media)
- Ringo (key-value, written in Erlang, used by Nokia)
- Scalaris (key-value, written in Erlang, used by OnScale)
- Terrastore (document, written in Java)
- ThruDB (document, written in C++, used by JunkDepot.com)
- Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp, a Japanese social networking site)
I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used.
Questions:
- What scalability problems have you used NoSQL data stores to solve?
- What NoSQL data store did you use?
- What database did you use prior to switching to a NoSQL data store?
I'm looking for first-hand experiences, so please do not answer unless you have that.
I apologize for going against your bold text, since I don't have any first-hand experience, but this set of blog posts is a good example of solving a problem with CouchDB.
CouchDB: A Case Study
Essentially, the textme application used CouchDB to deal with its exploding data problem. The team found that SQL was too slow to deal with large amounts of archival data and moved that data over to CouchDB. It's an excellent read: the author discusses the entire process of figuring out which problems CouchDB could solve and how they ended up solving them.
We replaced a PostgreSQL database with a CouchDB document database because not having a fixed schema was a strong advantage for us. Each document has a variable number of indexes used to access that document.
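To make the "variable number of indexes per document" idea concrete, here is a minimal sketch in plain Python. It is not the setup described above and does not talk to CouchDB: the `index_keys` field, the sample documents and the `build_index` helper are all hypothetical. In CouchDB itself this kind of lookup would normally be a view whose map function emits one key per index entry.

```python
from collections import defaultdict


def build_index(documents):
    """Map every index key to the ids of the documents that declare it."""
    index = defaultdict(list)
    for doc in documents:
        # "index_keys" is a hypothetical field; a document may list any
        # number of keys, including none at all.
        for key in doc.get("index_keys", []):
            index[key].append(doc["_id"])
    return dict(index)


# Documents share no fixed schema; only "_id" is present in all of them.
documents = [
    {"_id": "a1", "type": "invoice", "index_keys": ["customer:42", "year:2010"]},
    {"_id": "b2", "type": "note", "index_keys": ["customer:42"]},
    {"_id": "c3", "type": "photo", "tags": ["beach"], "index_keys": []},
]

index = build_index(documents)
```

Looking up `index["customer:42"]` returns the ids of both documents that declared that key, while the photo document is reachable only by its `_id`.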
I would encourage anyone reading this to try Couchbase once more now that 3.0 is out the door. There are over 200 new features, for starters. The performance, availability, scalability and easy management features of Couchbase Server make for an extremely flexible, highly available database. The management UI is built in, and the APIs automatically discover the cluster nodes, so there is no need for a load balancer between the application and the DB. While we don't have a managed service at this time, you can run Couchbase on things like AWS, RedHat Gears, Cloudera, Rackspace, Docker containers like CloudSoft, and much more.
Regarding rebalancing, it depends on what specifically you're referring to. By design, Couchbase doesn't automatically rebalance after a node failure, but an administrator can set up auto failover for the first node failure. Using our APIs you can also gain access to the replica vbuckets for reading before they are made active, or a monitoring tool can force a failover through the REST API. This is a special case, but it can be done.
We tend not to rebalance in pretty much any mode unless the node is completely offline and never coming back, or a new node is ready to be balanced in automatically. Here are a couple of guides to help anyone interested in seeing what one of the most highly performing NoSQL databases is all about.
Lastly, I would also encourage you to check out N1QL for distributed querying:
Thanks for reading and let me or others know if you need more help!
Austin
Todd Hoff's highscalability.com has a lot of great coverage of NoSQL, including some case studies.
The commercial Vertica columnar DBMS might suit your purposes (even though it supports SQL): it's very fast compared with traditional relational DBMSs for analytics queries. See Stonebraker, et al.'s recent CACM paper contrasting Vertica with map-reduce.
Update: Twitter has also selected Cassandra over several others, including HBase, Voldemort, MongoDB, MemcacheDB, Redis, and Hypertable.
Update 2: Rick Cattell has just published a comparison of several NoSQL systems in High Performance Data Stores. And highscalability.com's take on Rick's paper is here.
I used Redis to store logging messages across machines. It was very easy to implement and very useful. Redis really rocks.
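For readers wondering what that might look like, here is a rough sketch rather than the code actually used. It assumes each machine appends JSON-encoded entries to a shared Redis list with RPUSH and a reader fetches them with LRANGE. The key name `logs:app`, the entry fields and the `InMemoryListStore` class are hypothetical; the class only mimics those two list commands so the example runs without a server.

```python
import json
import socket
import time


class InMemoryListStore:
    """Hypothetical stand-in that mimics the two Redis list commands used below."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, *values):
        # Append to the tail of the list and return its new length.
        self._lists.setdefault(key, []).extend(values)
        return len(self._lists[key])

    def lrange(self, key, start, end):
        items = self._lists.get(key, [])
        # Redis treats `end` as inclusive, and -1 means the last element.
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


def log_message(client, level, message, key="logs:app"):
    """Append one log entry, tagged with this machine's hostname."""
    entry = json.dumps({
        "host": socket.gethostname(),
        "ts": time.time(),
        "level": level,
        "message": message,
    })
    return client.rpush(key, entry)


def read_messages(client, key="logs:app"):
    """Return every stored entry, oldest first."""
    return [json.loads(raw) for raw in client.lrange(key, 0, -1)]
```

Against a real server, a redis-py client such as `redis.Redis(host="your-redis-host")` (the host name is a placeholder) could be passed as `client` instead of the stand-in, since it exposes `rpush` and `lrange` with the same call shape.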
I have no first-hand experience, but I found this blog entry quite interesting.