MongoDB or Redis?
I've heard that I should keep collections small in MongoDB to enable better indexing (and so that indexes fit in RAM), and I've heard that Redis is "blazing fast" but that MongoDB is better if you have bigger collections.
Which is the more efficient one if I have several thousand collections of, say, a few thousand hashes each?
I'm asking because it's too early in my project to have data available to benchmark, and I would probably design bad benchmark scripts because I don't understand the theoretical concepts behind these two database engines very well, especially Redis.
Thanks to everyone who answers this.
The document size, while important, should not be the most important factor for you in selecting Mongo or Redis. It's rare that you'll hit Mongo's document size limit (4MB when this was written; later releases raised it to 16MB), and if you do, it might be an indicator that your document is not broken down enough. Redis is a bit more all-purpose, so if you intend to use your data store for niche areas of your application state (suggestion boxes, caches, etc.), Redis may be a better fit. If you're persisting richer items, ones that extend beyond Redis' native data types and structures, Mongo is probably a better fit.
Truthfully, both Redis and Mongo are superb and dead simple to get up and running. Considering it's early in your lifecycle, try both on for size and see what feels better.
It depends very much on the specific use case. If you want to be able to query your documents on something other than their ID then you shouldn't choose Redis. With Redis you would have to implement your own indexing scheme, and that's just unnecessary.
There are actually very few cases where Redis would be a better option for what I think your use case is (not that there's anything wrong with Redis; I often use both Redis and Mongo, but for different things). It sounds to me like you have objects that can be represented as hashes. Both Mongo and Redis can store hashes, but Mongo can do much more. With Mongo you can search for a document on any of its fields, you can add an index to speed that up, and the field doesn't even have to be a string: it can be a number, date, list, even a document (or a list of documents). The documents also don't all have to fit in RAM (although that will change when Redis' diskstore feature is finished). Redis doesn't have any of that. You would have to implement indexes yourself to be able to search, you can't store anything but strings (which is really inconvenient sometimes), and you can't store anything but flat hashes (without resorting to implementing or using some kind of mapping layer like Ohm).
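To make "implement indexes yourself" concrete, here is a minimal sketch of one common approach: keep a set of document keys per indexed field value, next to the flat hashes. It is not taken from any library. `FakeRedis` is a hypothetical in-memory stand-in that only mimics the shape of a few Redis commands (HSET, HGETALL, SADD, SMEMBERS) so the example runs without a server, and the key layout (`user:<id>`, `idx:user:<field>:<value>`) and helper names are assumptions made up for illustration.

```python
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self):
        self.hashes = {}
        self.sets = defaultdict(set)

    def hset(self, key, mapping):
        # Redis hashes are flat and hold strings, so every value is coerced.
        stored = self.hashes.setdefault(key, {})
        stored.update({field: str(value) for field, value in mapping.items()})

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def save_user(r, user_id, fields, indexed=("city",)):
    """Store the hash, then maintain one index set per indexed field value."""
    key = f"user:{user_id}"
    r.hset(key, fields)
    for field in indexed:
        r.sadd(f"idx:user:{field}:{fields[field]}", key)


def find_users(r, field, value):
    """Look up matching keys in the index, then fetch each hash separately."""
    keys = r.smembers(f"idx:user:{field}:{value}")
    return [r.hgetall(key) for key in sorted(keys)]


r = FakeRedis()
save_user(r, 1, {"name": "Ann", "city": "Oslo", "age": 31})
save_user(r, 2, {"name": "Bob", "city": "Lima", "age": 45})
save_user(r, 3, {"name": "Cy", "city": "Oslo", "age": 27})

oslo = find_users(r, "city", "Oslo")
print([user["name"] for user in oslo])
print(oslo[0]["age"], type(oslo[0]["age"]).__name__)
```

Note that the age comes back as a string, and that this sketch never removes stale index entries when an indexed field changes; a real implementation would have to handle that bookkeeping as well.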
You also mention speed. Redis is blazingly fast, and Mongo isn't bad either; however, for your use case, using Mongo may be quicker. Notice I say using Mongo, not that Mongo itself would be quicker. The thing is, if you go with Redis and still want to be able to search for a document using a field that isn't the primary key, you would, as I mentioned above, have to implement this yourself. A search would then have to make at least two requests to Redis: one to look in the index, and one to get the document. If a search results in more than one document, you would have to make a request for each document individually. The overhead of making all these requests would probably make using Redis slower than using Mongo. In my experience, anything other than the simplest cache, queue, or similar needs to make more than one request to Redis to get everything it needs.
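The request-count argument can be put into rough numbers. The sketch below is only a back-of-the-envelope model: the function names are made up for illustration, the 0.5 ms per-request latency is an arbitrary assumption rather than a measurement, and the Mongo side assumes the whole result arrives in a single response (a large result set may need further fetches).

```python
def redis_round_trips(matches):
    # One request to read the hand-built index, then one per matching document.
    return 1 + matches


def mongo_round_trips(matches):
    # Assumption: one query returns all matches in a single response.
    return 1


def estimated_ms(round_trips, latency_ms=0.5):
    # latency_ms is an assumed per-request cost, not a benchmark result.
    return round_trips * latency_ms


for matches in (1, 20):
    print(
        matches,
        estimated_ms(redis_round_trips(matches)),
        estimated_ms(mongo_round_trips(matches)),
    )
```

Under these assumptions the cost of the Redis approach grows linearly with the number of matches, which is the overhead described above; actual numbers depend entirely on your network and data.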
So, with the limited information at my disposal, I recommend MongoDB.
I'll toss one more option into the ring: Berkeley DB XML. It's a small footprint C++ library with C++ and Java APIs that provide XML data management, XQuery and XPath queries. It's designed to be very fast, scalable and reliable. It supports transactions, recovery and replication. You can use it to store XML documents as well as non-SQL key-value pairs.
Disclaimer: I'm the product manager for Berkeley DB, so I'm a bit biased. However, we have lots of customers that use BDB XML for medium to very large document repositories.
Redis now has secondary indexes, which might serve your purpose. Link: https://redis.io/topics/indexes
MongoDB should be fine for this:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis