I need to store a huge amount of binary files (10-20 TB in total, each file ranging from 512 KB to 100 MB).
I need to know whether Redis would be efficient for such a system.
I need the following properties in my system:
- High Availability
- Failover
- Sharding
I intend to use a cluster of commodity hardware to reduce costs as much as possible. Please suggest the pros and cons of building such a system using Redis. I am also concerned about the high RAM requirements of Redis.
I would not use Redis for such a task. Other products would be a better fit, IMO.
Redis is an in-memory data store. If you want to store 10-20 TB of data, you will need 10-20 TB of RAM, which is expensive. Furthermore, the memory allocator is optimized for small objects, not big ones. You would probably have to cut your files into many small pieces, which would not be really convenient.
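To give an idea of what that chunking would involve, here is a minimal sketch using the redis-py client; the key naming scheme and the 1 MB chunk size are assumptions for illustration, not anything Redis prescribes:

```python
import redis

CHUNK_SIZE = 1024 * 1024  # 1 MB pieces; an arbitrary choice for this sketch

r = redis.Redis(host="localhost", port=6379)

def store_file(name, path):
    """Split a binary file into fixed-size chunks, one Redis key each."""
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            r.set(f"file:{name}:chunk:{index}", chunk)
            index += 1
    r.set(f"file:{name}:chunks", index)  # remember chunk count for retrieval

def load_file(name):
    """Reassemble the file from its chunks."""
    count = int(r.get(f"file:{name}:chunks"))
    return b"".join(r.get(f"file:{name}:chunk:{i}") for i in range(count))
```

All of this bookkeeping lives in your application, and every 100 MB file becomes a hundred keys to manage.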
Redis does not provide an out-of-the-box solution for HA and failover. Master/slave replication is provided (and works quite well), but with no support for automating the failover itself. Clients have to be smart enough to switch to the correct server. Something on the server side (left unspecified by Redis) has to switch the roles between master and slave nodes in a reliable way. In other words, Redis only provides a do-it-yourself HA/failover solution.
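Here is a rough sketch of what that do-it-yourself approach looks like from the client side, using redis-py; the host names and the naive promote-on-failure logic are assumptions for illustration (a real setup would need external coordination so that only one client triggers the promotion):

```python
import redis

# Hypothetical topology: one master, one slave.
MASTER = ("master-host", 6379)
SLAVE = ("slave-host", 6379)

def get_connection():
    """Try the master first; on failure, promote the slave and use it.
    Every client has to embed logic like this, as Redis offers no
    server-side failover automation."""
    try:
        conn = redis.Redis(host=MASTER[0], port=MASTER[1])
        conn.ping()
        return conn
    except redis.ConnectionError:
        conn = redis.Redis(host=SLAVE[0], port=SLAVE[1])
        conn.slaveof()  # no arguments = SLAVEOF NO ONE: promote to master
        return conn
```

Getting this right in a reliable way (avoiding split-brain, making all clients agree on the new master) is exactly the hard part Redis leaves to you.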
Sharding has to be implemented on the client side (as with memcached). Some clients have support for it, but not all of them. The fastest client (hiredis) does not. In any case, things like rebalancing have to be implemented on top of Redis. Redis Cluster, which is supposed to provide such sharding capabilities, is not ready yet.
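For illustration, a minimal client-side sharding sketch (naive modulo placement over hypothetical shard addresses; a production setup would use consistent hashing so adding a node does not remap every key):

```python
import hashlib
import redis

# Hypothetical shard list; any change to it remaps keys, which is why
# rebalancing has to be solved on top of Redis.
SHARDS = [
    redis.Redis(host="shard0-host", port=6379),
    redis.Redis(host="shard1-host", port=6379),
    redis.Redis(host="shard2-host", port=6379),
]

def shard_for(key):
    """Pick a shard by hashing the key (simple modulo placement)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

shard_for("file:video42:chunk:0").set("file:video42:chunk:0", b"...")
```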
I would suggest using some other solutions. MongoDB with GridFS is one possibility. Hadoop with HDFS is another. If you like cutting-edge projects, you may want to give the Elliptics Network a try.
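For comparison, GridFS handles the chunking for you. A sketch with pymongo (the connection string and file name are assumptions):

```python
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["files"]  # assumed address
fs = gridfs.GridFS(db)

# GridFS splits large files into chunks transparently and stores
# metadata alongside, so no application-level key scheme is needed.
with open("video.bin", "rb") as f:
    file_id = fs.put(f, filename="video.bin")

data = fs.get(file_id).read()  # read the file back in one call
```

Combined with MongoDB replica sets and sharding, this covers the HA, failover, and sharding requirements out of the box, and the data lives on disk rather than in RAM.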