I need to build a system to efficiently store & maintain a huge amount (20 [TB]) of data (and be able to access it in 'vector' form). Here are my dimensions:
(1) time (given as an integer of the form YYYYMMDDHHMMSS)
(2) field (a string of any given length, representing a name of a hospital)
(3) instrumentID (an integer representing a uniqueID for the instrument)
I need a way to store data points individually, i.e., something like:
STORE 23789.46 as the data for instrumentID = 5 on field = 'Nhsdg' on time = 20040713113500
Yet, I would need the following query to run FAST: give me all instruments for field 'X' on timestamp 'Y'.
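To make the access pattern concrete, here is a toy in-memory sketch of the semantics I want (the names `put` and `instruments_at` are mine, not any library's API). The point is that the hot query should be a single partition read keyed by (field, time):

```python
from collections import defaultdict

# Toy in-memory model of the access pattern (not the real store):
# data is partitioned by (field, time); each partition maps
# instrumentID -> value, so the hot query reads exactly one partition.
store = defaultdict(dict)

def put(field, time, instrument_id, value):
    store[(field, time)][instrument_id] = value

def instruments_at(field, time):
    # "give me all instruments for field 'X' on timestamp 'Y'"
    return store.get((field, time), {})

put('Nhsdg', 20040713113500, 5, 23789.46)
put('Nhsdg', 20040713113500, 7, 101.25)
print(instruments_at('Nhsdg', 20040713113500))
```

Any store whose partition/row key can be the pair (field, time) should make this query cheap.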
To build this system, I have 60 dual-core machines (each with 1 GB of RAM and 1.5 TB of disk).
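For context, a back-of-envelope check on that hardware (assuming 3x replication, which is a number I picked, not a requirement):

```python
# Rough capacity check for the cluster described above.
machines, disk_tb = 60, 1.5
raw_tb = machines * disk_tb        # 90.0 TB of raw disk
data_tb, replication = 20, 3       # 20 TB of data, assumed 3x replication
used_tb = data_tb * replication    # 60 TB actually consumed
headroom_tb = raw_tb - used_tb     # 30 TB to spare
yearly_growth_gb = 0.2 * 365       # 200 MB/day is about 73 GB/year
print(raw_tb, used_tb, headroom_tb, round(yearly_growth_gb))
```

So the dataset fits comfortably even with replication, and the daily growth is negligible next to the historical load.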
Any recommendations for a suitable NoSQL solution (ideally one that works well with Python)?
NOTE: the system will first be loaded with historical data (roughly 20 [TB]). After that I will add at most about 200 [MB] per day. I just need a solution that keeps scaling. My use case is a single simple query: give me all instruments for field 'X' on timestamp 'Y'.