I'm planning a side project where I will be dealing with time-series-like data. I would like to give one of those shiny new NoSQL DBs a try and am looking for a recommendation.
For a (growing) set of symbols, I will have a list of (time, value) tuples (increasing over time).
Not all symbols will be updated; some symbols may be updated while others may not, and completely new symbols may be added.
The database should therefore allow:
- Add symbols with an initial one-element (tuple) list, e.g. A: [(2012-04-14 10:23, 50)]
- Update symbols with a new tuple (append that tuple to the list of that symbol).
- Read the data for a given symbol (ideally also letting me specify the time frame for which the data should be returned).
The create and update operations should preferably be atomic. Being able to read multiple symbols at once would be a plus.
Performance is not critical. Updates/Creates will happen roughly once every few hours.
Have a look at OpenTSDB (opentsdb.org), an open-source time series database that uses HBase. They have been smart about how they store the time series, and the storage scheme is well documented here: http://opentsdb.net/misc/opentsdb-hbasecon.pdf
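The core idea in those slides, as I read them, is to bucket each series into one row per time span and store the individual points as columns keyed by their offset within that span, so rows stay bounded in size. A simplified Python sketch of that bucketing, assuming one-hour buckets and a plain dict standing in for the HBase table (`rows`, `row_key_and_column` and `put` are hypothetical names, not OpenTSDB APIs), might look like this:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumed bucket width: one row per symbol per hour.
BUCKET_SECONDS = 3600

# Hypothetical stand-in for an HBase table: row key -> {column offset: value}.
rows: dict = defaultdict(dict)


def row_key_and_column(symbol: str, ts: datetime) -> tuple[tuple[str, int], int]:
    """Split a timestamp into an hour-aligned row key and a column offset.

    The row key pairs the symbol with the Unix time at the start of the
    bucket; the column is the number of seconds into that bucket.
    """
    epoch = int(ts.timestamp())
    base = epoch - epoch % BUCKET_SECONDS
    return (symbol, base), epoch - base


def put(symbol: str, ts: datetime, value: float) -> None:
    """Store one (time, value) point as a column in its hour bucket."""
    key, column = row_key_and_column(symbol, ts)
    rows[key][column] = value


# Usage: the first two points share a row, the third starts a new one.
put("A", datetime(2012, 4, 14, 10, 23, tzinfo=timezone.utc), 50)
put("A", datetime(2012, 4, 14, 10, 45, tzinfo=timezone.utc), 51)
put("A", datetime(2012, 4, 14, 11, 5, tzinfo=timezone.utc), 52)
```

The real OpenTSDB schema is considerably more compact (for example, it encodes metric names as numeric IDs), so treat this only as an illustration of the bucketing idea.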
I believe practically all the major NoSQL databases will support those requirements, especially if you don't actually have a large volume of data (which raises the question: why NoSQL?).
That said, I recently had to design and work with a NoSQL database for time series data, so I can give some input on that design, which can then be extrapolated to the others.
Our chosen database was Cassandra, and our design was as follows:

This lets you achieve everything you asked for, most notably reading the data for a single symbol, using a range if necessary (column range calls). Although you said performance wasn't critical, it was for us, and this design was also quite performant: all data for any single symbol is by definition sorted (column name sort) and always stored on the same node (no cross-node communication for simple queries). Finally, this design translates well to other NoSQL databases that have dynamic columns.
In addition, here's some information on using MongoDB (and capped collections if necessary) as a time series store: MongoDB as a Time Series Database
Finally, here's a discussion of SQL vs NoSQL for time series: https://dba.stackexchange.com/questions/7634/timeseries-sql-or-nosql
I can add to that discussion the following: