How NoSQL databases perform on aggregate functions

We need to work periodically with a pretty big dataset (30-40GB). It has a lot of values ordered by time (and more information) but we basically need to perform some mathematical operations by month.

Our first approach was to use a MySQL database to back the data, as we have a reasonable experience with the engine and with the relational approach. However, the process takes too long and we were wondering if a NoSQL approach can do it better.

Basically the data that we need to express is:

Value: { NumericalValue, Year, Month }
Entity: List of 'Value'

We process this list three times, doing simple mathematical operations, and when I say 'process' I mean iterate through the dataset and perform the calculus. When everything is over, we have the same structure (but with different data):

Value: { NumericalValue, Year, Month }
Entity: List of 'Value'

It's now when we found the biggest problems as we need to calculate some AVERAGES and it takes a lot. As we repeat this process some times, I think that the most consuming tasks are:

1) Exporting the dataset to MySQL. Which means a lot of inserts from text files.

And when the data is transformed:

2) Compute some queries that contains aggregate functions (AVG,SUM) with LIMIT. 3) Compute some queries that contains aggreate functions with the whole dataset.

Usually, even with some indexes added, we feel that things take too long (20 mins some queries). Any tip or resolution strategy would be very appreciated. I feel that NoSQL databases aren't designed specifically for this, but maybe some experiences could help :).

Thanks for your time,

标签： nosql redis

1条回答

Explosion°爆炸

2楼-- · 2019-07-18 18:42

Your task fits very well into Columnar databases. Column-oriented NoSQL(e.g. Cassandra) databases store data tables as sections of columns of data rather than as rows of data. This improves the speed of aggregations drastically. This have to do with systems that rely on hard disks for storage. If this is not the case(in-memory databases for examples) there are much more options for squeezing out performance.

0人赞添加讨论(0) 举报

How NoSQL databases perform on aggregate functions

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间