KDB query performance improvement

2019-08-13 15:01发布

问题:

I have a simple table containing prices that I'm using for stock algo back testing.

price_hist:([pxkey:`$()]price:`float$())
update `g#pxkey from `price_hist

pxkey is a concatenated string in the format 'MSFT_5M_201710060945', so stock=MSFT, price bar intervals=5 mins and datetime=201710060945. I used the concatenated string instead of individual columns because it's simple and I'm a KDB novice and I wanted to get something running quickly.

I have about 5 million rows in there and the performance is only marginally faster than MySql using the exact same data. Any ideas on how to improve this (either thru table structure, attributes, query, anything..)? FYI I'm using C# with qSharp library and to query i'm using this format which returns a dictionary:-

price_hist`MSFT_5M_201710060945

回答1:

Creating millions of generated symbols is never a good idea in kdb+. I would recommend using a keyed table instead of a dictionary:

bar5m:([sym:`$();time:`timestamp$()]price:`float$())

Once you populate it, you should be able to query it as follows

bar5m[(`MSFT;2017.10.06D09:45);`price]

To improve the performance, make sure the table is sorted by sym,time and put the p attribute on sym.