I have about 10GB of data stored on a historical node. However, the memory consumption of that node is only about 2GB.
When I launch a select query, the first run takes more than 30 seconds to return results. Subsequent runs return in about a second (because of the broker cache).
My goal is to reduce the first-run time of any select query to one second. To achieve that, I think a good start would be for the historical node to keep all the data in memory.
Question: what are the configuration parameters to force the historical node to cache all data in memory?
Druid doesn't have any direct mechanism to force data to be cached. To work around this, you can try firing some dummy queries at startup, which loads the segment data into memory.
There are several levels of cache that come into play when Druid queries are executed:
- Cache at historical nodes
- Cache at broker nodes
- Page cache
The first two caches are configurable and can be turned on or off as needed, whereas the page cache is entirely controlled by the underlying OS.
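The node-level switches live in runtime.properties: druid.historical.cache.useCache and druid.historical.cache.populateCache on historicals, druid.broker.cache.useCache and druid.broker.cache.populateCache on the broker, plus druid.cache.type and druid.cache.sizeInBytes for the cache implementation. The same behaviour can also be overridden per query through the query context. Below is a minimal sketch, assuming a broker on the default port 8082 and a placeholder datasource called my_datasource:

```python
import requests

# Assumptions: broker on the default port 8082, a datasource named
# "my_datasource", and data covering 2015. Adjust to your deployment.
BROKER_URL = "http://localhost:8082/druid/v2"

query = {
    "queryType": "timeseries",
    "dataSource": "my_datasource",
    "intervals": ["2015-01-01/2016-01-01"],
    "granularity": "all",
    "aggregations": [{"type": "count", "name": "rows"}],
    # Per-query overrides for the historical/broker result caches.
    # They affect only the result caches, not the OS page cache.
    "context": {"useCache": True, "populateCache": True},
}

response = requests.post(BROKER_URL, json=query)
response.raise_for_status()
print(response.json())
```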
Since in your setup you have lots of free memory on the historical node, I would suggest firing dummy queries at startup that span all historical segments (see the sketch below); this pulls all the segment data into the page cache, and any queries fired later will benefit from it.
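A minimal warm-up sketch along those lines, again assuming a broker on localhost:8082, a placeholder datasource my_datasource, data covering 2015, and a hypothetical metric column some_metric (swap in the columns your real queries actually read):

```python
import requests

BROKER_URL = "http://localhost:8082/druid/v2"  # assumption: default broker port

# Month-long chunks covering the whole datasource (assumption: data spans 2015).
WARMUP_INTERVALS = [f"2015-{m:02d}-01/2015-{m + 1:02d}-01" for m in range(1, 12)]
WARMUP_INTERVALS.append("2015-12-01/2016-01-01")

def warm_up(datasource: str) -> None:
    """Fire one cheap query per interval so every segment is read once
    and its data ends up in the OS page cache."""
    for interval in WARMUP_INTERVALS:
        query = {
            "queryType": "timeseries",
            "dataSource": datasource,
            "intervals": [interval],
            "granularity": "all",
            # Hypothetical metric: aggregate the columns your real queries
            # use so that those columns are the ones read from disk.
            "aggregations": [
                {"type": "doubleSum", "name": "total", "fieldName": "some_metric"}
            ],
        }
        requests.post(BROKER_URL, json=query).raise_for_status()

if __name__ == "__main__":
    warm_up("my_datasource")  # assumption: your datasource name
```

Run something like this once when the historical node comes up; as long as the node has enough free RAM, the OS will keep those pages resident.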
The historical and broker caches don't cache a segment's entire data, only the per-segment result of a query, so they won't be very useful if your queries are highly dynamic and use different aggregations and filters each time.