Why do I get these warnings after adding more data to my Elasticsearch cluster?
The warnings are different every time I browse the dashboard.
"Courier Fetch: 30 of 60 shards failed."
More details:
It's a single node running on CentOS 7.1.
/etc/elasticsearch/elasticsearch.yml
index.number_of_shards: 3
index.number_of_replicas: 1
bootstrap.mlockall: true
threadpool.bulk.queue_size: 1000
indices.fielddata.cache.size: 50%
threadpool.index.queue_size: 400
index.refresh_interval: 30s
index.number_of_shards: 5
index.number_of_replicas: 1
/usr/share/elasticsearch/bin/elasticsearch.in.sh
ES_HEAP_SIZE=3G
#I use this Garbage Collector instead of the default one.
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"
cluster status
{
"cluster_name" : "my_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 61,
"active_shards" : 61,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 61
}
cluster details
{
"cluster_name" : "my_cluster",
"nodes" : {
"some weird number" : {
"name" : "ES 1",
"transport_address" : "inet[localhost/127.0.0.1:9300]",
"host" : "some host",
"ip" : "150.244.58.112",
"version" : "1.4.4",
"build" : "c88f77f",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 7854,
"max_file_descriptors" : 65535,
"mlockall" : false
}
}
}
}
I'm curious about "mlockall" : false, because in the yml I set bootstrap.mlockall: true
logs
lots of lines like:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@a9a34f5
This is likely an indication that there's a problem with your cluster's health. Without knowing more about your cluster, there's not much more that can be said.
For me tuning the threadpool search queue_size solved the issue. I tried a number of other things and this is the one that solved it.
I added this to my elasticsearch.yml
threadpool.search.queue_size: 10000
and then restarted elasticsearch.
Reasoning... (from the docs)
A node holds several thread pools in order to improve how threads
memory consumption are managed within a node. Many of these pools also
have queues associated with them, which allow pending requests to be
held instead of discarded.
and for search in particular...
For count/search operations. Defaults to fixed with a size of int((#
of available_processors * 3) / 2) + 1, queue_size of 1000.
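To make that default concrete, here is a small Python sketch of the formula, plus a rough estimate of how many shard-level requests get rejected in a burst. The function names and the burst model are my own assumptions for illustration, not Elasticsearch code: it treats every shard-level search as one task and assumes nothing finishes while the burst arrives. The 20 panels in the example are a hypothetical dashboard, each querying the 60 shards from the question.

```python
def default_search_pool_size(available_processors: int) -> int:
    """Default search pool size per the docs quoted above: int((cores * 3) / 2) + 1."""
    return (available_processors * 3) // 2 + 1


def rejected_in_burst(shard_tasks: int, pool_size: int, queue_size: int = 1000) -> int:
    """Rough upper bound on rejections when shard_tasks arrive at once.

    Simplifying assumption: nothing completes while the burst arrives, so
    pool_size tasks run, queue_size tasks wait, and the rest are rejected.
    """
    return max(0, shard_tasks - pool_size - queue_size)


if __name__ == "__main__":
    threads = default_search_pool_size(4)  # hypothetical 4-core node
    print("search threads:", threads)
    # Hypothetical dashboard: 20 panels, each hitting 60 shards at once.
    print("rejected shard requests:", rejected_in_burst(20 * 60, threads))
```

Under this model a single 60-shard request fits easily, so the failures only show up when many panels or users hit the node at the same moment, which would also explain why the warning changes on every refresh.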
For more information you can refer to the elasticsearch docs here...
I had trouble finding this information so I hope this helps others!
In Elasticsearch 5.4, thread_pool has an underscore in it.
thread_pool.search.queue_size: 10000
See documentation at Elasticsearch Thread Pool module documentation
I agree with @Philip's opinion, but it's not necessary to restart Elasticsearch, at least on Elasticsearch >= 1.5.2, because you can set threadpool.search.queue_size dynamically.
curl -XPUT http://your_es:9200/_cluster/settings -d '
{
"transient":{
"threadpool.search.queue_size":10000
}
}'
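If you would rather send the same update from a script, a minimal Python sketch using only the standard library could look like this. The helper name build_queue_size_request is hypothetical, your_es is the same placeholder host as in the curl call, and the setting name is the 1.x one, so this only applies to versions where the setting can be changed dynamically.

```python
import json
import urllib.request


def build_queue_size_request(base_url: str, queue_size: int) -> urllib.request.Request:
    """Build (but do not send) a PUT to _cluster/settings with a transient setting."""
    body = json.dumps({"transient": {"threadpool.search.queue_size": queue_size}})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_queue_size_request("http://your_es:9200", 10000)
    print(req.get_method(), req.full_url)
    # To actually apply the setting, uncomment the next line:
    # urllib.request.urlopen(req)
```

Being transient, the setting is lost on a full cluster restart, so keep the yml entry as well if you want it to stick.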
I got this error when my query was missing a closing quote:
field:"value
In my ElasticSearch logs I see these exceptions:
Caused by: org.elasticsearch.index.query.QueryShardException:
Failed to parse query [field:"value]
...
Caused by: org.apache.lucene.queryparser.classic.ParseException:
Cannot parse 'field:"value': Lexical error at line 1, column 13.
Encountered: <EOF> after : "\"value"
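If the query string comes from user input, a quick check before sending it can catch this case. This is only a sketch with a hypothetical helper name; it counts unescaped double quotes and does not validate the rest of the Lucene query syntax.

```python
def has_unclosed_quote(query: str) -> bool:
    """Return True if the query contains an odd number of unescaped double quotes."""
    in_quote = False
    escaped = False
    for ch in query:
        if escaped:
            escaped = False  # this character was escaped by a backslash; skip it
        elif ch == "\\":
            escaped = True  # the next character is escaped
        elif ch == '"':
            in_quote = not in_quote  # toggle on every unescaped quote
    return in_quote


if __name__ == "__main__":
    print(has_unclosed_quote('field:"value'))   # the failing query from above
    print(has_unclosed_quote('field:"value"'))  # the corrected query
```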
From Elasticsearch 5 onwards, it's not possible to update thread_pool.search.queue_size through the _cluster/settings API. In my case, editing the node's yml file is not an option either, because if a node fails, the auto scaling code brings up a replacement ES node with the default yml settings.
I have a cluster of 3 nodes with 400 active primary shards, 7 active search threads, and a queue size of 1000. Increasing the number of nodes to 5 with a similar config resolved the issue, because queries are now distributed across more nodes.
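A back-of-the-envelope sketch in Python of why the extra nodes help. The model is my own simplification, not a measurement: it assumes each search touches one copy of all 400 shards, that the shards are spread evenly over the nodes, and that no task finishes before the next search arrives. The 7 threads and the queue of 1000 are taken as per-node values.

```python
import math


def shard_tasks_per_node(total_shards: int, nodes: int) -> int:
    """Shard-level tasks one search puts on each node, assuming an even spread."""
    return math.ceil(total_shards / nodes)


def concurrent_searches_before_rejection(
    total_shards: int, nodes: int, threads: int = 7, queue_size: int = 1000
) -> int:
    """How many simultaneous all-shard searches fit in threads + queue on one node."""
    per_node = shard_tasks_per_node(total_shards, nodes)
    return (threads + queue_size) // per_node


if __name__ == "__main__":
    for nodes in (3, 5):
        print(
            nodes,
            "nodes:",
            shard_tasks_per_node(400, nodes),
            "tasks per node per search,",
            concurrent_searches_before_rejection(400, nodes),
            "concurrent searches before rejections",
        )
```

With these assumptions, going from 3 to 5 nodes raises the headroom from about 7 to about 12 simultaneous all-shard searches per node. Reducing the number of shards a query has to touch would have a similar effect.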