What are the disadvantages of Elasticsearch doc values?

Posted 2020-08-26 02:59

Question:

The docs claim:

10–25% slower than in-memory fielddata

and

It is possible that doc values will become the default format in the near future

Besides this slight reduction in speed, what are the downsides of enabling doc values on all of the properties?

Thanks!

Answer 1:

The trend is to use doc_values whenever possible, since their performance keeps improving relative to fielddata (especially since ES 1.4). One current downside is that you cannot use them with analyzed string fields or boolean fields. Another downside arises if you're still using facets or Kibana 3, since neither leverages doc values. However, you can migrate from facets to aggregations and upgrade from Kibana 3 to Kibana 4, so it's not really an issue.
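For anyone still on facets, here is a minimal sketch of what the migration looks like for a simple terms facet, assuming the ES 1.x request-body syntax, where a `terms` facet and a `terms` aggregation take the same `field` and `size` options. The helper `facet_to_agg` and the facet name `popular_tags` are hypothetical; only plain `terms` facets are handled, because other facet types don't map one-to-one onto aggregations.

```python
import json


def facet_to_agg(body):
    """Rewrite simple `terms` facets in a search body as `terms` aggregations.

    Hypothetical helper: only facets of the form {"terms": {"field": ..., "size": ...}}
    are supported; anything else raises ValueError. The input dict is not modified.
    """
    aggs = {}
    for name, spec in body.get("facets", {}).items():
        if set(spec) != {"terms"}:
            raise ValueError(f"unsupported facet {name!r}: {sorted(spec)}")
        terms = spec["terms"]
        unknown = set(terms) - {"field", "size"}
        if unknown:
            raise ValueError(f"unsupported terms options in {name!r}: {sorted(unknown)}")
        aggs[name] = {"terms": dict(terms)}  # same options, new section
    new_body = {k: v for k, v in body.items() if k != "facets"}
    if aggs:
        new_body["aggs"] = aggs
    return new_body


facet_query = {
    "query": {"match_all": {}},
    "facets": {"popular_tags": {"terms": {"field": "tag", "size": 10}}},
}
print(json.dumps(facet_to_agg(facet_query), indent=2))
```

The query part is carried over unchanged; only the `facets` section is renamed to `aggs`, which lets the terms aggregation read the field's doc values when they are enabled.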

Check out this excellent blog post by Chris Earle which explains the ins and outs of doc values vs fielddata.



Answer 2:

From Elasticsearch: The Definitive Guide [1.x]:

Doc values are now only about 10–25% slower than in-memory fielddata, and come with two major advantages:

- They live on disk instead of in heap memory. This allows you to work with quantities of fielddata that would normally be too large to fit into memory. In fact, your heap space ($ES_HEAP_SIZE) can now be set to a smaller size, which improves the speed of garbage collection and, consequently, node stability.
- Doc values are built at index time, not at search time. While in-memory fielddata has to be built on the fly at search time by uninverting the inverted index, doc values are prebuilt and much faster to initialize.
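To make "uninverting" concrete, here is a toy Python sketch (not how Lucene actually stores data) contrasting the two structures. An inverted index maps each term to the documents that contain it, which is what search needs; sorting and aggregations need the opposite direction, document to values. Fielddata derives that column from the inverted index at search time, while doc values write it out while indexing. The function names and the sample `tag` data are hypothetical.

```python
from collections import defaultdict

# Toy corpus: doc id -> values of a not_analyzed "tag" field (may be multi-valued).
# Doc ids are assumed dense (0..n-1), as they are within a Lucene segment.
docs = {0: ["rock"], 1: ["jazz", "rock"], 2: ["pop"], 3: []}


def build_inverted_index(docs):
    """Term -> list of doc ids containing it (the structure search uses)."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            index[term].append(doc_id)
    return dict(index)


def uninvert(index, num_docs):
    """Fielddata-style: rebuild doc id -> terms by walking every term's postings.

    This mirrors the work in-memory fielddata does on the fly at search time.
    """
    column = [[] for _ in range(num_docs)]
    for term in sorted(index):
        for doc_id in index[term]:
            column[doc_id].append(term)
    return column


def build_doc_values(docs):
    """Doc-values-style: write the per-document column directly while indexing."""
    return [sorted(docs[doc_id]) for doc_id in range(len(docs))]


inverted = build_inverted_index(docs)
fielddata = uninvert(inverted, len(docs))
doc_values = build_doc_values(docs)
print(inverted)
print(fielddata == doc_values)
```

Both paths produce the same column; the difference is when the work happens (search time vs. index time) and where the result lives (JVM heap for fielddata, on-disk files for doc values).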

The trade-off is a larger index size and slightly slower fielddata access. Doc values are remarkably efficient, so for many queries you might not even notice the slightly slower speed. Combine that with faster garbage collections and improved initialization times and you may notice a net gain.

The more filesystem cache space that you have available, the better doc values will perform. If the files holding the doc values are resident in the filesystem cache, then accessing the files is almost equivalent to reading from RAM. And the filesystem cache is managed by the kernel instead of the JVM.
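A rough Python illustration of why the filesystem cache matters: a fixed-width numeric column is written to a file once, then read by doc id through `mmap`. The bytes are not copied into the process heap up front; the kernel pages them in on demand and keeps them in its page cache, so repeated reads behave much like RAM reads. The file name `price.dv` and the one-int64-per-document layout are made-up assumptions, not Lucene's doc-values format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: one little-endian int64 per document, in doc-id order.
RECORD = struct.Struct("<q")


def write_column(path, values):
    """Write one fixed-width value per doc id (the 'index time' step)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_value(mm, doc_id):
    """Random access by doc id: the offset is doc_id * record size."""
    return RECORD.unpack_from(mm, doc_id * RECORD.size)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "price.dv")
    write_column(path, [1999, 450, 120000, 7])
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Pages are loaded lazily by the OS; once cached, these reads avoid disk I/O.
        third = read_value(mm, 2)
        total = sum(read_value(mm, d) for d in range(len(mm) // RECORD.size))

print(third, total)
```

Because the cache belongs to the kernel rather than the JVM, a smaller heap leaves more memory for it, which is the trade-off the guide describes.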

Doc values can be enabled for numeric, date, Boolean, binary, and geo-point fields, and for not_analyzed string fields. They do not currently work with analyzed string fields. Doc values are enabled per field in the field mapping, which means that you can combine in-memory fielddata with doc values:

PUT /music/_mapping/song
{
  "properties" : {
    "tag": {
      "type":       "string",
      "index" :     "not_analyzed",
      "doc_values": true 
    }
  }
}
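Since doc values are enabled per field, a mapping covering several fields can be generated programmatically. The sketch below assumes the 1.x mapping syntax shown above and sets `"doc_values": true` only on the types the guide lists as supported (numeric, date, boolean, binary, geo_point, and not_analyzed strings), leaving analyzed strings on in-memory fielddata. The helper `field_mapping` and the fields `title`, `year`, and `explicit` are hypothetical.

```python
import json

# Types the guide lists as doc-values capable, using ES 1.x mapping type names.
NUMERIC = {"long", "integer", "short", "byte", "double", "float"}
DOC_VALUE_TYPES = NUMERIC | {"date", "boolean", "binary", "geo_point"}


def field_mapping(field_type, analyzed=True):
    """Return a 1.x-style mapping for one field, enabling doc values where allowed."""
    mapping = {"type": field_type}
    if field_type == "string":
        if analyzed:
            # Analyzed strings can't use doc values; they fall back to fielddata.
            return mapping
        mapping["index"] = "not_analyzed"
        mapping["doc_values"] = True
    elif field_type in DOC_VALUE_TYPES:
        mapping["doc_values"] = True
    return mapping


song_mapping = {
    "properties": {
        "tag": field_mapping("string", analyzed=False),
        "title": field_mapping("string"),
        "year": field_mapping("integer"),
        "explicit": field_mapping("boolean"),
    }
}

# Body for: PUT /music/_mapping/song
print(json.dumps(song_mapping, indent=2))
```

The generated `tag` entry matches the hand-written mapping above, while `title` stays analyzed and therefore keeps using fielddata.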

We are using doc_values with booleans, but you cannot use them with analyzed fields. The Elasticsearch developers are discussing support for analyzed fields, but haven't yet settled on what the right data structure should be. See the issue "Add Support for doc values for analyzed fields".