Multiple indexes or multiple mapping types for spa

Posted 2019-04-10 00:32

I have ~10 different document types that share 10-15 common fields. Each document type also has additional fields of its own; three of them have up to 30-40 additional fields.

I was considering using a different mapping type for each document type. But if I correctly understand how mappings work, ElasticSearch will internally use one mapping with 150-200 fields. Because no document has a value for each field, I will end up with a lot of sparse data.

According to this article (Index vs. Type), ElasticSearch is (was?) not very good at dealing with sparse data, so that would be an argument for having a separate index for each document type. But some document types have very few documents, so it would be overkill to have a separate index for them.

My question: How bad are sparse documents? Or am I better off with a separate index for each type even though some indexes will only contain a few documents?

2 answers
我欲成王,谁敢阻挡
#2 · 2019-04-10 01:30

ElasticSearch will internally use one mapping with 150-200 fields. Because no document has a value for each field, I will end up with a lot of sparse data.

Yes, different types within an index share the same mapping structure. A type simply adds a "_type" field to every document, which is automatically used for filtering when searching on a specific type.

How bad are sparse documents?

Quoting from Index vs. Type:

Fields that exist in one type will also consume resources for documents of types where this field does not exist. This is a general issue with Lucene indices: they don’t like sparsity.
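To get a feel for how sparse a shared mapping would be, here is a rough back-of-the-envelope sketch. The type names and field counts are hypothetical, loosely modeled on the numbers in the question. It only counts fields and says nothing about how much memory or disk Lucene actually uses.

```python
# Hypothetical field layout: every type shares COMMON fields and adds its own.
COMMON = {f"common_{i}" for i in range(12)}


def make_type(name, extra):
    """Return the full field set for one document type."""
    return COMMON | {f"{name}_{i}" for i in range(extra)}


def fill_ratio(type_fields, merged):
    """Fraction of the merged mapping that a document of this type populates."""
    return len(type_fields) / len(merged)


# Three large types and seven small ones (assumed numbers).
types = {f"big{i}": make_type(f"big{i}", 35) for i in range(3)}
types.update({f"small{i}": make_type(f"small{i}", 5) for i in range(7)})

# One index with many types means one merged mapping over all fields.
merged = set().union(*types.values())

if __name__ == "__main__":
    print("fields in merged mapping:", len(merged))
    for name, fields in sorted(types.items()):
        print(f"{name}: {fill_ratio(fields, merged):.0%} of fields populated")
```

With these assumed numbers, documents of the small types populate only a small fraction of the merged mapping, which is the sparsity the quote warns about.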

am I better off with a separate index for each type even though some indexes will only contain a few documents?

As you may be aware, each separate index has its own overhead, while types do not cope well with sparse documents.

I would suggest

  • Document types with a small number of documents (and a large number of sparse fields) should go to a separate index, with the number of shards reduced to the minimum, i.e. 1. Each index has 5 shards by default (in Elasticsearch versions before 7.0). If the number of documents is not that large, 5 shards make no sense, and using fewer shards reduces the load of each search query.
  • Document types that have a significant number of fields in common should go to the same index with different types. Depending on the total number of documents, you may want to increase the number of shards.
  • If some document types have a huge number of documents, you may want to create separate indices for them.
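As a minimal sketch of the first suggestion, the snippet below builds the request that would create an index with a single primary shard. The index name `rare_docs`, the helper `create_index_request`, and the replica count are assumptions for illustration. `number_of_shards` and `number_of_replicas` are the standard index settings, sent with `PUT /<index>` to the cluster.

```python
import json


def create_index_request(index_name, shards=1, replicas=1):
    """Build the method, path and JSON body for an index-creation request."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,      # set when the index is created
                "number_of_replicas": replicas,  # can be adjusted later
            }
        }
    }
    return "PUT", f"/{index_name}", json.dumps(body, indent=2)


if __name__ == "__main__":
    # "rare_docs" is a made-up name for a low-volume document type.
    method, path, body = create_index_request("rare_docs")
    print(method, path)
    print(body)
```

The snippet only prints the request; sending it is left to whatever HTTP client or Elasticsearch client you already use.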

Keep in mind that you should keep a reasonable number of shards in your cluster, which can be achieved by reducing the number of shards for indices that don’t require a high write throughput and/or will store low numbers of documents.

爱情/是我丢掉的垃圾
#3 · 2019-04-10 01:36

Choosing between an index and a type has various implications. It depends on the computing power of your nodes, how many documents each type will store, and so on.

If each index would contain only a few documents, then I would recommend going with types, because each index ends up creating its own shards, which would be overkill for a small set of documents.
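The overhead argument can be made concrete with simple arithmetic. The sketch below assumes 5 primary shards per index (the old default), one replica per shard, and the roughly 10 document types from the question. The function name `total_shards` is made up for this example.

```python
def total_shards(indices, primaries_per_index, replicas=1):
    """Total shard copies (primaries plus replicas) across all indices."""
    return indices * primaries_per_index * (1 + replicas)


if __name__ == "__main__":
    # One index per document type, 5 primaries each (assumed default).
    print("index per type, 5 shards:", total_shards(10, 5))
    # Same layout with every index reduced to 1 primary shard.
    print("index per type, 1 shard: ", total_shards(10, 1))
    # A single shared index that uses types.
    print("one shared index:        ", total_shards(1, 5))
```

Under these assumptions, one index per type with default settings yields ten times as many shards as a single shared index, while dropping each small index to one shard narrows that gap considerably.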

You could refer to this SO Answer as well.
