When I am using Elasticsearch I can save JSON documents into it and search them by default. I can also specify index settings where I describe field types and indexing options. My question is about the internal implementation of storing data in Elasticsearch. In MongoDB I can store dynamic JSON data, so all documents are saved as is (actually as BSON, but that doesn't matter here). For example:
{
  "firstName": "A",
  "lastName": "B"
}
Here we can see that the "schema data" (the field names) takes more disk space than the "actual data" (the values). So in MongoDB it is considered good practice to minimize the size of the "schema data", like this:
{
  "f": "A",
  "l": "B"
}
and provide a mapping in the application code to support this schema. In Elasticsearch (Lucene) I can specify a schema, so internally it could store only the "actual data" rather than "actual + schema", but I am not sure about this, because I can also store dynamic JSON data.
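For example, an explicit mapping for the document above might look something like this (the type name "person" is just a placeholder):

{
  "mappings": {
    "person": {
      "properties": {
        "firstName": { "type": "string" },
        "lastName": { "type": "string" }
      }
    }
  }
}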
So the question is: should I implement such an optimization in Elasticsearch?
Yes, the long field names will take slightly more space, but I wouldn't worry about it. A document in Elasticsearch is stored as the full JSON, in the _source field. It takes up disk space, and uses memory temporarily when results are returned. But you can set the _source field to be compressed, and in versions of Elasticsearch from 0.90 onwards the whole segment is compressed, and repeated field names are good candidates for compression. I'd prefer to keep my documents readable rather than cryptic.