In my system, the insertion of data is always done through CSV files via Logstash. I never pre-define the mapping, but whenever I input a string it is always analyzed; as a result an entry like "hello I am Sinha" is split into "hello", "I", "am", "Sinha". Is there any way I could change the default/dynamic mapping of Elasticsearch so that all strings, irrespective of index and irrespective of type, are taken to be not_analyzed? Or is there a way of setting it in the .conf file? Say my conf file looks like:
input {
file {
path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
type => "promosms_dec15"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["Comm_Plan","Queue_Booking","Order_Reference","Multi_Ordertype"]
separator => ","
}
ruby {
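# parse the Generation_Date CSV column into a Ruby Date object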
code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
index => "promosms-%{+dd.MM.YYYY}"
workers => 1
}
}
I want all the strings to be not_analyzed, and I don't mind it being the default setting for all future data inserted into Elasticsearch either.
You can query the .raw version of your field. This was added in Logstash 1.3.1:

The logstash index template we provide adds a ".raw" field to every field you index. These ".raw" fields are set by logstash as "not_analyzed" so that no analysis or tokenization takes place – our original value is used as-is!

So if your field is called foo, you'd query foo.raw to return the not_analyzed (not split on delimiters) version.
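For example (a sketch only; my-index is a placeholder and foo is the field name from above), a term query against foo.raw matches the whole untokenized value instead of individual tokens:

curl -XGET 'localhost:9200/my-index/_search' -d '{
  "query": {
    "term": { "foo.raw": "hello I am Sinha" }
  }
}'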
Just create a template. Run:
curl -XPUT localhost:9200/_template/template_1 -d '{
"template": "*",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"_default_": {
"_all": {
"enabled": true
},
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"index": "not_analyzed",
"omit_norms": true,
"type": "string"
}
}
}
],
"properties": {
"@version": {
"type": "string",
"index": "not_analyzed"
},
"geoip": {
"type": "object",
"dynamic": true,
"path": "full",
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}'
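Note that a template is only applied when an index is created, so existing indices keep their old mappings; reindex or wait for the next daily promosms-* index to pick it up. You can confirm the template was stored with:

curl -XGET 'localhost:9200/_template/template_1?pretty'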
Make a copy of the lib/logstash/outputs/elasticsearch/elasticsearch-template.json file from your Logstash distribution (possibly installed as /opt/logstash/lib/logstash/outputs/elasticsearch/elasticsearch-template.json), and modify it by replacing
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
}
}
}
} ],
with
"dynamic_templates" : [ {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "not_analyzed", "omit_norms" : true
}
}
} ],
and point the template option of your output plugin to your modified file:
output {
elasticsearch {
...
template => "/path/to/my-elasticsearch-template.json"
}
}
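Depending on your Logstash version, you may also need template_overwrite => true in the same output block so that your file replaces the stock template Logstash has already pushed to the cluster (check the elasticsearch output plugin's documentation).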
You can still override this default for particular fields.
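For instance (a sketch only; Comm_Plan is just a column name borrowed from the question's CSV), an explicit entry in the template's "properties" section takes precedence over the dynamic template and keeps that one field analyzed:

"properties" : {
  "Comm_Plan" : { "type" : "string", "index" : "analyzed" }
}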
I think updating the mapping is the wrong approach just to handle a field for reporting purposes. Sooner or later you may want to be able to search the field for tokens. If you set the field to "not_analyzed" and want to search for foo in a value of "foo bar", you won't be able to do that.
A more graceful solution is to use Kibana aggregation filters instead of terms. Something like the sketch below will search for the terms ivr04 and ivr02; in your case you could have a single filter for "Hello I'm Sinha". Hope this helps.
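As a rough sketch of that idea (my-index is a placeholder; the filters aggregation requires Elasticsearch 1.4+, and the query strings search the _all field by default), the equivalent query DSL buckets documents matching each term:

curl -XGET 'localhost:9200/my-index/_search' -d '{
  "size": 0,
  "aggs": {
    "by_term": {
      "filters": {
        "filters": {
          "ivr04": { "query": { "query_string": { "query": "ivr04" } } },
          "ivr02": { "query": { "query_string": { "query": "ivr02" } } }
        }
      }
    }
  }
}'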