elasticsearch analyzer - lowercase and whitespace

Posted 2019-03-27 12:01

How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing?

This is my current mapping that tokenizes by whitespace, but I can't understand how to lowercase it and also search (query) the same way...

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "whitespace", "tokenizer": "whitespace", "search_analyzer":"whitespace" }
      }
    }
  }
}

Please help...

2 answers
走好不送
#2 · 2019-03-27 12:43

You have two options:

Simple Analyser

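The simple analyser will probably meet your needs. It lowercases every token, but note that it splits on any non-letter character, not only on whitespace: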

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA' 
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}

To use the simple analyser in your mapping:

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "simple" }
      }
    }
  }
}
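
Before settling on the simple analyser, it is worth checking how it treats digits and punctuation. The following is only a rough Python sketch of the two behaviours, not Elasticsearch code: the function names are hypothetical, and the regular expression merely approximates "runs of letters".

```python
import re

def simple_like(text):
    """Approximate the simple analyser: keep runs of letters, lowercased."""
    # [^\W\d_]+ matches word characters that are not digits or underscores.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]

def whitespace_lowercase(text):
    """Approximate a whitespace tokenizer followed by a lowercase filter."""
    return [token.lower() for token in text.split()]

sample = "Wi-Fi 2.4GHz Router"
print(simple_like(sample))           # ['wi', 'fi', 'ghz', 'router']
print(whitespace_lowercase(sample))  # ['wi-fi', '2.4ghz', 'router']
```

If your titles contain hyphens, digits or other punctuation that must stay inside a token, the custom analyser is the safer choice.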

Custom Analyser

The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to this new analyser in your mapping.

倾城 Initia
#3 · 2019-03-27 12:44

I managed to write a custom analyzer, and this works:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "lowercasespaceanalyzer" }
      }
    }
  }
}

Only the analyzer name is needed on the field. The tokenizer and filter belong to the analyzer definition, and when no search_analyzer is set, the same analyzer is also applied to queries.
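
To see why the query side needs the same lowercasing as the index side, here is a toy Python model of index-time versus search-time analysis. It is a sketch under simplifying assumptions, not Elasticsearch code: the function names are hypothetical, and matching is reduced to "every query term must be among the indexed terms".

```python
def whitespace_only(text):
    """Split on whitespace and keep case, like the built-in whitespace analyzer."""
    return text.split()

def whitespace_lowercase(text):
    """Split on whitespace, then lowercase, like lowercasespaceanalyzer."""
    return [token.lower() for token in text.split()]

def matches(document, query, index_analyzer, search_analyzer):
    """Return True if every analyzed query term is among the indexed terms."""
    indexed_terms = set(index_analyzer(document))
    return all(term in indexed_terms for term in search_analyzer(query))

title = "Some DATA"

# Query analyzed with whitespace only: "DATA" is not among {"some", "data"}.
mixed = matches(title, "DATA", whitespace_lowercase, whitespace_only)

# Query analyzed the same way as the index: "data" is found.
same = matches(title, "DATA", whitespace_lowercase, whitespace_lowercase)

print(mixed, same)  # False True
```

In this model, a whitespace-only search analyzer finds the title only when the query is already typed in lowercase.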