How do I do a partial match in Elasticsearch?

I have a link like http://drive.google.com and I want to match "google" out of the link.

I have:

query: {
    bool : {
        must: {
            match: { text: 'google'} 
        }
    }
}

But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc). How do I match for the 'google' inside of another string?

Tags: json regex parsing url elasticsearch

6 answers

Bombasti

Reply #2 · 2019-04-04 15:17

For both partial and full-text matching, the following worked:

"query" : {
    "query_string" : {
      "query" : "*searchText*",
      "fields" : [
        "fieldName"
      ]
    }
}


爷、活的狠高调

Reply #3 · 2019-04-04 15:18

For a more generic solution, you can look into using a different analyzer or defining your own. I am assuming you are using the standard analyzer, which splits http://drive.google.com into the tokens "http" and "drive.google.com". That is why the search for just google isn't working: the query term is compared against the full token "drive.google.com".

If you instead index your documents using the simple analyzer, the URL is split into "http", "drive", "google", and "com". This allows you to match any one of those terms on its own.
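To make the tokenization difference concrete, here is a rough Python sketch. The function simple_analyzer_tokens is a hypothetical stand-in that only approximates the simple analyzer (lowercase, split on anything that is not a letter), and index_body is an assumed mapping in Elasticsearch 7+ syntax for a field called text; older versions need a mapping type name as well.

```python
import re


def simple_analyzer_tokens(text):
    """Approximate the simple analyzer: lowercase, then split on non-letters."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


# Assumed index body that asks Elasticsearch to analyze the "text" field
# with the built-in simple analyzer (field name taken from the question).
index_body = {
    "mappings": {
        "properties": {
            "text": {"type": "text", "analyzer": "simple"}
        }
    }
}

tokens = simple_analyzer_tokens("http://drive.google.com")
print(tokens)  # ['http', 'drive', 'google', 'com']
```

With "google" indexed as its own token, the plain match query from the question should find the document. Existing documents would need to be reindexed after the mapping change.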


Fickle 薄情

Reply #4 · 2019-04-04 15:24

I can't find a breaking change disabling regular expressions in match, but match: { text: '.*google.*'} does not work on any of my Elasticsearch 6.2 clusters. Perhaps it is configurable?

Regexp works:

"query": {
   "regexp": { "text": ".*google.*"} 
}


贼婆χ

Reply #5 · 2019-04-04 15:29

Use a wildcard query:

'{"query":{ "wildcard": { "text.keyword" : "*google*" }}}'
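If the search fragment comes from user input, the same request body can be built programmatically. This is only a sketch: contains_query is a hypothetical helper, and the backslash escaping assumes the escape character used by Lucene's wildcard syntax.

```python
import json


def contains_query(field, fragment):
    """Build a wildcard query body matching values that contain `fragment`."""
    # Escape *, ? and \ so that user input is matched literally
    # (assumption: backslash is the wildcard escape character).
    escaped = "".join("\\" + ch if ch in "*?\\" else ch for ch in fragment)
    return {"query": {"wildcard": {field: "*" + escaped + "*"}}}


body = contains_query("text.keyword", "google")
print(json.dumps(body))
# {"query": {"wildcard": {"text.keyword": "*google*"}}}
```

Note that a keyword sub-field is not analyzed, so this match is case-sensitive unless a normalizer is configured, and patterns with a leading * can be slow on large indices.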


神经病院院长

Reply #6 · 2019-04-04 15:29

For partial matching you can use either prefix or match_phrase_prefix. Both match only from the beginning of an indexed term.
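As a rough illustration, the two request bodies might look like the dictionaries below, with the field name text taken from the question. The helper prefix_matches is hypothetical and only imitates the "starts with" comparison against already-lowercased tokens; the two token lists are the ones described in the analyzer answer above.

```python
def prefix_matches(tokens, prefix):
    """Return the tokens that start with `prefix` (input assumed lowercase)."""
    return [t for t in tokens if t.startswith(prefix)]


# Assumed request bodies for the two query types named in this answer.
prefix_body = {"query": {"prefix": {"text": "goo"}}}
phrase_prefix_body = {"query": {"match_phrase_prefix": {"text": "goo"}}}

simple_tokens = ["http", "drive", "google", "com"]   # simple analyzer
standard_tokens = ["http", "drive.google.com"]       # standard analyzer

print(prefix_matches(simple_tokens, "goo"))    # ['google']
print(prefix_matches(standard_tokens, "goo"))  # []
```

So these queries only help with the URL from the question if "google" is indexed as its own term.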


Lonely孤独者°

Reply #7 · 2019-04-04 15:42

The point is that Elasticsearch regular expressions (as used by the regexp query) require a full string match:

Lucene’s patterns are always anchored. The pattern provided must match the entire string.

Thus, to allow any characters (except newlines) before and after the word, you can use the .* pattern:

regexp: { text: '.*google.*'}
                 ^^      ^^

One more variation is for cases when your string can contain newlines: regexp: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in Elasticsearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
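The effect of anchoring can be tried out locally. The sketch below uses Python's re.fullmatch as an analogy only: Python's regex flavor is not Lucene's, but fullmatch likewise requires the pattern to cover the entire string, and without a DOTALL flag the dot does not match a newline. The helper name anchored_match and the sample strings are made up for illustration.

```python
import re


def anchored_match(pattern, value):
    """True if `pattern` matches the entire `value`, as an anchored regex would."""
    return re.fullmatch(pattern, value) is not None


term = "drive.google.com"
print(anchored_match(r"google", term))      # False: pattern must cover the whole string
print(anchored_match(r".*google.*", term))  # True

multiline = "see\nhttp://drive.google.com\nfor details"
print(anchored_match(r".*google.*", multiline))            # False: . stops at newlines
print(anchored_match(r"(.|\n)*google(.|\n)*", multiline))  # True
```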
