Elasticsearch regexp with space not working

Posted 2020-04-19 08:09

Let's assume I have books whose titles are indexed with Elasticsearch as follows:

curl -XPUT "http://localhost:9200/_river/books/_meta" -d '
{
  "type": "jdbc",
  "jdbc": {
    "driver": "org.postgresql.Driver",
    "url": "jdbc:postgresql://localhost:5432/...",
    "user": "...",
    "password": "...",
    "index": "books",
    "type": "books",
    "sql": "SELECT * FROM books"
  }
}'

For instance, I have a book called "Afoo barb".

The following code (searching for '.*foo.*') correctly returns the book:

client.search({
  index: 'books',
  from: 0,
  size: 10,
  body: {
    query: {
      filtered: {
        filter: {
          bool: {
            must: {
              regexp: { title: '.*foo.*' }
            }
          }
        }
      }
    }
  }
});

But the following code (searching for '.*foo bar.*') does not:

client.search({
  index: 'books',
  from: 0,
  size: 10,
  body: {
    query: {
      filtered: {
        filter: {
          bool: {
            must: {
              regexp: { title: '.*foo bar.*' }
            }
          }
        }
      }
    }
  }
});

I tried replacing the space with '\s' or '.*', but that does not work either.

I think the title is split into terms (['Afoo', 'barb']), so the regexp '.*foo bar.*' cannot match any of them.

How can I ask Elasticsearch to apply the regexp to the complete title?

1 answer
chillily
Reply #2 · 2020-04-19 09:10

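Elasticsearch applies the regexp to the terms produced by the tokenizer for that field, not to the original text of the field. With the default analyzer, the title "Afoo barb" is indexed as two separate terms, and no single term contains a space, so a pattern that spans the space cannot match.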
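To make the term-level behaviour concrete, here is a small simulation in plain JavaScript. It does not call Elasticsearch: `analyze` is a hypothetical, simplified stand-in for the default analyzer (lowercase, split on non-alphanumeric characters), and `matchesAnyTerm` imitates the fact that Lucene regexps must match a whole term. Treat it as a sketch of the idea rather than the actual implementation.

```javascript
// Hypothetical, simplified stand-in for the default analyzer:
// lowercase the text and split it on non-alphanumeric characters.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// A regexp query is tested against each indexed term, and the pattern
// has to match the entire term, so we anchor it with ^ and $ here.
function matchesAnyTerm(terms, pattern) {
  const anchored = new RegExp('^(?:' + pattern + ')$');
  return terms.some((term) => anchored.test(term));
}

const title = 'Afoo barb';
const analyzedTerms = analyze(title); // two terms: 'afoo' and 'barb'
const keywordTerms = [title];         // the whole title kept as one term

console.log(matchesAnyTerm(analyzedTerms, '.*foo.*'));     // true
console.log(matchesAnyTerm(analyzedTerms, '.*foo bar.*')); // false
console.log(matchesAnyTerm(keywordTerms, '.*foo bar.*'));  // true
```

The second call fails because neither 'afoo' nor 'barb' contains "foo bar"; the third succeeds because the title is stored as a single term.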

You can either index the field with a different tokenizer, or write the regex so that it matches the individual terms of the documents you want.

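Example with the keyword tokenizer, which keeps the whole title as a single term (the pattern still needs '.*' on both sides, because the regexp must match the entire term):

'regexp': { title: '.*foo bar.*' }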
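One possible way to get a single-term version of the title is sketched below. It swaps in a `not_analyzed` sub-field instead of a custom analyzer built on the keyword tokenizer; both keep the title as one term. The sub-field name `raw` and the helper `buildRegexpSearch` are made up for illustration, and the mapping syntax assumes an Elasticsearch 1.x-era cluster (matching the `filtered` query in the question); newer versions use the `keyword` field type instead. The mapping would have to be in place before the documents are indexed.

```javascript
// Hypothetical mapping for the "books" type: `title` stays analyzed,
// and `title.raw` stores the untokenized title as a single term.
const mapping = {
  books: {
    properties: {
      title: {
        type: 'string',
        fields: {
          raw: { type: 'string', index: 'not_analyzed' }
        }
      }
    }
  }
};

// Builds the same search request as in the question, but pointed at
// the untokenized `title.raw` sub-field (an assumed name).
function buildRegexpSearch(pattern) {
  return {
    index: 'books',
    from: 0,
    size: 10,
    body: {
      query: {
        filtered: {
          filter: {
            bool: {
              must: { regexp: { 'title.raw': pattern } }
            }
          }
        }
      }
    }
  };
}

const request = buildRegexpSearch('.*foo bar.*');
console.log(JSON.stringify(request, null, 2));
```

With a connected client, the request object would be passed as client.search(request), exactly like the calls in the question.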