elasticsearch bool query combine must with OR

2019-01-08 05:28发布

问题:

I am currently trying to migrate a solr-based application to elasticsearch.

I have this lucene query

(( 
    name:(+foo +bar) 
    OR info:(+foo +bar) 
)) AND state:(1) AND (has_image:(0) OR has_image:(1)^100)

As far as I understand this is a combination of MUST clauses combined with boolean OR:

"Get all documents containing (foo AND bar in name) OR (foo AND bar in info). After that filter results by condition state=1 and boost documents that have an image."

I have been trying to use a bool query with MUST but I am failing to get boolean OR into must clauses. Here is what I have:

GET /test/object/_search
{
  "from": 0,
  "size": 20,
  "sort": {
    "_score": "desc"
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "foo"
          }
        },
        {
          "match": {
            "name": "bar"
          }
        }
      ],
      "must_not": [],
      "should": [
        {
          "match": {
            "has_image": {
              "query": 1,
              "boost": 100
            }
          }
        }
      ]
    }
  }
}

As you can see, MUST conditions for "info" are missing.

Does anyone have a solution?

Thank you so much.

** UPDATE **

I have updated my elasticsearch query and got rid of that function score. My base problem still exists.

回答1:

  • OR is spelled should
  • AND is spelled must
  • NOR is spelled should_not

Example:

You want to see all the items that are (round AND (red OR blue)):

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {"shape": "round"}
                },
                {
                    "bool": {
                        "should": [
                            {"term": {"color": "red"}},
                            {"term": {"color": "blue"}}
                        ]
                    }
                }
            ]
        }
    }
}

You can also do more complex versions of OR, for example if you want to match at least 3 out of 5, you can specify 5 options under "should" and set a "minimum_should" of 3.

Thanks to Glen Thompson and Sebastialonso for finding where my nesting wasn't quite right before.

Thanks also to Fatmajk for pointing out that "term" becomes "match" in ElasticSearch 6.



回答2:

I finally managed to create a query that does exactly what i wanted to have:

A filtered nested boolean query. I am not sure why this is not documented. Maybe someone here can tell me?

Here is the query:

GET /test/object/_search
{
  "from": 0,
  "size": 20,
  "sort": {
    "_score": "desc"
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "state": 1
              }
            }
          ]
        }
      },
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  {
                    "match": {
                      "name": "foo"
                    }
                  },
                  {
                    "match": {
                      "name": "bar"
                    }
                  }
                ],
                "should": [
                  {
                    "match": {
                      "has_image": {
                        "query": 1,
                        "boost": 100
                      }
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "match": {
                      "info": "foo"
                    }
                  },
                  {
                    "match": {
                      "info": "bar"
                    }
                  }
                ],
                "should": [
                  {
                    "match": {
                      "has_image": {
                        "query": 1,
                        "boost": 100
                      }
                    }
                  }
                ]
              }
            }
          ],
          "minimum_should_match": 1
        }
      }    
    }
  }
}

In pseudo-SQL:

SELECT * FROM /test/object
WHERE 
    ((name=foo AND name=bar) OR (info=foo AND info=bar))
AND state=1

Please keep in mind that it depends on your document field analysis and mappings how name=foo is internally handled. This can vary from a fuzzy to strict behavior.

"minimum_should_match": 1 says, that at least one of the should statements must be true.

This statements means that whenever there is a document in the resultset that contains has_image:1 it is boosted by factor 100. This changes result ordering.

"should": [
  {
    "match": {
      "has_image": {
        "query": 1,
        "boost": 100
      }
    }
   }
 ]

Have fun guys :)



回答3:

Using the above I get

[term] malformed query, expected [END_OBJECT] but found [FIELD_NAME]

This worked for me

Updated for Elasticsearch 5.6.4 +

{
    "query": {
        "bool": {
            "must": [
                {"term": {"shape": "round"}},
                {"bool": {
                    "should": [
                        {"term": {"color": "red"}},
                        {"term": {"color": "blue"}}
                    ]
                }}
            ]
        }
    }
}


回答4:

ElasticSearch is definitely horrible when it comes to simple queries like AND, OR or IN. But, you can go the smart way and write your query as SQL and then convert it to ElasticSearch syntax with this excellent online tool:

SQL to ElasticSearch converter

https://www.toolsbuzz.com/query-converter

You can thank me later :)



回答5:

I recently had to solve this problem too, and after a LOT of trial and error I came up with this (in PHP, but maps directly to the DSL):

'query' => [
    'bool' => [
        'should' => [
            ['prefix' => ['name_first' => $query]],
            ['prefix' => ['name_last' => $query]],
            ['prefix' => ['phone' => $query]],
            ['prefix' => ['email' => $query]],
            [
                'multi_match' => [
                    'query' => $query,
                    'type' => 'cross_fields',
                    'operator' => 'and',
                    'fields' => ['name_first', 'name_last']
                ]
            ]
        ],
        'minimum_should_match' => 1,
        'filter' => [
            ['term' => ['state' => 'active']],
            ['term' => ['company_id' => $companyId]]
        ]
    ]
]

Which maps to something like this in SQL:

SELECT * from <index> 
WHERE (
    name_first LIKE '<query>%' OR
    name_last LIKE '<query>%' OR
    phone LIKE  '<query>%' OR
    email LIKE '<query>%'
)
AND state = 'active'
AND company_id = <query>

The key in all this is the minimum_should_match setting. Without this the filter totally overrides the should.

Hope this helps someone!



回答6:

$filterQuery = $this->queryFactory->create(QueryInterface::TYPE_BOOL, ['must' => $queries,'should'=>$queriesGeo]);

In must you need to add the query condition array which you want to work with AND and in should you need to add the query condition which you want to work with OR.

You can check this: https://github.com/Smile-SA/elasticsuite/issues/972