Multi-field, multi-word, match without query_string

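I want to run a multi-word search against multiple fields, where each search term may appear in any of the fields, in any combination. The catch is that I would like to avoid using query_string.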

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"firstname":"john","middlename":"clark","lastname":"smith"}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"firstname":"john","middlename":"paladini","lastname":"miranda"}'

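I want a search for "john smith" to match only document 1. The following query does what I need, but I would rather avoid query_string in case the user passes "OR", "AND", or any of the other advanced params.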

curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
  "query": {
    "query_string": {
      "query": "john smith",
      "default_operator": "AND",
      "fields": [
        "firstname",
        "lastname",
        "middlename"
      ]
    }
  }
}'

Answer 1:

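What you are looking for is the multi_match query, but it does not behave in quite the way you would like.

Compare the output of the validate API for multi_match vs query_string.

multi_match (with operator and) makes sure that ALL terms exist in at least one single field: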

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "multi_match" : {
      "operator" : "and",
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

query_string (with default_operator AND), on the other hand, checks that EACH term exists in at least one field:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "query_string" : {
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith",
      "default_operator" : "AND"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }
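
To make the difference between the two explanations concrete, here is a toy model in plain Python. It is not Elasticsearch code: the function names are made up for illustration, and "analysis" is reduced to lowercasing and splitting on whitespace. It uses the two documents from the question.

```python
# Toy model of the two explanations above; this is not Elasticsearch code.
# Each document is a dict of field -> lowercase text, split on whitespace.

def field_centric(query, doc, fields):
    """Like multi_match with operator "and": one field must contain every term."""
    terms = query.lower().split()
    return any(all(t in doc[f].split() for t in terms) for f in fields)

def term_centric(query, doc, fields):
    """Like query_string with default_operator AND: every term must be in some field."""
    terms = query.lower().split()
    return all(any(t in doc[f].split() for f in fields) for t in terms)

docs = {
    1: {"firstname": "john", "middlename": "clark", "lastname": "smith"},
    2: {"firstname": "john", "middlename": "paladini", "lastname": "miranda"},
}
fields = ["firstname", "middlename", "lastname"]

for doc_id, doc in docs.items():
    print(doc_id,
          field_centric("john smith", doc, fields),
          term_centric("john smith", doc, fields))
# 1 False True
# 2 False False
```

Only the term-centric check matches document 1, which is the behaviour the question asks for.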

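So you have a few choices to achieve what you are after:

Preparse the search terms to remove things like wildcards before passing them to query_string
Preparse the search terms to extract each word, then generate one multi_match query per word
Use index_name in your mapping for the name fields to index their data into a single field, which you can then search (like your own custom all field)

The last option looks like this: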

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "index_name" : "name",
               "type" : "string"
            },
            "lastname" : {
               "index_name" : "name",
               "type" : "string"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 33
# }

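Note, however, that firstname and lastname are no longer searchable independently. The data for both fields has been indexed into name.

You can use multi-fields with the path parameter to make them searchable both independently and together, as follows: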

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "fields" : {
                  "firstname" : {
                     "type" : "string"
                  },
                  "any_name" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            },
            "lastname" : {
               "fields" : {
                  "any_name" : {
                     "type" : "string"
                  },
                  "lastname" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

Searching the any_name field works:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "any_name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 11
# }

Searching firstname for john AND smith does not match:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 2
# }

But searching firstname for just john works correctly:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.30685282,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.30685282,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 3
# }
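
The option of generating one multi_match query per word can be sketched in Python as below. This is only a sketch under stated assumptions: the helper name per_word_multi_match is hypothetical, words are split on whitespace rather than by the index analyzer, and the clauses are combined with a bool query's must list so that every word has to match in at least one of the fields.

```python
import json

def per_word_multi_match(text, fields):
    """Build a bool query with one multi_match clause per whitespace-separated word.

    Hypothetical helper: splitting on whitespace is an assumption and will not
    always agree with the analyzer configured on the fields.
    """
    words = text.split()
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": word, "fields": list(fields)}}
                    for word in words
                ]
            }
        }
    }

body = per_word_multi_match("john smith", ["firstname", "middlename", "lastname"])
print(json.dumps(body, indent=2))  # send this as the body of a _search request
```

An empty search string produces an empty must list, so the caller should probably handle that case before sending the request.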

Answer 2:

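I would rather avoid using query_string in case the user passes "OR", "AND", or any of the other advanced params.

In my experience, escaping the special characters with a backslash is a simple and effective solution. The list can be found in the documentation at http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description , plus AND / OR / NOT / TO.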
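
A minimal Python sketch of that escaping, assuming the special-character list from the Lucene 4.5 page linked above. The function name escape_query_string is made up for illustration, and prefixing the operator keywords with a backslash follows the suggestion in this answer; check the result against the parser version you actually run.

```python
import re

# Special characters from the Lucene 4.5 classic query parser documentation.
# "&&" and "||" are covered by escaping each "&" and "|" individually.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')
# Uppercase operator keywords, matched as whole words only.
_OPERATORS = re.compile(r'\b(AND|OR|NOT|TO)\b')

def escape_query_string(text):
    """Backslash-escape query_string syntax so the input is read as plain words."""
    escaped = _SPECIAL.sub(r'\\\1', text)
    return _OPERATORS.sub(r'\\\1', escaped)

print(escape_query_string('john AND (smith OR "miranda")*'))
# john \AND \(smith \OR \"miranda\"\)\*
```

When the escaped string is placed into a JSON request body, the backslashes themselves have to be JSON-escaped as well; a JSON serializer such as json.dumps does that automatically.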

Answer 3:

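I think the "match" query is what you are looking for:

"The match family of queries does not go through a 'query parsing' process. It does not support field name prefixes, wildcard characters, or other 'advanced' features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does)"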

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

Answer 4:

Now you can use the cross_fields type of multi_match:

GET /_validate/query?explain
{
    "query": {
        "multi_match": {
            "query":       "peter smith",
            "type":        "cross_fields", 
            "operator":    "and",
            "fields":      [ "firstname", "lastname", "middlename" ]
        }
    }
}

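cross_fields takes a term-centric approach. It treats all of the fields as one big field and looks for each term in any of them.

One thing to note, though: for it to work optimally, all of the fields being searched should use the same analyzer (standard, english, etc.). From the documentation:

For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.
If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For instance, if we added the title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows:
(+title:peter +title:smith) (+blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]))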
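
As a rough way to see which fields would end up blended together, the grouping rule quoted above can be mimicked in a few lines of Python. This is only an illustration: the field-to-analyzer table is a hypothetical stand-in for your real mapping, and the helper name group_by_analyzer is made up.

```python
from collections import defaultdict

def group_by_analyzer(field_analyzers):
    """Group field names by analyzer name, mirroring the rule quoted above."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return dict(groups)

# Hypothetical analyzers; read the real ones from your index mapping.
field_analyzers = {
    "first_name": "standard",
    "last_name": "standard",
    "title": "english",
}
print(group_by_analyzer(field_analyzers))
# {'standard': ['first_name', 'last_name'], 'english': ['title']}
```

Here first_name and last_name share an analyzer and form one group, while title stands alone, which corresponds to the separate (+title:peter +title:smith) clause in the explanation.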

Source: Multi-field, multi-word, match without query_string