Solr counts by category

Posted 2019-04-11 03:01

Question:

Let's say I have a bunch of records in a Solr database in various categories: Products, Pages, etc. Is there a way to get counts so I can display the total matched vs. the total in each category?

Something like:

You searched for "breakfast"

Pages matching breakfast: 37/500
Products matching breakfast: 7/100

and so on.

Bonus points if I can get this in some kind of structure I can loop through, along the lines of this pseudocode:

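term = "breakfast"
categories = [("Page", 37, 500), ("Product", 7, 100)]  # (category, match_count, total_count)
print("You searched for %s" % term)
for category, match_count, total_count in categories:
    print("%ss matching %s: %d/%d" % (category, term, match_count, total_count))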

This is language agnostic: I plan on accessing the Solr index directly with a GET request, not through any API.

Answer 1:

Given

Schema (3 fields, all of type string (solr.StrField)):

  • id
  • title
  • category

Input data:

  • 4 categories - Product, Page, Post, Other
  • 4 titles - breakfast, lunch, dinner, supper

Index:

  • 1000 documents with random title/category

Request

We can use faceting to get both counts in a single request:

  • Search query (matching all documents):

    q=*:*

  • Filter query (restricting results to the search term and tagging the filter so it can be excluded later):

    &fq={!tag=dt}title:breakfast

  • Faceting:

    • Turn on faceting

      &facet=true

    • Suppress the document list if only the category counts are needed

      &rows=0

    • Get matching count

      &facet.field=category

    • Get the total count (the count for each category, ignoring the tagged filter query)

      &facet.field={!ex=dt key=total_category}category

The final query looks like this:

http://localhost:8983/solr/stack19733827/select?q=*%3A*&fq=%7B!tag%3Ddt%7Dtitle%3Abreakfast&rows=0&wt=xml&indent=true&facet=true&facet.field=category&facet.field=%7B!ex%3Ddt%20key%3Dtotal_category%7Dcategory
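Assembling that URL by hand is error-prone because of the braces and spaces in the local parameters. As a minimal sketch, assuming Python 3 and that the search term is a single token with no Solr query syntax in it, the same parameters can be encoded with the standard library. The function name build_facet_url and its arguments are hypothetical, chosen only for illustration:

```python
from urllib.parse import urlencode


def build_facet_url(base_url, term, field="title", facet_field="category"):
    """Build a select URL returning matched and total counts per facet value.

    Hypothetical helper: `term` is inserted as-is, so it is assumed to be
    a single token without Solr special characters.
    """
    params = [
        ("q", "*:*"),                                  # match every document
        ("fq", "{!tag=dt}%s:%s" % (field, term)),      # tagged filter on the term
        ("rows", "0"),                                 # counts only, no documents
        ("wt", "xml"),
        ("facet", "true"),
        ("facet.field", facet_field),                  # counts with the filter applied
        # Same field again, excluding the tagged filter, under a separate key
        ("facet.field", "{!ex=dt key=total_%s}%s" % (facet_field, facet_field)),
    ]
    # A list of pairs (not a dict) keeps both facet.field parameters
    return base_url + "?" + urlencode(params)


url = build_facet_url("http://localhost:8983/solr/stack19733827/select", "breakfast")
print(url)
```

urlencode percent-encodes more characters than the URL above (for example * and !) and writes the space as +. These are equivalent encodings of the same parameter values.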

Result

Here is a sample response, which also echoes the request parameters:

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="q">*:*</str>
    <arr name="facet.field">
      <str>category</str>
      <str>{!ex=dt key=total_category}category</str>
    </arr>
    <str name="indent">true</str>
    <str name="fq">{!tag=dt}title:breakfast</str>
    <str name="rows">0</str>
    <str name="wt">xml</str>
    <str name="facet">true</str>
    <str name="_">1383337530565</str>
  </lst>
</lst>
<result name="response" numFound="262" start="0">
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="category">
      <int name="other">75</int>
      <int name="page">65</int>
      <int name="product">62</int>
      <int name="post">60</int>
    </lst>
    <lst name="total_category">
      <int name="other">260</int>
      <int name="product">253</int>
      <int name="page">250</int>
      <int name="post">237</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
</response>

The facets contain the needed information:

  • total_category - total number of documents in each category
  • category - number of documents in each category that matched the filter query
  • name attribute of each facet entry - name of the category

Bonus:

  • numFound on the result element - total number of documents matching breakfast in title across all categories (262 here)
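To get the loopable structure the question asks for, the two facet lists can be joined on the category name. The following is only a sketch, assuming Python 3 and the XML response layout shown above. The function name category_counts is hypothetical, and the sample string is a trimmed copy of the response:

```python
import xml.etree.ElementTree as ET


def category_counts(xml_text, matched="category", total="total_category"):
    """Return (num_found, [(category, match_count, total_count), ...])."""
    root = ET.fromstring(xml_text)
    num_found = int(root.find("result[@name='response']").get("numFound"))
    fields = root.find("lst[@name='facet_counts']/lst[@name='facet_fields']")

    def counts(name):
        # Each facet is a <lst> of <int name="value">count</int> entries
        node = fields.find("lst[@name='%s']" % name)
        return {item.get("name"): int(item.text) for item in node.findall("int")}

    matched_counts = counts(matched)
    total_counts = counts(total)
    # Drive the loop from the totals so categories with no match still appear
    rows = [(cat, matched_counts.get(cat, 0), n) for cat, n in total_counts.items()]
    return num_found, rows


# Trimmed copy of the sample response (header and empty facet lists removed)
sample = """<response>
<result name="response" numFound="262" start="0"></result>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="category">
      <int name="other">75</int>
      <int name="page">65</int>
      <int name="product">62</int>
      <int name="post">60</int>
    </lst>
    <lst name="total_category">
      <int name="other">260</int>
      <int name="product">253</int>
      <int name="page">250</int>
      <int name="post">237</int>
    </lst>
  </lst>
</lst>
</response>"""

term = "breakfast"
num_found, rows = category_counts(sample)
print("You searched for %s (%d matches)" % (term, num_found))
for category, match_count, total_count in rows:
    print("%ss matching %s: %d/%d" % (category, term, match_count, total_count))
```

In real use, xml_text would be the body of the GET request. ET.fromstring also accepts the raw bytes of the response.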


Tags: solr