Apache Solr: sum of data resulting from a group by

Posted 2019-06-18 05:26

Question:

We have a requirement to group our records by a particular field and take the sum of a corresponding numeric field,

e.g. select userid, sum(click_count) from user_action group by userid;

We are trying to do this using Apache Solr and found two ways of doing it:

  1. Using the field collapsing feature (http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/), but we found 2 problems with this:
     1.1. It is not part of a release and is only available as a patch, so we are not sure we can use it in production.
     1.2. We do not get the sum back, only the individual counts, so we would need to sum them on the client side.

  2. Using the Stats Component along with faceted search (http://wiki.apache.org/solr/StatsComponent). This meets our requirement but it is not fast enough for very large data sets.
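
For reference, the request behind option 2 can be sketched as follows. This is only a rough illustration: the base URL is a hypothetical local Solr instance, the field names are taken from the SQL example above, and `build_stats_url` is a helper name made up for this sketch. The StatsComponent's `stats.facet` parameter is what breaks the statistics down per `userid`.

```python
from urllib.parse import urlencode


def build_stats_url(base_url, stats_field, facet_field):
    """Build a Solr select URL asking for stats on one field, broken down by another.

    base_url is an assumption (hypothetical host and path); adjust to your setup.
    """
    params = [
        ("q", "*:*"),                 # match all documents
        ("rows", "0"),                # no documents needed, only the statistics
        ("stats", "true"),            # enable the StatsComponent
        ("stats.field", stats_field),  # numeric field to aggregate
        ("stats.facet", facet_field),  # compute the stats separately per value of this field
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical base URL; field names follow the SQL example in the question.
url = build_stats_url("http://localhost:8983/solr/select", "click_count", "userid")
print(url)
```

The per-user sums then come back nested under the stats for `click_count`, one entry per `userid` value.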

I just wanted to know if anybody knows of any other way to achieve this. Appreciate any help.

Thanks,

Terance.

Answer 1:

Why not use the StatsComponent instead? It is available from Solr 1.4 onward.

$ curl 'http://search/select?q=*&rows=0&stats=on&stats.field=click_count' |
  tidy -xml -indent -quiet -wrap 2000000

<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">17</int>
    <lst name="params">
      <str name="q">*</str>
      <str name="stats">on</str>
      <arr name="stats.field">
        <str>click_count</str>
      </arr>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="577" start="0" />
  <lst name="stats">
    <lst name="stats_fields">
      <lst name="click_count">
        <double name="min">1.0</double>
        <double name="max">3487.0</double>
        <double name="sum">47912.0</double>
        <long name="count">577</long>
        <long name="missing">0</long>
        <double name="sumOfSquares">4.0208702E7</double>
        <double name="mean">83.0363951473137</double>
        <double name="stddev">250.79824725438448</double>
      </lst>
    </lst>
  </lst>
</response>
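
To pull the sum out of a response like the one above on the client side, something along these lines should work. It is a minimal sketch using only the Python standard library; `SAMPLE` is a trimmed copy of the response shown above, and `read_stats` is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above, kept as bytes so the XML declaration is honoured.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="stats">
    <lst name="stats_fields">
      <lst name="click_count">
        <double name="min">1.0</double>
        <double name="max">3487.0</double>
        <double name="sum">47912.0</double>
        <long name="count">577</long>
        <long name="missing">0</long>
      </lst>
    </lst>
  </lst>
</response>"""


def read_stats(xml_bytes, field):
    """Return the numeric stats for one field as a dict of name -> float."""
    root = ET.fromstring(xml_bytes)
    node = root.find(
        f"./lst[@name='stats']/lst[@name='stats_fields']/lst[@name='{field}']"
    )
    if node is None:
        raise KeyError(f"no stats found for field {field!r}")
    # Only <double> and <long> children carry plain numeric values.
    return {
        child.get("name"): float(child.text)
        for child in node
        if child.tag in ("double", "long")
    }


stats = read_stats(SAMPLE, "click_count")
print(stats["sum"], stats["count"])
```

Note that this request returns a single sum over all 577 matching documents; a per-user sum would still need the query restricted to one user or the stats broken down by `userid`.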


Tags: lucene solr