I am trying to perform search result aggregation (count and sum) grouping by several fields in a nested fashion.
For example, with the schema shown at the end of this post, I'd like to be able to get the sum of "size" grouped by "category" and sub-grouped further by "subcategory" and get something like this:
<category name="X">
<subcategory name="X_A">
<size sum="..." />
</subcategory>
<subcategory name="X_B">
<size sum="..." />
</subcategory>
</category>
....
I've been looking primarily at Solr's Stats component which, as far as I can see, doesn't allow nested aggregation.
I'd appreciate it if anyone knows of some way to implement this, with or without the Stats component.
Here is a cut-down version of the target schema:
<types>
<fieldType name="string" class="solr.StrField" />
<fieldType name="text" class="solr.TextField">
<analyzer><tokenizer class="solr.StandardTokenizerFactory" /></analyzer>
</fieldType>
<fieldType name="date" class="solr.DateField" />
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true" />
<field name="category" type="text" indexed="true" stored="true" />
<field name="subcategory" type="text" indexed="true" stored="true" />
<field name="pdate" type="date" indexed="true" stored="true" />
<field name="size" type="int" indexed="true" stored="true" />
</fields>
The new faceting module in Solr 5.1 can do this, it was added in https://issues.apache.org/jira/browse/SOLR-7214
Here is how you would add sum(size) to every facet bucket, and sort descending by that statistic.
json.facet={
categories:{terms:{
field:category,
sort:"total_size desc", // this will sort the facet buckets by your stat
facet:{
total_size:"sum(size)" // this calculates the stat per bucket
}
}}
}
And this is how you would add in the subfacet on subcategory:
json.facet={
categories:{terms:{
field:category,
sort:"total_size desc",
facet:{
total_size:"sum(size)",
subcat:{terms:{ // this will facet on the subcategory field for each bucket
field:subcategory,
facet:{
sz:"sum(size)" // this calculates the sum per sub-cat bucket
}}
}
}}
}
So the above will give you the sum(size) at both the category and subcategory levels. Documentation for the new facet module is currently at http://yonik.com/json-facet-api/
There is a patch SOLR-3583, which adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals. It is possible to add sums to pivot facets by improving this patch.
For example, averages can be calculated for categories using this url:
http://localhost:8983/solr/select?q=*%3A*
&facet=true
&facet.pivot=category,subcategory
&facet.stats.percentiles=true
&facet.stats.percentiles.averages=true
&facet.stats.percentiles.field=size
&f.size.stats.percentiles.requested=25,50,75
&f.size.stats.percentiles.lower.fence=0
&f.size.stats.percentiles.upper.fence=1000
&f.size.stats.percentiles.gap=10
See also this video and slides for more details.
1. Counts
To get the counts, you can use Pivot Faceting. It will generate a list very similar to what you asked but with counts only.
You'll need to append this to your query:
&facet=true&facet.pivot=category,subcategory
Note that this works on Solr 4.0 and after.
2. Sums
As for the sums, I think you can achieve them with ordinary facets but using a facet query instead of facet field.. I'm not entirely sure of this one, I'll try it out and re-post if found anything useful.