I've been trying to figure out a way to implement faceting with hierarchies in Solr and can't figure out how to do it in my situation. I've read a couple of the articles on doing hierarchies in Solr, along with the solutions in the SOLR-64 and SOLR-792 patches. The main issue I'm having is that I have entities that can belong to multiple branches of the hierarchy. My data is currently a user document with multi-valued fields (MVAs) for country, state, and city.
Take, for instance, a geographical hierarchy in which people can list their services down to the city level. A person may service all of Alabama but only certain towns in Georgia. The facet count at the state level counts the distinct individuals who service an area, which gives 1 for Alabama and 1 for Georgia. When expanded down to the city level, each city has its own count. In other words, the city counts won't necessarily sum to the state count, since the counts are not mutually exclusive.
US(1)
  Georgia(1)
    Atlanta(1)
    Columbus(0)
    Athens(0)
  Alabama(1)
    Mobile(1)
    Birmingham(1)
    Huntsville(1)
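To make the counting rule concrete, here is a minimal Python sketch that rebuilds the tree above from one hypothetical user's service areas by counting distinct users per node. The data layout (a list of state/city pairs per user, with the state-wide Alabama listing expanded into its cities) is an assumption for illustration, not how Solr stores anything.

```python
from collections import defaultdict

# Hypothetical data: the (state, city) pairs each user serves. The state-wide
# Alabama listing is modelled by listing every Alabama city from the tree above.
SERVICE_AREAS = {
    "user1": [
        ("Alabama", "Mobile"),
        ("Alabama", "Birmingham"),
        ("Alabama", "Huntsville"),
        ("Georgia", "Atlanta"),
    ],
}


def facet_counts(service_areas):
    """Return {node path tuple: number of distinct users serving that node}."""
    users_per_node = defaultdict(set)
    for user, areas in service_areas.items():
        for state, city in areas:
            # A set per node means each user is counted once per node,
            # no matter how many cities below that node they serve.
            users_per_node[("US",)].add(user)
            users_per_node[("US", state)].add(user)
            users_per_node[("US", state, city)].add(user)
    return {node: len(users) for node, users in users_per_node.items()}


counts = facet_counts(SERVICE_AREAS)
for node in sorted(counts):
    print("  " * (len(node) - 1) + f"{node[-1]}({counts[node]})")
```

Alabama comes out as 1 while its three cities add up to 3, which is why the city counts cannot be summed to get the state count. Cities nobody serves (Columbus, Athens) simply do not appear in this sketch.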
The part I'm getting hung up on: when faceting on the cities, I have no way of knowing which state each city belongs to, since the user is listed in both Alabama and Georgia and I can't figure out a way to relate the attributes to each other. SOLR-64 might work if it supports multiple branches, such as US/Alabama/Mobile/ and US/Georgia/Atlanta/, for the same document. So far, though, I haven't been able to get it to compile against the nightly build, so I'm kind of stuck.
Does anyone have a better way of doing this?
See the first use case described here (client-side processing is necessary for both indexing and querying!):
Category navigation
The problem: you have a tree of categories, and each product can belong to several of those categories.
There are two relatively similar solutions to this problem. I will describe one of them:
- Create a multivalued string field called ‘category’. Use the category id (or the name, if you want to avoid DB queries).
- You have a category tree. Make sure each document gets not only its leaf category but all categories up to the root node.
- Now facet over the category field with ‘-1’ as the limit (facet.limit=-1).
But what if you want to display only the categories of one level? For example, if you don't want to show the other levels at the same time, or if there are too many of them.
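A minimal Python sketch of the indexing side of these three steps, assuming each category is identified by its full path joined with "/" (a hypothetical choice; plain category ids work the same way). The `Counter` only simulates the document counts a Solr facet on `category` would return; it is not a Solr call.

```python
from collections import Counter


def expand_to_root(leaf_path):
    """["US", "Alabama", "Mobile"] -> ["US", "US/Alabama", "US/Alabama/Mobile"]."""
    return ["/".join(leaf_path[:i]) for i in range(1, len(leaf_path) + 1)]


def category_values(leaf_paths):
    """Values for one document's multi-valued 'category' field.

    A set drops ancestors shared by several leaves (e.g. "US"), so the
    document lists each category once.
    """
    values = set()
    for path in leaf_paths:
        values.update(expand_to_root(path))
    return sorted(values)


# Two hypothetical documents; user1 sits in two branches of the tree.
docs = [
    {"id": "user1", "category": category_values([["US", "Alabama", "Mobile"],
                                                 ["US", "Georgia", "Atlanta"]])},
    {"id": "user2", "category": category_values([["US", "Georgia", "Athens"]])},
]

# Simulated facet counts: number of documents containing each value.
facet = Counter(value for doc in docs for value in doc["category"])
print(docs[0]["category"])
print(facet)
```

Using full paths as values keeps two cities with the same name in different states apart, and a document in several branches still counts once for each shared ancestor.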
Then index the category values like <level>_category. For that you will need the complete category tree in RAM while indexing. Then use facet.prefix=<level>_ to filter the category list for that level.
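A minimal Python sketch of the level-prefixed values, assuming a hypothetical in-memory tree of category ids and numbering the root as level 1 (the original does not say where levels start). `filter_by_prefix` only mimics what facet.prefix does on the Solr side.

```python
# Hypothetical category tree held in RAM while indexing: id -> parent id.
PARENT = {
    "us": None,
    "al": "us", "ga": "us",
    "mobile": "al", "atlanta": "ga", "athens": "ga",
}


def path_to_root(category_id):
    """Return the category ids from the root down to category_id."""
    path = []
    while category_id is not None:
        path.append(category_id)
        category_id = PARENT[category_id]
    return path[::-1]


def level_tokens(leaf_ids):
    """Index values of the form <level>_<categoryId>, with the root at level 1."""
    tokens = set()
    for leaf in leaf_ids:
        for level, category_id in enumerate(path_to_root(leaf), start=1):
            tokens.add(f"{level}_{category_id}")
    return sorted(tokens)


def filter_by_prefix(facet_values, level):
    """Mimic facet.prefix=<level>_ on a list of facet values."""
    prefix = f"{level}_"
    return [value for value in facet_values if value.startswith(prefix)]


tokens = level_tokens(["mobile", "atlanta"])
print(tokens)                       # ['1_us', '2_al', '2_ga', '3_atlanta', '3_mobile']
print(filter_by_prefix(tokens, 2))  # ['2_al', '2_ga']
```

The trailing underscore in the prefix matters: facet.prefix=1_ does not match a value such as 10_x, whereas a bare 1 would.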
- Clicking on a category entry should result in a filter query like fq=category:"<level>_categoryId"
- The slightly tricky part is that your UI or middle tier has to parse the level (e.g. 2) and then append 2+1=3 to the query: facet.prefix=3_
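A minimal Python sketch of that middle-tier step, assuming the hypothetical `<level>_<categoryId>` values and tree used above. The parameter names (`fq`, `facet.field`, `facet.prefix`, `facet.limit`, `facet.mincount`) are standard Solr facet parameters; the helper functions are illustrative. Because a document can sit in several branches (the situation in the question), the next level can also return children of other branches, so `children_only` drops them using the tree.

```python
from urllib.parse import urlencode

# Hypothetical tree in the middle tier: category id -> parent id.
PARENT = {"us": None, "al": "us", "ga": "us",
          "mobile": "al", "atlanta": "ga", "athens": "ga"}


def drill_down_params(selected_token):
    """Solr parameters after a click on a facet value such as "2_ga"."""
    level, _, _ = selected_token.partition("_")
    next_level = int(level) + 1  # e.g. 2 -> 3
    return [
        ("q", "*:*"),
        ("fq", f'category:"{selected_token}"'),
        ("facet", "true"),
        ("facet.field", "category"),
        ("facet.prefix", f"{next_level}_"),
        ("facet.limit", "-1"),
        ("facet.mincount", "1"),
    ]


def children_only(facet_values, selected_token):
    """Keep only values whose parent in the tree is the selected category."""
    selected_id = selected_token.partition("_")[2]
    return [v for v in facet_values
            if PARENT.get(v.partition("_")[2]) == selected_id]


params = drill_down_params("2_ga")
print(urlencode(params))
# A user listed under both "al" and "ga" would also bring back "3_mobile":
print(children_only(["3_atlanta", "3_athens", "3_mobile"], "2_ga"))
```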
If you filter by level, one question remains:
Q: how can you display the path from the selected category until the root category?
A: Either get the category parents from the DB, which is easy if you store the category ids in Solr rather than the category names.
Or get the parents from the parameter list which is a bit more complicated but doable. In this case you’ll need to store the category names in Solr.
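Both options as a minimal Python sketch. A dict stands in for the DB table in option A; option B assumes one fq per drill-down step of the form category:"<level>_<name>", which is one reading of "the parameter list". All names here are hypothetical.

```python
import re

# Option A: parent lookup. A dict stands in for the DB table (hypothetical ids).
PARENT = {"us": None, "ga": "us", "atlanta": "ga"}


def breadcrumb_from_db(category_id):
    """Walk parent links from the selected category up to the root."""
    path = []
    while category_id is not None:
        path.append(category_id)
        category_id = PARENT[category_id]
    return path[::-1]


# Option B: rebuild the path from the fq parameters collected while drilling
# down, ordered by the level encoded in each value.
FQ_PATTERN = re.compile(r'^category:"(\d+)_(.+)"$')


def breadcrumb_from_params(fq_values):
    levels = {}
    for fq in fq_values:
        match = FQ_PATTERN.match(fq)
        if match:
            levels[int(match.group(1))] = match.group(2)
    return [levels[level] for level in sorted(levels)]


print(breadcrumb_from_db("atlanta"))
print(breadcrumb_from_params(['category:"2_Georgia"', 'category:"1_US"',
                              'category:"3_Atlanta"']))
```

Option B needs names in the index because the breadcrumb is rebuilt purely from the values in the request.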
I am not that familiar with your problem, but it seems you need to group by city and state.
Have a look at the group-by feature in Solr called field collapsing (http://wiki.apache.org/solr/FieldCollapsing).
Also have a look at bobo-browse, specifically its compositeFacetHandlers (http://code.google.com/p/bobo-browse/wiki/CompositeFacetHandlers). bobo-browse can be integrated into Solr (http://code.google.com/p/bobo-browse/wiki/SolrIntegration).
Check out Pivot (i.e. Decision Tree) Faceting: http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting
It is supported as of Solr 4.0.
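A minimal Python sketch of a pivot facet request and of walking its JSON response. The field names `country`, `state`, `city` are hypothetical, and the sketch assumes they are single-valued on each document. If state and city were multi-valued on one user document, pivot faceting would pair every state value with every city value in that document (Mobile would also show up under Georgia), so it does not by itself fix the problem in the question. The sample response is hand-written, not real Solr output.

```python
from urllib.parse import urlencode

# Hypothetical field names; each document is assumed to hold ONE country,
# state and city value.
params = {
    "q": "*:*",
    "rows": "0",
    "facet": "true",
    "facet.pivot": "country,state,city",
    "facet.limit": "-1",
    "wt": "json",
}
query_string = urlencode(params)


def pivot_lines(entries, depth=0):
    """Flatten facet_pivot entries (value, count, nested pivot) into indented lines."""
    lines = []
    for entry in entries:
        lines.append("  " * depth + f"{entry['value']}({entry['count']})")
        lines.extend(pivot_lines(entry.get("pivot", []), depth + 1))
    return lines


# Hand-written sample shaped like
# response["facet_counts"]["facet_pivot"]["country,state,city"].
sample = [
    {"field": "country", "value": "US", "count": 1, "pivot": [
        {"field": "state", "value": "Georgia", "count": 1, "pivot": [
            {"field": "city", "value": "Atlanta", "count": 1},
        ]},
    ]},
]
print(query_string)
print("\n".join(pivot_lines(sample)))
```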
Assuming each document in the index represents a single service...
For the city, manufacture a field that concatenates the state and the city with some delimiter. This field never has to be displayed to the user; it can sit alongside a stored but unindexed field that holds the real name of the city.
For example you could have a city_facet field with values of:
- "Ohio - Miami"
- "Florida - Miami"
You probably want to pick a safer delimiter. I imagine a hyphen could conflict with place names that contain one.
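A minimal Python sketch of building and reading back such a composite field, assuming a hypothetical `city_facet` field and "|" as the delimiter (any character that never occurs in a place name would do). Each city value carries its state, so a user listed in both Alabama and Georgia keeps every city attached to the right state.

```python
from collections import defaultdict

DELIMITER = "|"  # hypothetical choice; must never appear in a state or city name


def city_facet_value(state, city):
    """Value for the hypothetical city_facet field, e.g. "Ohio|Miami"."""
    if DELIMITER in state or DELIMITER in city:
        raise ValueError(f"delimiter {DELIMITER!r} found in {state!r}/{city!r}")
    return f"{state}{DELIMITER}{city}"


def group_city_facets(facet_counts):
    """Turn {"Ohio|Miami": 2, ...} facet counts into {state: {city: count}}."""
    tree = defaultdict(dict)
    for value, count in facet_counts.items():
        state, _, city = value.partition(DELIMITER)
        tree[state][city] = count
    return dict(tree)


# One user serving cities in two states keeps each city tied to its state.
user_values = [city_facet_value("Alabama", "Mobile"),
               city_facet_value("Georgia", "Atlanta")]
print(user_values)
print(group_city_facets({"Ohio|Miami": 2, "Florida|Miami": 5}))
```

To list only one state's cities, the same field can be faceted with facet.prefix set to the state plus the delimiter (e.g. `Georgia|`).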