I need to run a JOIN query on a Solr index. I've indexed two XML files, person.xml and subject.xml.
Person:
<doc>
<field name="id">P39126</field>
<field name="family">Smith</field>
<field name="given">John</field>
<field name="subject">S1276</field>
<field name="subject">S1312</field>
</doc>
Subject:
<doc>
<field name="id">S1276</field>
<field name="topic">Abnormalities, Human</field>
</doc>
I need to display only information from the person docs, but each query should match fields in both the person and the subject docs. If the query matches only a subject doc, I need to display all person docs whose subject field contains that subject's id. Is this possible without running two separate queries? Something like a JOIN query would do the job.
Any help?
I do not think it is possible to do what you are asking with a single query using your schema.
One thing to keep in mind is to always think of a Solr index as a single denormalized table. This is sometimes a challenge, and there may be times when you are forced to use a separate index for each kind of data.
For your problem, a schema like this one might help:
<doc>
<field name="id">P39126</field>
<field name="family">Smith</field>
<field name="given">John</field>
<field name="topic">Abnormalities, Human</field> <!-- subject S1276 -->
<field name="topic">some, other, topics</field> <!-- subject S1312 -->
</doc>
Running a query for some topics with this schema would return all persons having those topics.
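A minimal sketch of that denormalization step, done at indexing time before the docs are sent to Solr. The plain `Map` documents, the `doc` and `denormalize` helpers, and the `topicBySubjectId` lookup are hypothetical stand-ins (in practice you would build Solr input documents from your XML files). Unlike the schema above, it also keeps the original subject ids. It needs Java 9+ for `List.of`/`Map.of`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Denormalize {
    // Hypothetical stand-in for a Solr document: field name -> values (multi-valued fields allowed).
    static Map<String, List<String>> doc(String... pairs) {
        Map<String, List<String>> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            d.computeIfAbsent(pairs[i], k -> new ArrayList<>()).add(pairs[i + 1]);
        }
        return d;
    }

    // Copies the topic of every referenced subject into a "topic" field of the person doc,
    // so a single query on "topic" finds the person directly. Unknown subject ids are skipped.
    static Map<String, List<String>> denormalize(Map<String, List<String>> person,
                                                 Map<String, String> topicBySubjectId) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : person.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (String subjectId : person.getOrDefault("subject", List.of())) {
            String topic = topicBySubjectId.get(subjectId);
            if (topic != null) {
                out.computeIfAbsent("topic", k -> new ArrayList<>()).add(topic);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> person = doc(
            "id", "P39126", "family", "Smith", "given", "John",
            "subject", "S1276", "subject", "S1312");
        // Only S1276 is known in this example, so only its topic is copied.
        Map<String, String> topics = Map.of("S1276", "Abnormalities, Human");
        System.out.println(denormalize(person, topics));
    }
}
```

The trade-off is the usual one for denormalizing: if a subject's topic changes, every person doc that references it has to be reindexed.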
Some links that might interest you:
- http://www.lucidimagination.com/search/document/93e8b09e90b0076c/help_with_denormalizing_issues#60890dcb99a3004d
- http://wiki.apache.org/solr/SchemaDesign
It appears a nice Join implementation might arrive soon:
https://issues.apache.org/jira/secure/attachment/12465770/SOLR-2272.patch
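For reference, the SOLR-2272 work shipped in Solr 4.0 as the `{!join}` query parser: it runs the inner query, collects the values of the `from` field of the matching docs, and returns the docs whose `to` field contains one of those values. A sketch of building such a query for this data, assuming person and subject docs live in the same core; the `join` helper is hypothetical, and only the `{!join from=... to=...}` local-params syntax is Solr's.

```java
public class JoinQuery {
    // Hypothetical helper: wraps an inner query in Solr's {!join} local-params syntax.
    static String join(String fromField, String toField, String innerQuery) {
        return "{!join from=" + fromField + " to=" + toField + "}" + innerQuery;
    }

    public static void main(String[] args) {
        // Subject docs matching the topic -> their "id" values -> person docs
        // that list one of those ids in their "subject" field.
        System.out.println(join("id", "subject", "topic:\"Abnormalities, Human\""));
    }
}
```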
If you can't denormalize as suggested by Pascal, you could write your own query handler to do the join: first issue a query on the requested topics that requests the id field of the matching documents, then issue a BooleanQuery containing one clause for each id (a TermQuery on subject = id). This will perform poorly if there are a large number of ids, but should be fine if only a few ids match.
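The two steps can be sketched in plain Java over in-memory data to show the logic of the join. This is not Solr code: `String.contains` is a stand-in for Solr's matching on `topic`, and the `Subject`/`Person` records and method names are hypothetical. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoStepJoin {
    record Subject(String id, String topic) {}
    record Person(String id, String family, String given, List<String> subjects) {}

    // Step 1: "query" the subjects and keep only the ids of the matches (like fl=id).
    static Set<String> matchingSubjectIds(List<Subject> subjects, String term) {
        Set<String> ids = new HashSet<>();
        for (Subject s : subjects) {
            if (s.topic().contains(term)) ids.add(s.id());
        }
        return ids;
    }

    // Step 2: mirrors a BooleanQuery with one SHOULD TermQuery(subject=id) per id:
    // a person matches if any of its subject values is in the id set.
    static List<Person> personsWithSubjects(List<Person> persons, Set<String> ids) {
        List<Person> hits = new ArrayList<>();
        for (Person p : persons) {
            for (String s : p.subjects()) {
                if (ids.contains(s)) { hits.add(p); break; }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Subject> subjects = List.of(new Subject("S1276", "Abnormalities, Human"));
        List<Person> persons = List.of(
            new Person("P39126", "Smith", "John", List.of("S1276", "S1312")));
        Set<String> ids = matchingSubjectIds(subjects, "Abnormalities");
        for (Person p : personsWithSubjects(persons, ids)) System.out.println(p.id());
    }
}
```

In Lucene each id becomes one clause, and by default a BooleanQuery with more than 1024 clauses (maxBooleanClauses) is rejected, which is another reason this approach only suits a small number of matching ids.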
If you anticipate that your "join" queries will generally match a lot of subjects (say, hundreds), you're probably better off denormalizing as suggested.
I don't know the most elegant way to issue a query from a handler, but FWIW here's how I do it.
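// Inside your handler's handleRequestBody(req, rsp)
final SolrCore core = req.getCore();
final SolrIndexSearcher searcher = req.getSearcher();
// add your query parameters to the map, like fields to return
Map<String, String[]> args = new HashMap<String, String[]>();
args.put("fl", new String[]{"id"});
String query = "your query";
int rows = 100; // example value: rows must be > 0, otherwise no ids are returned
LocalSolrQueryRequest newReq = new LocalSolrQueryRequest(core, query, "", 0, rows, args) {
  // share the outer request's searcher so both queries see the same index snapshot
  @Override public SolrIndexSearcher getSearcher() { return searcher; }
  // nothing to release here: the outer request owns the searcher
  @Override public void close() { }
};
SolrQueryResponse newRsp = new SolrQueryResponse();
core.execute(core.getRequestHandler(newReq.getParams().get(CommonParams.QT)), newReq, newRsp);
// query results will be in newRsp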
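To go from the first response to the second query, the ids collected from `newRsp` can be turned into a single Lucene query string and passed as `query` in a second LocalSolrQueryRequest. A minimal sketch: `subjectQuery` is a hypothetical helper, and it assumes `subject` is a string field so each quoted id matches as one term. Reading the ids out of `newRsp` depends on your Solr version and is not shown.

```java
import java.util.List;
import java.util.StringJoiner;

public class SubjectQueryBuilder {
    // Builds e.g. subject:("S1276" OR "S1312") from the ids of the first query.
    // Each id is quoted; backslashes and quotes inside an id are escaped.
    static String subjectQuery(List<String> ids) {
        if (ids.isEmpty()) {
            // "subject:()" is not valid syntax: skip the second query instead.
            throw new IllegalArgumentException("no subject ids: skip the second query");
        }
        StringJoiner or = new StringJoiner(" OR ", "subject:(", ")");
        for (String id : ids) {
            or.add("\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        }
        return or.toString();
    }

    public static void main(String[] args) {
        System.out.println(subjectQuery(List.of("S1276", "S1312")));
    }
}
```

With the default lucene query parser, this string is parsed into the BooleanQuery described above: one optional clause per id on the subject field.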