solr JOIN query

Posted 2019-04-15 12:36

Question:

I need to run a JOIN query on a Solr index. I have indexed two XML files, person.xml and subject.xml.

Person:

<doc>
<field name="id">P39126</field>
<field name="family">Smith</field>
<field name="given">John</field>
<field name="subject">S1276</field>
<field name="subject">S1312</field>
</doc>

Subject:

<doc>
<field name="id">S1276</field>
<field name="topic">Abnormalities, Human</field>
</doc>

I only need to display information from the person docs, but each query should match fields in both person and subject. If the query matches only a subject doc, I need to display every person doc that references that subject's id. Is this possible without running two separate queries? Something like a JOIN query would do the job.

Any help?

Answer 1:

I do not think it is possible to do what you are asking with a single query using your schema.

One thing to keep in mind is to always think of a Solr index as a single denormalized table. This is sometimes a challenge, and at times you may be forced to use a separate index for each kind of data.

For your problem, a schema like this one might help:

<doc>
 <field name="id">P39126</field>
 <field name="family">Smith</field>
 <field name="given">John</field>
 <field name="topic">Abnormalities, Human</field> <!-- subject S1276 -->
 <field name="topic">some, other, topics</field> <!-- subject S1312 -->
</doc>

Running a query for some topics against this schema would return all persons having those topics.
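
To make the denormalization step concrete, here is a minimal sketch of how the person docs could be flattened before indexing. It assumes both sets of records are already loaded into plain Java maps; the class name Denormalizer and the method denormalize are hypothetical, not part of Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not a Solr class, just an illustration of the flattening step.
final class Denormalizer {

    /**
     * Copies every field of a person doc except "subject", and adds a
     * multi-valued "topic" field holding the topic text of each referenced subject.
     */
    static Map<String, List<String>> denormalize(
            Map<String, List<String>> person, Map<String, String> topicBySubjectId) {
        Map<String, List<String>> flat = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> field : person.entrySet()) {
            if (!field.getKey().equals("subject")) {
                flat.put(field.getKey(), new ArrayList<>(field.getValue()));
            }
        }
        List<String> topics = new ArrayList<>();
        for (String subjectId : person.getOrDefault("subject", List.of())) {
            String topic = topicBySubjectId.get(subjectId);
            if (topic != null) {
                topics.add(topic); // subject ids with no known topic are skipped
            }
        }
        flat.put("topic", topics);
        return flat;
    }

    public static void main(String[] args) {
        Map<String, List<String>> person = new LinkedHashMap<>();
        person.put("id", List.of("P39126"));
        person.put("family", List.of("Smith"));
        person.put("given", List.of("John"));
        person.put("subject", List.of("S1276", "S1312"));

        Map<String, String> topics = Map.of(
                "S1276", "Abnormalities, Human",
                "S1312", "some, other, topics");

        System.out.println(denormalize(person, topics));
    }
}
```

The trade-off is that a change to a subject's topic means reindexing every person that references it.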

Some links that might interest you:

  • http://www.lucidimagination.com/search/document/93e8b09e90b0076c/help_with_denormalizing_issues#60890dcb99a3004d
  • http://wiki.apache.org/solr/SchemaDesign


Answer 2:

It appears a nice Join implementation might arrive soon: https://issues.apache.org/jira/secure/attachment/12465770/SOLR-2272.patch
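
For later readers: as far as I know, join support along these lines shipped in Solr 4.0 as the join query parser, written in local-params syntax as {!join from=... to=...}. The sketch below only builds such a query string for the fields in the question; the class and method names are hypothetical, and whether topic:Human matches depends on how the topic field is analyzed in your schema.

```java
// Hypothetical helper that formats a Solr join query string; it does not talk to Solr.
final class JoinQueryExample {

    /**
     * Builds "{!join from=F to=T}inner": run the inner query, collect the values of
     * the from field in the matches, and return docs whose to field holds one of them.
     */
    static String joinQuery(String fromField, String toField, String innerQuery) {
        return "{!join from=" + fromField + " to=" + toField + "}" + innerQuery;
    }

    public static void main(String[] args) {
        // Persons whose "subject" field references a subject doc matching the topic.
        System.out.println(joinQuery("id", "subject", "topic:Human"));
    }
}
```

The resulting string would go in the q (or fq) parameter of a normal search request.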



Answer 3:

If you can't denormalize as suggested by Pascal, you could write your own query handler to do the join: first issue a query on the requested topics that returns only the id field of the matching subject documents, then issue a BooleanQuery containing one clause for each id (a TermQuery on subject = id). This will perform poorly if there are a large number of ids, but should be fine if only a few ids match.

If you anticipate that your "join" queries will generally match a lot (say hundreds) of subjects, then you're probably better off denormalizing as suggested.
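
The second step of that two-query approach can be sketched as follows. This version builds a query string rather than a BooleanQuery object, which is a swapped-in simplification; the class name TwoStepJoin and the method subjectQuery are hypothetical, and the ids are assumed to come from the first query.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns the ids from the first query into the second query.
final class TwoStepJoin {

    /** Builds subject:("id1" OR "id2" ...), one quoted clause per subject id. */
    static String subjectQuery(List<String> subjectIds) {
        if (subjectIds.isEmpty()) {
            // Nothing matched in step one, so the caller should skip step two.
            throw new IllegalArgumentException("no subject ids to join on");
        }
        return subjectIds.stream()
                .map(TwoStepJoin::quote)
                .collect(Collectors.joining(" OR ", "subject:(", ")"));
    }

    /** Wraps a value in double quotes, escaping backslashes and quotes inside it. */
    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(subjectQuery(List.of("S1276", "S1312")));
    }
}
```

Solr caps the number of boolean clauses per query (maxBooleanClauses in solrconfig.xml), which is another reason this only suits a small number of ids.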

I don't know the most elegant way to issue a query from a handler, but FWIW here's how I do it.

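// Fragment from inside a custom request handler; req is the incoming SolrQueryRequest.
SolrCore core = req.getCore();
Map<String, String[]> args = new HashMap<>();
// add your query parameters to the map, like fields to return
args.put("fl", new String[]{"id"});
final SolrIndexSearcher searcher = req.getSearcher();
String query = "your query";
// the two ints are start and limit (rows); a limit of 0 returns no documents, so raise it to read ids back
LocalSolrQueryRequest newReq = new LocalSolrQueryRequest(core, query, "", 0, 0, args) {
  // reuse the outer request's searcher so the sub-query sees the same index view
  @Override public SolrIndexSearcher getSearcher() { return searcher; }
  // nothing to release here: the outer request owns the searcher
  @Override public void close() { }
};
SolrQueryResponse newRsp = new SolrQueryResponse();
core.execute(core.getRequestHandler(newReq.getParams().get(CommonParams.QT)), newReq, newRsp);
// query results will be in newRsp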


Tags: join solr