What is the difference between a Join Query and Em

2019-07-26 04:31发布

问题:

I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so:

<document>
    <entity name="item" pk="id" query="SELECT * FROM item">
        <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'>
        </entity>
    </entity>
</document>

Another way that works is:

<document>
    <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid">
    </entity>
</document>

Are these two methods functionally different? Is there a performance difference? My guess would be that the first method is to support non-SQL tables, but I'm not sure.

Another though would be that, if using join tables in MySQL, using the SQL query method with multiple joins could cause multiple documents to be indexed instead of one.

回答1:

Few things that I encountered :-

  • If you have a one to one mapping you can use the join so that you get all the fields with one query itself.
  • If you have multiple records for the root you would use the sub entity which would probably create a multivalued field.
  • Sub entities fire a query for each of the records and hence are slower in performance.

Would like to hear from Other users as well.



回答2:

If you want to optimize the performance you can use the second select to create a materialized view. This would means you can import with DIH directly from the materialized view (without creating any innerjoins or subentities in DIH, witch will means a lot faster import and the source DB will not be "overheated") If you want, you can create a cronjob to fast refresh your materialized view every day or as you need. If you are using MySQL you can take a look here for details on materialize views. Hope that will help.