JOINS in Lucene (answers, page 2)

Is there any way to implement JOINS in Lucene?

Tags: join lucene

8 answers

Answer #2 · 2019-03-28 07:29

You can do a generic join by hand: run two searches, get all results (instead of just the top N), sort them on your join key and intersect the two sorted lists. But that's going to thrash your heap really hard (if the lists even fit in it).
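A minimal sketch of that by-hand sort-merge join, in plain Java so it runs without a Lucene dependency. It assumes you have already collected every hit from both searches into lists of a hypothetical `Hit(docId, key)` record, where `key` is the join-field value you loaded for that document; loading and holding all of those values is exactly where the heap pressure comes from.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical search hit: the doc id plus the value of its join field.
record Hit(int docId, String key) {}

// One joined result: a doc from each search that share the same key.
record JoinedPair(int leftDoc, int rightDoc, String key) {}

class SortMergeJoin {
    /** Sorts both complete result lists on the join key, then walks them in step. */
    static List<JoinedPair> join(List<Hit> left, List<Hit> right) {
        List<Hit> l = new ArrayList<>(left);
        List<Hit> r = new ArrayList<>(right);
        Comparator<Hit> byKey = Comparator.comparing(Hit::key);
        l.sort(byKey);
        r.sort(byKey);

        List<JoinedPair> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = l.get(i).key().compareTo(r.get(j).key());
            if (cmp < 0) {
                i++;                      // left key too small, advance left
            } else if (cmp > 0) {
                j++;                      // right key too small, advance right
            } else {
                // Find the run of equal keys on each side and emit their cross product.
                String key = l.get(i).key();
                int iEnd = i, jEnd = j;
                while (iEnd < l.size() && l.get(iEnd).key().equals(key)) iEnd++;
                while (jEnd < r.size() && r.get(jEnd).key().equals(key)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(new JoinedPair(l.get(a).docId(), r.get(b).docId(), key));
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

For example, joining hits (1,"x"), (2,"y"), (3,"z") with (10,"y"), (11,"x"), (12,"y") yields the pairs (1,11), (2,10) and (2,12). Duplicate keys produce a cross product, as an SQL inner join would.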

There are possible optimizations, but only under very specific conditions, e.g. you do a self-join and use only (random-access) Filters for filtering, no Queries. Then you can manually iterate over the terms of your two join fields in parallel, intersect the docId lists for each term, filter them, and there's your join.
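A rough sketch of that term-walking self-join. To keep it self-contained, each join field's term dictionary is modeled as a hypothetical `TreeMap<String, int[]>` (term to sorted docIds), standing in for iterating the field's terms and postings in Lucene, and each random-access Filter is modeled as a `java.util.BitSet` indexed by docId. Both term lists are sorted, so they can be walked in parallel like a merge.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermWalkSelfJoin {
    /** Doc `from` has term in field A, doc `to` has the same term in field B. */
    record Pair(int from, int to, String term) {}

    static List<Pair> join(TreeMap<String, int[]> fieldA, BitSet filterA,
                           TreeMap<String, int[]> fieldB, BitSet filterB) {
        List<Pair> out = new ArrayList<>();
        Iterator<Map.Entry<String, int[]>> ia = fieldA.entrySet().iterator();
        Iterator<Map.Entry<String, int[]>> ib = fieldB.entrySet().iterator();
        Map.Entry<String, int[]> a = ia.hasNext() ? ia.next() : null;
        Map.Entry<String, int[]> b = ib.hasNext() ? ib.next() : null;

        // Walk both sorted term lists in parallel; only terms present in both matter.
        while (a != null && b != null) {
            int cmp = a.getKey().compareTo(b.getKey());
            if (cmp < 0) {
                a = ia.hasNext() ? ia.next() : null;
            } else if (cmp > 0) {
                b = ib.hasNext() ? ib.next() : null;
            } else {
                // Same term on both sides: keep only docs that pass the random-access filters.
                for (int docA : a.getValue()) {
                    if (!filterA.get(docA)) continue;
                    for (int docB : b.getValue()) {
                        if (filterB.get(docB)) out.add(new Pair(docA, docB, a.getKey()));
                    }
                }
                a = ia.hasNext() ? ia.next() : null;
                b = ib.hasNext() ? ib.next() : null;
            }
        }
        return out;
    }
}
```

As a usage example, take a hypothetical employee index joined to itself on `manager` = `id`: doc 1 (bob) and doc 2 (carol) have manager alice, doc 3 (dave) has manager bob. With doc 2 excluded by the filter on the `manager` side, the join yields (1 → 0, alice) and (3 → 1, bob). No per-query result lists are materialized; the cost is one pass over the two term dictionaries.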

There's an approach that handles the popular use case of simple parent-child relationships with a relatively small number of children per document: https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like this one: you have a number of resumes, each with multiple work_experience children, and you want to find someone who worked at company NNN in year YYY. If the documents are simply flattened, you'll also get back resumes of people who worked for NNN in some year and, separately, worked somewhere in year YYY.
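To make the resume example concrete, here is a plain-Java illustration of the two matching semantics. It is not Lucene code; `Job` is a hypothetical stand-in for a work_experience child. Flattening forgets which company goes with which year, so a resume can match on values taken from two different children.

```java
import java.util.List;

class ResumeMatch {
    // Hypothetical work_experience child of a resume.
    record Job(String company, int year) {}

    // Flattened: the resume just has "some company values" and "some year values",
    // so the two conditions may be satisfied by different children.
    static boolean matchesFlattened(List<Job> jobs, String company, int year) {
        boolean hasCompany = jobs.stream().anyMatch(j -> j.company().equals(company));
        boolean hasYear = jobs.stream().anyMatch(j -> j.year() == year);
        return hasCompany && hasYear;
    }

    // Parent-child: both conditions must hold on the same child.
    static boolean matchesSameChild(List<Job> jobs, String company, int year) {
        return jobs.stream().anyMatch(j -> j.company().equals(company) && j.year() == year);
    }
}
```

A resume with jobs (NNN, 2005) and (Other, 2010) matches "NNN in 2010" under `matchesFlattened`, a false positive, but not under `matchesSameChild`.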

An alternative for handling simple parent-child cases is to flatten your documents after all, but make sure values from different children are separated by a large position increment (posIncrement) gap, and then use a SpanNear query so that your subqueries can't match across children. There was a LinkedIn presentation about this a few years back, but I couldn't find it.
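A simplified simulation of the position-gap trick. It assumes both child attributes are indexed into a single field as prefixed tokens such as `company_NNN` and `year_YYY`, since span queries only match within one field. In Lucene the gap between values of a multi-valued field is typically supplied by the analyzer's position increment gap; here it is just a parameter, and `near` is a rough approximation of SpanNear's slop for two single-term spans, not its actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class GapSpanNear {
    // A token and the position it would be indexed at.
    record Token(String term, int position) {}

    /**
     * Flattens the children into one token stream: tokens of the same child get
     * consecutive positions, and `gap` extra positions separate consecutive children.
     */
    static List<Token> flatten(List<List<String>> children, int gap) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        for (int c = 0; c < children.size(); c++) {
            if (c > 0) pos += gap;                 // jump ahead between children
            for (String term : children.get(c)) {
                tokens.add(new Token(term, pos++));
            }
        }
        return tokens;
    }

    /** Unordered near check: some t1 and some t2 with at most `slop` positions between them. */
    static boolean near(List<Token> tokens, String t1, String t2, int slop) {
        for (Token a : tokens) {
            if (!a.term().equals(t1)) continue;
            for (Token b : tokens) {
                if (b.term().equals(t2) && Math.abs(a.position() - b.position()) - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With children [company_NNN, year_2005] and [company_Other, year_2010] and a gap of 100, the positions are 0, 1, 102, 103, so a near query for company_NNN and year_2010 with slop 5 fails while company_NNN and year_2005 matches. With a gap of 0 the cross-child pair would match, which is the flattening problem again.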

爱情/是我丢掉的垃圾

Answer #3 · 2019-03-28 07:33

Here is an example. Numere provides an easy way to extract analytical data from Lucene indexes; below is a SQL-style query followed by the equivalent Numere code.

    select a.type, sum(a.value) as "sales", b.category, count(distinct b.product_id) as "total"
    from a (index)
    inner join b (index) on (a.seq_id = b.seq_id)
    group by a.type, b.category
    order by a.type asc, b.category asc


    Join join = RequestFactory.newJoin();

    // inner join a.seq_id = b.seq_id

    join.on("seq_id", Type.INTEGER).equal("seq_id", Type.INTEGER);

    // left side: index a (master), grouped by type asc, with sum(value) as "sales"
    {
        Request left = join.left();
        left.repository(UtilTest.getPath("indexes/md/master"));
        left.addColumn("type").textType().asc();
        left.addMeasure("value").alias("sales").intType().sum();
    }

    // right side: index b (detail), grouped by category asc, with count(distinct product_id) as "total"
    {
        Request right = join.right();
        right.repository(UtilTest.getPath("indexes/md/detail"));
        right.addColumn("category").textType().asc();
        right.addMeasure("product_id").intType().alias("total").count_distinct();
    }

    Processor processor = ProcessorFactory.newProcessor();
    try {
        ResultPacket result = processor.execute(join);
        System.out.println(result);
    } finally {
        processor.close();
    }

Result:

    <?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
    <DATAPACKET Version="2.0">
      <METADATA>
        <FIELDS>
          <FIELD attrname="type" fieldtype="string" WIDTH="20" />
          <FIELD attrname="category" fieldtype="string" WIDTH="20" />
          <FIELD attrname="sales" fieldtype="i8" />
          <FIELD attrname="total" fieldtype="i4" />
        </FIELDS>
        <PARAMS />
      </METADATA>
      <ROWDATA>
        <ROW type="Book" category="stand" sales="127003304" total="2" />
        <ROW type="Computer" category="eletronic" sales="44765715835" total="896" />
        <ROW type="Meat" category="food" sales="3193526428" total="110" />

    ... (remaining output omitted)
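For readers without Numere, here is a rough plain-Java sketch of what that query computes: an in-memory hash join on seq_id followed by grouping and aggregation. It is not how Numere works internally; the row types `MasterRow` and `DetailRow` are hypothetical stand-ins for documents from the two indexes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class JoinGroupBy {
    record MasterRow(int seqId, String type, long value) {}         // a row from index "a"
    record DetailRow(int seqId, String category, int productId) {}  // a row from index "b"
    record GroupKey(String type, String category) {}

    // Running aggregates for one (type, category) group.
    static final class Agg {
        long sales;                                     // sum(a.value)
        final Set<Integer> products = new HashSet<>();  // distinct b.product_id values
    }

    static TreeMap<GroupKey, Agg> run(List<MasterRow> a, List<DetailRow> b) {
        // Build a hash table over b, keyed on the join column seq_id.
        Map<Integer, List<DetailRow>> bySeq = new HashMap<>();
        for (DetailRow d : b) {
            bySeq.computeIfAbsent(d.seqId(), k -> new ArrayList<>()).add(d);
        }
        // order by a.type asc, b.category asc
        Comparator<GroupKey> order =
            Comparator.comparing(GroupKey::type).thenComparing(GroupKey::category);
        TreeMap<GroupKey, Agg> groups = new TreeMap<>(order);
        // Probe with each row of a (inner join), then fold the joined row into its group.
        for (MasterRow m : a) {
            for (DetailRow d : bySeq.getOrDefault(m.seqId(), List.of())) {
                Agg g = groups.computeIfAbsent(new GroupKey(m.type(), d.category()), k -> new Agg());
                g.sales += m.value();
                g.products.add(d.productId());
            }
        }
        return groups;
    }
}
```

The "total" column is `products.size()` for each group. Note that sum(a.value) is taken over joined rows, so a row of a that matches several rows of b is counted once per match, which is standard SQL inner-join behavior.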
