Non equi Left outer join in hive workaround

2019-09-02 07:29发布

How to implement non-equi left outer join in hive involving more than two tables?

Query Used:

SELECT cs.SID, ins.ID, a.ID, e.id, i.id, cs.dateId, cs.timeId,cs.SSTRT,cs.SEND,cs.VAL,cs.IND,cs.TYP,cs.DTPE,cs.BCDE, cs.IVAL,cs.ICNT,cs.RDT,cs.REJ 
from cs_item_hive cs 
LEFT outer JOIN installation_hive ins ON  (cs.SID=ins.SN)
LEFT OUTER JOIN account_hive a ON (ins.AID=a.ID AND ins.MDID = a.MDID)
LEFT OUTER JOIN equipment_hive e ON ins.GBLSID=e.GSN
LEFT OUTER JOIN item_hive i ON (cs.BCDE=i.ibc AND ins.MDID = i.MDID AND ins.AID = i.AID)
where cs.SEND >= ins.IDAT AND cs.SEND < ins.RDAT;

Issue:

Records that are not matching WHERE condition are skipped. [As, Hive does not support non-equi join, where condition is used]

Requirement:

Records that are not matching the condition should have the ins.ID, a.ID, e.id, i.id substituted as NULL [as like equi-join].

How to implement something like this in hive:

SELECT cs.SID, ins.ID, a.ID, e.id, i.id, cs.dateId, cs.timeId,cs.SSTRT,cs.SEND,cs.VAL,cs.IND,cs.TYP,cs.DTPE,cs.BCDE, cs.IVAL,cs.ICNT,cs.RDT,cs.REJ 
from cs_item_hive cs 
LEFT outer JOIN installation_hive ins ON  (cs.SID=ins.SN AND cs.SEND >= ins.IDAT AND cs.SEND < ins.RDAT)
LEFT OUTER JOIN account_hive a ON (ins.AID=a.ID AND ins.MDID = a.MDID)
LEFT OUTER JOIN equipment_hive e ON ins.GBLSID=e.GSN
LEFT OUTER JOIN item_hive i ON (cs.BCDE=i.ibc AND ins.MDID = i.MDID AND ins.AID = i.AID);

Please guide me to resolve this issue.

Thankyou.

1条回答
聊天终结者
2楼-- · 2019-09-02 07:56

You can come pretty close with:

where (cs.SEND >= ins.IDAT AND cs.SEND < ins.RDAT) or
      (ins.IDAT is null)

Alternatively, I think you can use two queries, one for matches and one for non-matches.

查看更多
登录 后发表回答