Hive Query- Joining two tables on three joining co

2019-02-08 21:24发布

问题:

I am facing an error

"FAILED: Error in semantic analysis: Line 1:101 OR not supported in JOIN currently dob"

while running the below mentioned query..

Insert Overwrite Local Directory './Insurance_Risk/Merged_Data' Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation from sample_member_detail s Join fb_member_detail f 
On s.email=f.email or 
s.dob=f.dob 
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation)
where s.email is not null and f.email is not null;

Can anyone tell me that, in hive "OR" operator can be used or not? if not, then what should be the query which will give the same result as given by the above mentioned query. I have 2 tables and I want to join the two tables on any one of the three conditions with or operator. Please help..

回答1:

Sorry Hive supports only equi-joins. You can always try select from full Cartesian product of those tables(you have to be in non-strict mode):

Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation 
from sample_member_detail s join fb_member_detail f 
where (s.email=f.email 
or s.dob=f.dob 
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation))
and s.email is not null and f.email is not null;


回答2:

You could also use UNION to get the same results:

INSERT OVERWRITE LOCAL DIRECTORY './Insurance_Risk/Merged_Data' 
-- You can only UNION on subqueries
SELECT * FROM (
    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON s.email=f.email 
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL;

    UNION

    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON s.dob=f.dob
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL;

    UNION

    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON f.name=s.name AND f.loc = s.loc AND f.occupation=s.occupation
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL;

) subquery;


标签: hive