How can I use In clause in Hive I want to write something like this in Hive select x from y where y.z in (select distinct z from y) order by x; But I am not finding any way of doing it.. I tried In clause in Hive 0.7 it was throwing error, Also I tried Find_in_Set .. using find_in_set(y.z,subquery).. but the job is getting failed.
I want to do this on Hive. Please help me if anybody knows how to do this in Hive..
Thanks & Regards, Atul
assume
table t1(id,name)
andtable t2(id,name)
listing only those ids from
t1
that exists in t2(basicallyIN
clause)listing only those ids from
t1
that exists only int1
but not in t2(basicallyNOT IN
clause)Hive does support IN/EXISTS statements since Hive 0.13 with few limitations. Please refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries for more details.
Hive 0.13 now do support IN/EXISTS in the WHERE-clause .. The issue https://issues.apache.org/jira/browse/HIVE-784 has been resolved after 4 years :)
According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select:
"Hive does not support IN, EXISTS or subqueries in the WHERE clause."
You might want to look at: https://issues.apache.org/jira/browse/HIVE-801
https://issues.apache.org/jira/browse/HIVE-1799
I'm using hive version
0.7.1
andSELECT * FROM MYTABLE WHERE MYCOLUMN IN ('thisThing','thatThing');
I tested this on a column type
STRING
so I am not sure if this works universally on all data types since I noticed like Wawrzyniec mentioned above that the Hive Language Manual says that it is not supported and to instead useLEFT SEMI JOIN
but it worked fine in my test.Hive supports perfectly the IN ... it does not support the subquery in the WHERE clause
there is an open feature ticket from Facebook engineers since 4 years... https://issues.apache.org/jira/browse/HIVE-784?focusedCommentId=13579059