ARRAY_CONTAINS muliple values in hive

2019-04-12 02:17发布

问题:

Is there a convenient way to use the ARRAY_CONTAINS function in hive to search for multiple entries in an array column rather than just one? So rather than:

WHERE ARRAY_CONTAINS(array, val1) OR ARRAY_CONTAINS(array, val2)

I would like to write:

WHERE ARRAY_CONTAINS(array, val1, val2)

The full problem is that I need to read val1 and val2 dynamically from the command line arguments when I run the script and I generally don't know how many values will be conditioned on. So you can think of vals being a comma separated list (or array) containing values val1, val2, ..., and I want to write

WHERE ARRAY_CONTAINS(array, vals)

Thanks in advance!

回答1:

There is a UDF here that will let you take the intersection of two arrays. Assuming your values have the structure

values_array = [val1, val2, ..., valn]

You could then do

where array_intersection(array, values_array)[0] is not null

If they don't have any elements in common, [] will be returned and therefore [][0] will be null



标签: sql hive