Using like operator against STRUCT data type

2019-08-02 06:18发布

问题:

I have a table with array of structs. Is there a way to filter the records from this column using like operator ?

hive> desc location;
location_list           array<struct<city:string,state:string>>

hive> select * from location;
row1 : [{"city":"Hudson","state":"NY"},{"city":"San Jose","state":"CA"},{"city":"Albany","state":"NY"}]
row2 : [{"city":"San Jose","state":"CA"},{"city":"San Diego","state":"CA"}]

I am trying to run a query something like this, to filter only those records with "NY" state.

hive> select * from location where location_list like '%"NY"%';
FAILED: SemanticException [Error 10014]: Line 1:29 Wrong arguments ''%"NY"%'': No matching method for class org.apache.hadoop.hive.ql.udf.UDFLike with (array<struct<city:string,state:string>>, string). Possible choices: _FUNC_(string, string)

Note : I could do this by doing a lateralview & explode of this struct column. But trying to avoid it because I need join this table with another where lateral view is not accepted.

回答1:

Nice question, you can do it in the following efficient ( and beautiful) way.

select * from location 
where array_contains(location_list.state, 'NY');

In this case, location_list.state will create an array of strings (states in your case) so you can use the UDF array_contains for value checking. This will look for exact value, you will not be able to perform a matching like the like operator but you should be able to achieve what you are looking



回答2:

Demo of array_contains:

select my_array  
from
( --emulation of your dataset. Just replace this subquery with your table
 select array(named_struct("city","Hudson","state","NY"),named_struct("city","San Jose","state","CA"),named_struct("city","Albany","state","NY")) as my_array
 union all
 select array(named_struct("city","San Jose","state","CA"),named_struct("city","San Diego","state","CA")) as my_array
)s
where array_contains(my_array.state,'NY') 
;

Result:

OK
[{"city":"Hudson","state":"NY"},{"city":"San Jose","state":"CA"},{"city":"Albany","state":"NY"}]
Time taken: 34.055 seconds, Fetched: 1 row(s)


标签: hive