A basic question before I write a UDF to be used in Hive. I want to join two tables based on a custom UDF which takes one argument from table A and another from table B. I have seen examples of UDFs which take arguments from only one of the tables being joined. Does taking arguments from both tables work equally well?
Answer 1:
It sounds like you want a function like this:

    function my_udf(val_A, val_B):
        trans_A = <do something to val_A>
        trans_B = <do something to val_B>
        return trans_A cmp trans_B
The UDF will return a boolean, which you can use in an ON clause.
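As a rough illustration of that pseudocode, here is a minimal Java sketch. The class name `NormalizedEquals` and the trim-and-lower-case normalization are made up for the example and simply stand in for the "do something" steps. It is written as plain Java so it runs without Hive on the classpath; as a real Hive UDF the class would be public and extend `org.apache.hadoop.hive.ql.exec.UDF`.

```java
import java.util.Locale;

// Hypothetical two-argument comparison (names are illustrative only).
// As a real Hive UDF this class would be public and extend
// org.apache.hadoop.hive.ql.exec.UDF; Hive locates evaluate() by reflection.
class NormalizedEquals {

    // Stand-in for "<do something to val_X>": trim whitespace and lower-case.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Takes one value from each table and reports whether they match
    // after normalization. Returns null if either input is null.
    public Boolean evaluate(String valA, String valB) {
        if (valA == null || valB == null) {
            return null;
        }
        return normalize(valA).equals(normalize(valB));
    }
}
```

Once registered, a function like this would be called with one column from each table, for example `normalized_equals(A.some_column, B.some_column)` (the function name is again hypothetical). Whether Hive accepts such a non-equality expression in the ON clause depends on the Hive version.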
I'm not sure you can do this directly in Hive, but you can always use two UDFs to transform val_A to trans_A and val_B to trans_B, then join with a normal equality ON:
    select *
    from
      (select *, udf_A(some_column) as trans_A from A) as AA
    join
      (select *, udf_B(some_column) as trans_B from B) as BB
    on AA.trans_A = BB.trans_B