How to compute the intersections and unions of two

Posted 2019-07-26 12:59

For example, the intersection

select intersect(array("A","B"), array("B","C"))

should return

["B"]

and the union

 select union(array("A","B"), array("B","C"))

should return

["A","B","C"]

What's the best way to do this in Hive? I have checked the Hive documentation but cannot find any built-in function for it.

1 answer
萌系小妹纸
#2 · 2019-07-26 13:47

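Here is a solution to your problem. Go to the githubLink; Klout has published a lot of UDFs there (the Brickhouse library). Download it, create the JAR, and add the JAR to Hive. Note that combine_unique is an aggregate function (UDAF), so it merges arrays across all rows of the table, and the order of elements in the result is not guaranteed. Example: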

 CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
 CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
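
If you would rather not add a JAR, something like the following might work with Hive's built-in functions only (explode, collect_set, array_contains). This is a minimal sketch, not a tested solution: it assumes a Hive version that allows SELECT without a FROM clause (0.13 or later), and the aliases x, t, arr_union and arr_intersection are made up for illustration. As with the UDAF above, collect_set does not guarantee element order.

```sql
-- Union: explode both arrays into rows, then collect the distinct values.
SELECT collect_set(x) AS arr_union
FROM (
  SELECT explode(array('A','B')) AS x
  UNION ALL
  SELECT explode(array('B','C')) AS x
) t;

-- Intersection: explode the first array and keep only the values
-- that also appear in the second array.
SELECT collect_set(x) AS arr_intersection
FROM (
  SELECT explode(array('A','B')) AS x
) t
WHERE array_contains(array('B','C'), x);
```

For array columns in a real table, the same idea should carry over by using LATERAL VIEW explode(...) on the column and grouping by the row key.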