How to compute the intersections and unions of two arrays in Hive

Posted 2019-07-26 12:55

Question:

For example, the intersection

select intersect(array("A","B"), array("B","C"))

should return

["B"]

and the union

 select union(array("A","B"), array("B","C"))

should return

["A","B","C"]

What's the best way to do this in Hive? I have checked the Hive documentation but cannot find anything relevant.

Answer 1:

Klout's Brickhouse project on GitHub provides a large collection of Hive UDFs that solve this problem. Download the source, build the JAR, and add the JAR to Hive. Example:

 CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
 CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
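
The Brickhouse example above covers only the union. As an alternative sketch that needs no extra JAR, both operations can be approximated with Hive built-ins alone (explode, collect_set, array_contains). This is a different technique from the one in the answer. The inline subqueries and the aliases (t, t1, t2, e, e1, e2, u, x) are illustrative assumptions standing in for a real table and its columns, and collect_set does not guarantee element order.

```sql
-- Intersection: explode the first array and keep only the elements
-- that also appear in the second array.
SELECT collect_set(x) AS common_elems
FROM (SELECT array('A','B') AS a, array('B','C') AS b) t
LATERAL VIEW explode(a) e AS x
WHERE array_contains(b, x);
-- For the sample arrays, the only shared element is B.

-- Union: explode both arrays, stack the rows with UNION ALL,
-- and let collect_set drop the duplicates.
SELECT collect_set(x) AS all_elems
FROM (
  SELECT x FROM (SELECT array('A','B') AS a) t1 LATERAL VIEW explode(a) e1 AS x
  UNION ALL
  SELECT x FROM (SELECT array('B','C') AS b) t2 LATERAL VIEW explode(b) e2 AS x
) u;
-- For the sample arrays, the result holds A, B and C (order not guaranteed).
```

On a real table, replace the inline subqueries with the table and add a GROUP BY on a row key so that each row's arrays are combined separately instead of being aggregated across the whole table.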