How to flatten a recursive hierarchy using Hive/Pig/MapReduce

Posted 2019-09-26 17:14

I have data stored in a table that represents an unbalanced tree:

parent,child
a,b
b,c
c,d
c,f
f,g

The depth of the tree is unknown.

How can I flatten this hierarchy so that each row contains the entire path from a leaf node to the root node, in this format:

leaf node, root node, intermediate nodes
d,a,c:b
g,a,f:c:b
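For data small enough to fit in memory, the target format can be sketched in Python by following parent pointers up from each leaf. This only illustrates the desired output, not a Hive/Pig/MapReduce solution. It assumes each child has exactly one parent and there are no cycles, and it lists the intermediate nodes nearest-to-the-leaf first, excluding both the leaf and the root. The function name `flatten_paths` is hypothetical.

```python
# In-memory sketch: flatten each leaf-to-root path of a parent,child edge list.
# Assumes a tree: every child has exactly one parent and there are no cycles.

def flatten_paths(edges):
    """Return (leaf, root, intermediates) tuples; intermediates are ':'-joined, nearest to the leaf first."""
    parent_of = {child: parent for parent, child in edges}
    parents = {parent for parent, _ in edges}
    # A leaf is a node that appears as a child but never as a parent.
    leaves = sorted(child for child in parent_of if child not in parents)
    rows = []
    for leaf in leaves:
        chain = []  # ancestors of the leaf, nearest first
        node = leaf
        while node in parent_of:
            node = parent_of[node]
            chain.append(node)
        root, intermediates = chain[-1], chain[:-1]
        rows.append((leaf, root, ":".join(intermediates)))
    return rows


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
for leaf, root, middle in flatten_paths(edges):
    print(f"{leaf},{root},{middle}")
```

For the sample data this prints `d,a,c:b` and `g,a,f:c:b`: d and g are the only leaves, because f is itself the parent of g.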

Any suggestions for solving this with Hive, Pig or MapReduce? Thanks in advance.

Answer 1:

I tried solving it with Pig; here is sample code:

The join macro:

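-- Climb one level: join each partial path with its parent's parent.
-- Returns the rows that still need climbing and the accumulated finished paths.
define join_hierarchy (leftA, source, result) returns remaining, done {
    joined = join $leftA by parent left outer, $source by child;
    -- No match: $leftA::parent has no parent of its own (it is the root), so the path is complete.
    tmp_filtered = filter joined by $source::parent is null;
    part = foreach tmp_filtered generate $leftA::child as child, $leftA::path as path;
    $done = union $result, part;
    -- Match found: the grandparent becomes the new parent and is prepended to the path.
    part_remaining = filter joined by $source::parent is not null;
    $remaining = foreach part_remaining generate $leftA::child as child, $source::parent as parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) as path;
};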
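To see what one call of the macro does, here is a hypothetical in-memory Python mirror of a single join_hierarchy step (plain Python, not Pig). Rows are (child, parent, path) tuples, as in leftA. The dictionary lookup stands in for the left outer join on parent = source.child, which assumes each child has a single parent; `join_hierarchy_step` is an illustrative name only.

```python
# Hypothetical in-memory mirror of one join_hierarchy call (not Pig itself).
# Partial rows are (child, parent, path); edges are (parent, child) pairs.

def join_hierarchy_step(partial, edges, result):
    """Climb one level; return (remaining, done) like the macro's two outputs."""
    parent_of = {child: parent for parent, child in edges}  # plays the role of "source by child"
    remaining, done = [], list(result)
    for child, parent, path in partial:  # left outer join on partial.parent == source.child
        grandparent = parent_of.get(parent)
        if grandparent is None:
            # No match: parent is the root, so this path is finished.
            done.append((child, path))
        else:
            # Match: move up one level and prepend the grandparent to the path.
            remaining.append((child, grandparent, f"{grandparent}:{path}"))
    return remaining, done


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
start = [(child, parent, parent) for parent, child in edges]  # path starts as the direct parent
remaining, done = join_hierarchy_step(start, edges, [])
print(remaining)
print(done)
```

After one step on the sample data only b is finished (its parent a is the root); the other rows have climbed one level, e.g. g now carries parent c and path c:f.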

Loading the dataset:

-- My dataset's field delimiter is ','.
source = load '*****' using PigStorage(',') as (parent:chararray, child:chararray);
-- Add a path column; initially the path is just the direct parent.
leftA = foreach source generate child, parent, parent as path;

-- Initially the result relation holds only one dummy row (removed at the end).
result = limit leftA 1;
result = foreach result generate '' as child, '' as path;
-- Flatten the hierarchy to 4 levels: call the macro once per level of hierarchy depth.
leftA, result = join_hierarchy(leftA, source, result);
leftA, result = join_hierarchy(leftA, source, result);
leftA, result = join_hierarchy(leftA, source, result);
leftA, result = join_hierarchy(leftA, source, result);

-- Drop the dummy row. Note: this yields a root-to-parent path for every non-root node, not only for leaves.
result = filter result by child != '';
dump result;
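Because the depth is unknown, the number of macro calls above has to be chosen by hand. The Python sketch below (in memory, not Pig) instead repeats the same one-level step until no partial rows remain and counts the rounds. It assumes a tree with no cycles; otherwise the loop would not terminate. `flatten_until_done` is a hypothetical name.

```python
# Sketch: repeat the one-level climb until no partial paths remain, so the
# number of rounds adapts to the (unknown) depth instead of being hard-coded.
# Assumes a tree (one parent per child, no cycles); otherwise the loop may not end.

def flatten_until_done(edges):
    parent_of = {child: parent for parent, child in edges}
    partial = [(child, parent, parent) for parent, child in edges]
    result, rounds = [], 0
    while partial:  # stop once every row has reached the root
        rounds += 1
        next_partial = []
        for child, parent, path in partial:
            grandparent = parent_of.get(parent)
            if grandparent is None:
                result.append((child, path))  # parent is the root: finished
            else:
                next_partial.append((child, grandparent, f"{grandparent}:{path}"))
        partial = next_partial
    return sorted(result), rounds


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
paths, rounds = flatten_until_done(edges)
print(rounds)
for child, path in paths:
    print(f"{child},{path}")
```

On the sample data it needs 4 rounds, matching the four join_hierarchy calls above, and yields a path for every non-root node, e.g. g,a:b:c:f. Pig Latin itself has no loop construct, so in Pig such a loop would have to be driven from outside the script, for example by a host program that reruns the step until the remaining relation is empty.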


Source: How to flatten recursive hierarchy using Hive/Pig/MapReduce