Pig: is it possible to write a loop over variables?

Posted 2019-01-27 08:01

I have to loop over 30 variables in a list

[var1,var2, ... , var30]

and for each variable I use a Pig GROUP BY statement such as

grouped = GROUP data BY var1;
data_var1 = FOREACH grouped {
    GENERATE group AS mygroup,
             COUNT(data) AS count;
};

Is there a way to loop over the list of variables, or am I forced to repeat the code above manually 30 times?

Thanks!

1 answer
走好不送
#2 · 2019-01-27 08:58

I think what you're looking for is a Pig macro.

Create a relation holding your 30 variables, iterate over it with FOREACH, and call a macro that takes two parameters: your data relation and the variable you want to group by. Check the example in the link; the macro there is very similar to what you'd like to do.

UPDATE & code

So here's the macro you can use:

-- cnt.macro: group $data by $group_field and count the rows in each group
DEFINE my_cnt(data, group_field) RETURNS C {
        $C = FOREACH (GROUP $data BY $group_field) GENERATE
                group AS mygroup,
                COUNT($data) AS count;
};

Use the macro:

IMPORT 'cnt.macro';

data = LOAD 'data.txt' USING PigStorage(',') AS (field:chararray, value:chararray);
DESCRIBE data;

e = my_cnt(data,'the_field_you_group_by');
DESCRIBE e;
DUMP e;

I'm still thinking about how you can iterate over the fields you'd like to group by. My original suggestion, to FOREACH through a relation that contains the field names, is not correct. (Writing a UDF for this always works.) Let me think about it. In the meantime, the macro works as is if you call it once for each field name you want to group by.
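As far as I know, Pig Latin itself has no loop construct, so one common workaround is to generate the repeated macro calls from a host language. Below is a minimal Python sketch under these assumptions: the fields are literally named var1 through var30 as in the question, the relation is called data, and the my_cnt macro above has already been imported. The helper name build_macro_calls is made up for illustration.

```python
# Sketch only: emit one my_cnt(...) call per field as Pig Latin text.
# Assumptions (not from the original post): fields are named var1..var30,
# the loaded relation is called "data", and my_cnt is already IMPORTed.

def build_macro_calls(fields, relation="data"):
    """Return Pig Latin text with one my_cnt call per field name."""
    lines = []
    for field in fields:
        # Produces e.g.: data_var1 = my_cnt(data, 'var1');
        lines.append(
            "{rel}_{f} = my_cnt({rel}, '{f}');".format(rel=relation, f=field)
        )
    return "\n".join(lines) + "\n"


# Hypothetical field names matching the list in the question.
fields = ["var{}".format(i) for i in range(1, 31)]

if __name__ == "__main__":
    print(build_macro_calls(fields))
```

Paste the printed lines after the LOAD statement in your script, or write them to a .pig file and run that as usual.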
