I have to loop over 30 variables in a list
[var1,var2, ... , var30]
and for each variable I run a Pig GROUP BY
statement such as:
grouped = GROUP data by var1;
data_var1 = FOREACH grouped{
GENERATE group as mygroup,
COUNT(data) as count;
};
Is there a way to loop over the list of variables, or am I forced to repeat the code above manually 30 times?
Thanks!
I think what you're looking for is a Pig macro.
Create a relation for your 30 variables, iterate over it with FOREACH, and call a macro that takes 2 parameters: your data relation and the variable you want to group by. Just check the example in the link; the macro there is very similar to what you'd like to do.
UPDATE & code
So here's the macro you can use:
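The original macro code is not preserved here, so this is a minimal sketch of what it could look like, assuming Pig 0.9 or later (the release that introduced macros). The name `group_and_count` and its parameter names are hypothetical. It simply wraps the GROUP/FOREACH from the question, taking the input relation and the grouping field as parameters:

```pig
-- Hypothetical macro: groups $data by $group_key and counts the rows per key.
-- $data and $group_key are substituted textually at each call site.
DEFINE group_and_count(data, group_key) RETURNS result {
    grouped = GROUP $data BY $group_key;
    -- After GROUP, the bag holding the input rows is named after the input
    -- alias, so COUNT($data) counts the rows in each group.
    $result = FOREACH grouped GENERATE
        group AS mygroup,
        COUNT($data) AS count;
};
```

Only `$result` is visible to the caller; Pig renames internal aliases such as `grouped` on each expansion, so the macro can be called many times in one script without alias clashes.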
Use the macro:
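A sketch of the calls, assuming a macro `group_and_count(data, group_key)` (hypothetical name) that groups `data` by `group_key` and returns `(mygroup, count)`, and assuming your input relation is already loaded as `data`, as in the question:

```pig
-- One call per field; each call expands the macro body inline.
data_var1  = group_and_count(data, var1);
data_var2  = group_and_count(data, var2);
-- ... one line per remaining field ...
data_var30 = group_and_count(data, var30);

-- Inspect one of the results.
DUMP data_var1;
```

You still write one line per field, but each line replaces the whole five-line GROUP/FOREACH block.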
I'm still thinking about how you could iterate over the fields you'd like to group by. My original suggestion, to FOREACH over a relation that contains the field names, is not correct. (Writing a UDF for this always works.) Let me think about it. In the meantime, the macro works as is if you call it once for each field name you want to group by.