i'm trying to find all combinations possible using apache pig, i was able to generate permutation but i want to eliminate the replication of values i write this code :
A = LOAD 'data' AS f1:chararray;
DUMP A;
('A')
('B')
('C')
B = FOREACH A GENERATE $0 AS v1;
C = FOREACH A GENERATE $0 AS v2;
D = CROSS B, C;
And the result i obtained is like :
DUMP D;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')
but what i'm trying to obtain the result is like bellow
DUMP R;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
how can i do this? i avoid to use comparison of characters because it's possible to have multiple occurrences of a string in more than a line