I am new to Pig. Say we have a file that is loaded with this Pig script:
A = LOAD 'txt' AS (in: map[]);
We know that we can look up a value in the map by supplying its key, for example the value stored under the key "a".
Assuming that I don't know the keys in advance, I want to group the values by key in a relation and dump it.
Does Pig allow such operations, or do I need to write a UDF? Please help me through this. Thanks.
You can create a custom UDF which converts the map to a bag (using Pig v0.10.0):
package com.example;

import java.io.IOException;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class MapToBag extends EvalFunc<DataBag> {

    private static final BagFactory bagFactory = BagFactory.getInstance();
    private static final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        try {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = (Map<String, Object>) input.get(0);
            DataBag result = null;
            if (map != null) {
                result = bagFactory.newDefaultBag();
                // Emit one (key, value) tuple per map entry.
                for (Entry<String, Object> entry : map.entrySet()) {
                    Tuple tuple = tupleFactory.newTuple(2);
                    tuple.set(0, entry.getKey());
                    tuple.set(1, entry.getValue());
                    result.add(tuple);
                }
            }
            // A null input map yields a null bag.
            return result;
        } catch (Exception e) {
            throw new RuntimeException("MapToBag error", e);
        }
    }
}
B = foreach A generate
    flatten(com.example.MapToBag(in)) as (k:chararray, v:chararray);
describe B;
B: {k: chararray,v: chararray}
Now group by key and use a nested foreach:
C = foreach (group B by k) {
    value = foreach B generate v;
    generate group as key, value;
};
dump C;
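To make it easier to see what the flatten-then-group steps compute, here is a minimal plain-Java sketch of the same idea using only java.util collections. It does not use the Pig API at all; the class name GroupByKeySketch, the method groupByKey, and the sample rows are hypothetical and exist only for illustration. Note that Pig bags are unordered, so the order of values inside each group in the real output is not guaranteed, whereas this sketch keeps input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: mimics flatten(MapToBag(in)) followed by
// "group B by k" with ordinary Java collections instead of Pig relations.
class GroupByKeySketch {

    // Collects, for every key seen in any input map, the list of its values.
    static Map<String, List<Object>> groupByKey(List<Map<String, Object>> rows) {
        Map<String, List<Object>> grouped = new TreeMap<>();
        for (Map<String, Object> row : rows) {
            if (row == null) {
                continue; // this sketch simply skips null maps
            }
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                // Create the value list for a key the first time it appears.
                List<Object> values = grouped.get(entry.getKey());
                if (values == null) {
                    values = new ArrayList<>();
                    grouped.put(entry.getKey(), values);
                }
                values.add(entry.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the maps loaded into relation A.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("a", "1");
        first.put("b", "2");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("a", "3");
        second.put("c", "4");

        System.out.println(groupByKey(Arrays.asList(first, second)));
        // prints {a=[1, 3], b=[2], c=[4]}
    }
}
```

Each key ends up with all of the values that were stored under it across the input maps, which is the shape relation C has after the nested foreach.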