I have a dataset (CSV) with three value columns (v1, v2 and v3). The description of each value is stored as a comma-separated string in the column 'keys', so the first key describes v1, the second v2, and so on.
| v1 | v2 | v3 | keys |
|----|----|----|------|
| A | C | E | X,Y,Z |
Using Pig, I would like to load this data into an HBase table where the column family is C and the column qualifier is the corresponding key:
| C:X | C:Y | C:Z |
|-----|-----|-----|
| A | C | E |
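The reshaping itself is a per-row zip of the split `keys` field with the value columns. Below is a minimal sketch in plain Python 3; the helper name `to_qualifier_map` is hypothetical, and it assumes every row has exactly as many keys as value columns. It also assumes the `keys` field is quoted in the CSV (e.g. `A,C,E,"X,Y,Z"`), because otherwise a comma-delimited loader would split `X,Y,Z` into three separate fields. A dict like this is the kind of structure a Python UDF could return, and it maps to one qualifier per key if your Pig version's `HBaseStorage` accepts a map for a column spec such as `'C:*'` (check the documentation for your version).

```python
import csv
import io


def to_qualifier_map(keys, *values):
    """Pair each comma-separated key with the value in the same position.

    Hypothetical helper: assumes the number of keys equals the number of
    values and raises instead of silently dropping data when they differ.
    """
    qualifiers = [k.strip() for k in keys.split(",")]
    if len(qualifiers) != len(values):
        raise ValueError(
            "expected %d values for keys %r, got %d"
            % (len(qualifiers), keys, len(values))
        )
    return dict(zip(qualifiers, values))


# Assumed input format: the 'keys' field is quoted because it contains commas;
# csv.reader strips the quotes and keeps X,Y,Z as a single field.
sample = 'A,C,E,"X,Y,Z"\n'
for v1, v2, v3, keys in csv.reader(io.StringIO(sample)):
    print(to_qualifier_map(keys, v1, v2, v3))  # {'X': 'A', 'Y': 'C', 'Z': 'E'}
```

Raising on a length mismatch is a deliberate choice: a row with two keys and three values would otherwise lose a value without any warning.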
Has anyone done this before and could share how?
Another option is to store a map (key#value) in a single HBase column, but I'm not sure that is flexible enough for querying the data.
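To make the querying trade-off concrete: with one qualifier per key, a client can ask HBase for just `C:Y`, whereas a map serialized into a single cell has to be fetched whole and parsed on the client. The sketch below uses Pig's textual map form `[X#A,Y#C,Z#E]` as the serialized format (an assumption; values containing `#` or `,` would need escaping, which it ignores), and the helper names `serialize_map` and `lookup` are hypothetical.

```python
def serialize_map(d):
    """Encode a dict as a Pig-style map literal, e.g. [X#A,Y#C]."""
    return "[" + ",".join("%s#%s" % (k, v) for k, v in d.items()) + "]"


def lookup(cell, key):
    """Return the value for one key by parsing the whole serialized cell.

    HBase cannot return just this key's value: the client fetches the
    entire cell and splits it, for every single-key read.
    """
    body = cell.strip()[1:-1]  # drop the surrounding brackets
    for pair in body.split(","):
        k, _, v = pair.partition("#")
        if k == key:
            return v
    return None


cell = serialize_map({"X": "A", "Y": "C", "Z": "E"})
print(cell)               # [X#A,Y#C,Z#E]
print(lookup(cell, "Y"))  # C
```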
This is a common problem when processing data with a multi-structured schema. Trying to solve it with the MAP type is a bad idea.
You could try this with MapReduce instead; in my view MapReduce is the best fit for this.
Found a solution to my problem:
test.pig:
data.py: