Declare a comma-separated string constant

Posted 2019-08-07 17:51

Question:

Objective: Declare a comma-separated string constant

    test.csv
    =========
    a
    b
    c
    d
    e
    f

Pig Script :

  %declare ACTIVE_VALUES 'a', 'b','c' ; 

  -- Declaring the constant like this, with "" (double quotes), or with escape characters (\), results in the WARN message below:
  -- WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for ACTIVE_VALUES

  A = LOAD 'test.csv' using PigStorage(',') AS (value:chararray);
  B = FILTER A BY value in ($ACTIVE_VALUES);
  dump B;

Expected Output :

 a
 b
 c

Any suggestions on how to declare a comma-separated string constant in Pig?

Answer 1:

You can declare a single comma-delimited string ('a,b,c') and split it with the STRSPLIT function (https://pig.apache.org/docs/r0.9.1/func.html#strsplit). Note that STRSPLIT returns a tuple rather than a bag, so to get one record per value you need a bag of the values (for example from TOKENIZE), which can then be FLATTENed into multiple records. Those records can be INNER JOINed with the data from the test file to get the desired result.
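
The idea above can be sketched roughly as follows. This is a minimal, untested sketch rather than a definitive solution. It swaps in TOKENIZE for STRSPLIT, because TOKENIZE returns a bag and treats the comma as one of its default delimiters. It also assumes the values contain no spaces or other default TOKENIZE delimiters. The relation and field names (B, C, D, active) are illustrative only.

```pig
-- One quoted string, so the preprocessor sees a single value for the parameter.
%declare ACTIVE_VALUES 'a,b,c';

A = LOAD 'test.csv' USING PigStorage(',') AS (value:chararray);

-- TOKENIZE turns 'a,b,c' into a bag of tokens; FLATTEN pairs every input
-- row with each token, giving one record per (value, active) combination.
B = FOREACH A GENERATE value, FLATTEN(TOKENIZE('$ACTIVE_VALUES')) AS active;

-- Keep only the rows whose value equals one of the active tokens.
C = FILTER B BY value == active;
D = FOREACH C GENERATE value;

DUMP D;
```

Instead of building a separate relation and joining it, this sketch expands each row against the tokens and filters the matches. For a short list of values that has the same effect as the INNER JOIN described above.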