Looking up variable keys in a Pig map

Posted 2019-07-30 19:27

Question:

I'm trying to use Pig to break text into lowercased words and then look up each word in a map. Here's my example map, which I have in mapping.txt (it is only one line long):

[this#1.9,is#2.5,my#3.3,vocabulary#4.1]

I load this like so:

M = LOAD 'mapping.txt' USING PigStorage AS (mp: map[float]);

which works just fine. Then I do the following to load the text and break it into lowercased words:

LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) as (word:chararray);

Now, I'd like to do something like this:

RESULTS = FOREACH TOKENS GENERATE M.mp#word;

so that a line like "this my my vocabulary" would give the output 1.9 3.3 3.3 4.1. Instead, I keep getting various errors. How can I look up a map value using a key that is stored in a variable?

I've looked at "How can I use the map datatype in Apache Pig?" and http://pig.apache.org/docs/r0.10.0/basic.html#map-schema, but these only help when looking up a fixed key in a map, for example M.mp#'this', which is not what I want to do here.

Answer 1:

You can also FLATTEN M and then JOIN the flattened M with TOKENS on the word. You can make it a 'replicated' join on M, so that M is copied to each mapper.
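A minimal sketch of that approach, reusing the file names and schemas from the question. It assumes a Pig release whose FLATTEN accepts a map and emits one (key, value) tuple per entry; as far as I know this arrived in a later release than the r0.10.0 docs linked above, so check your version. The aliases KV, VOCAB, JOINED and the field name score are hypothetical, chosen only for illustration.

```pig
-- Vocabulary map, loaded as in the question (one map per line).
M = LOAD 'mapping.txt' USING PigStorage() AS (mp:map[float]);

-- Assumption: this Pig version's FLATTEN turns each map entry into a
-- (key, value) tuple; older releases do not support FLATTEN on maps.
KV = FOREACH M GENERATE FLATTEN(mp);

-- Name and type the flattened pair; positional references avoid relying
-- on auto-generated field names.
VOCAB = FOREACH KV GENERATE (chararray)$0 AS word, (float)$1 AS score;

-- Lowercased tokens, exactly as in the question.
LINES  = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- Fragment-replicate join: the relation listed last (VOCAB) is loaded into
-- memory on every mapper, so it must be small enough to fit there.
JOINED = JOIN TOKENS BY word, VOCAB BY word USING 'replicated';

-- Keep each token together with its looked-up value.
RESULTS = FOREACH JOINED GENERATE TOKENS::word AS word, VOCAB::score AS score;

DUMP RESULTS;
```

Note that this inner join drops tokens that are not in the vocabulary; `JOIN TOKENS BY word LEFT OUTER, VOCAB BY word USING 'replicated'` keeps them with a null score instead. The join also does not promise to preserve the original token order, so the per-line sequence from the question is not guaranteed. If your Pig version cannot FLATTEN a map, storing the vocabulary as tab-separated word/value lines and loading it with `AS (word:chararray, score:float)` gives the same VOCAB relation without the map.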