Looking up variable keys in pig map

Posted 2019-07-30 19:14

I'm trying to use pig to break text into lowercased words, and then look up each word in a map. Here's my example map, which I have in map.txt (it is only 1 line long):

[this#1.9,is#2.5,my#3.3,vocabulary#4.1]

I load this like so:

M = LOAD 'map.txt' USING PigStorage AS (mp: map[float]);

which works just fine. Then I do the following to load the text and break it into lowercased words:

LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) as (word:chararray);

Now, I'd like to do something like this:

RESULTS = FOREACH TOKENS GENERATE M.mp#word;

so that if I have a line like "this my my vocabulary", I'd get the following output: 1.9 3.3 3.3 4.1. Instead, I keep getting various errors. How can I look up a variable key in a map?

I've looked at "How can I use the map datatype in Apache Pig?" and http://pig.apache.org/docs/r0.10.0/basic.html#map-schema , but these only help if I'm looking up a fixed key in a map, for example M.mp#'this', which is not what I want to do here.

Tags: map apache-pig

1 answer

地球回转人心会变

Reply #2 · 2019-07-30 19:45

You can also FLATTEN the map in M into key/value rows and then JOIN the result with TOKENS on the word. You can make it a 'replicated' join so that the flattened map is copied to each mapper.
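
A minimal sketch of that approach follows. It assumes a Pig release in which FLATTEN accepts a map and emits one (key, value) row per entry, which I believe means 0.17 or later. On older releases you would probably need a UDF to turn the map into a bag first. The relation names PAIRS and JOINED and the field name score are made up for illustration, and the file names are the ones from the question.

```pig
-- Load the one-line map and turn it into one (word, score) row per entry.
-- Assumption: FLATTEN on a map is supported by your Pig version.
M = LOAD 'map.txt' USING PigStorage() AS (mp:map[float]);
PAIRS = FOREACH M GENERATE FLATTEN(mp) AS (word:chararray, score:float);

-- Tokenize the text exactly as in the question.
LINES = LOAD 'test.txt' USING TextLoader() AS (line:chararray);
TOKENS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS (word:chararray);

-- In a replicated join the small relation goes last; it is held in memory
-- on each mapper.
JOINED = JOIN TOKENS BY word, PAIRS BY word USING 'replicated';
RESULTS = FOREACH JOINED GENERATE TOKENS::word, PAIRS::score;

DUMP RESULTS;
```

Two things to keep in mind: this inner join silently drops any word that is not in the map (a LEFT OUTER join keeps such words with a null score), and the join does not promise to return the words in their original order within the line.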
