Using an IN clause with Pig FILTER

Posted 2020-06-21 04:22

Does Pig support an IN clause?

filtered = FILTER bba BY reason not in ('a','b','c','d');

Or should I split it up into multiple ORs?

Thanks!

Tags: apache-pig
6 answers
女痞
Answer #2 · 2020-06-21 04:39
趁早两清
Answer #3 · 2020-06-21 04:53

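No, Pig versions before 0.12 don't support an IN clause. I had a similar situation. As a workaround, you can chain comparisons inside the FILTER condition: equality tests joined with OR behave like IN, and inequality tests joined with AND behave like NOT IN. For example:

A = LOAD 'source.txt' AS (user:chararray, age:chararray);

B = FILTER A BY (user == 'tapan') OR (user == 'superman');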
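For the NOT IN in the original question, the same workaround applies with `!=` comparisons joined by AND. A sketch, assuming `bba` is loaded from a hypothetical file `bba.txt` with one `reason` per line:

```pig
-- Hypothetical input: one reason per line
bba = LOAD 'bba.txt' AS (reason:chararray);

-- Emulates: reason NOT IN ('a','b','c','d')
-- A row survives only if its reason differs from every listed value
filtered = FILTER bba BY (reason != 'a') AND (reason != 'b')
                     AND (reason != 'c') AND (reason != 'd');

DUMP filtered;
```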

However, if the list of values is large, you can create a relation that contains all of these keywords and JOIN against it, keeping only the rows that match. Hope this helps.
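The join approach could look roughly like this. A sketch, assuming hypothetical files `bba.txt` (one `reason` per line) and `keywords.txt` (one value per line): an inner join keeps rows whose `reason` is in the list (IN), while a LEFT OUTER join followed by an IS NULL filter keeps rows that are not in the list (NOT IN).

```pig
-- Hypothetical inputs: the data to filter and the list of values to match
bba      = LOAD 'bba.txt'      AS (reason:chararray);
keywords = LOAD 'keywords.txt' AS (value:chararray);

-- Remove duplicate values so the inner join cannot repeat rows
keywords_uniq = DISTINCT keywords;

-- IN: an inner join keeps only rows whose reason appears in the list
matched = JOIN bba BY reason, keywords_uniq BY value;
in_list = FOREACH matched GENERATE bba::reason AS reason;

-- NOT IN: a left outer join keeps every row; unmatched rows get a null value
joined      = JOIN bba BY reason LEFT OUTER, keywords_uniq BY value;
unmatched   = FILTER joined BY keywords_uniq::value IS NULL;
not_in_list = FOREACH unmatched GENERATE bba::reason AS reason;

DUMP in_list;
DUMP not_in_list;
```

When the value list is small, adding `USING 'replicated'` to either join is a common choice, since Pig then loads the last (small) relation into memory instead of shuffling both sides.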

甜甜的少女心
Answer #4 · 2020-06-21 04:55

Pig 0.12 added an IN operator; see the release notes at the bottom of http://www.edureka.co/blog/operators-in-apache-pig-diagnostic-operators/. I haven't located it in the official docs, apart from a bare mention in the release notes.

Ridiculous、
Answer #5 · 2020-06-21 04:55

You can do it like this:

X = FILTER bba BY NOT reason IN ('a','b','c','d');

more info
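The `NOT ... IN` form assumes Pig 0.12 or later. If rows whose `reason` is null should also be kept, a defensive option is to test for null explicitly rather than rely on how IN treats nulls. A sketch, assuming `bba` is loaded from a hypothetical file `bba.txt`:

```pig
-- Hypothetical input: one reason per line (IN needs Pig 0.12+)
bba = LOAD 'bba.txt' AS (reason:chararray);

-- Keep rows whose reason is missing, or present but not in the list
X = FILTER bba BY (reason IS NULL) OR NOT (reason IN ('a','b','c','d'));

DUMP X;
```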

放荡不羁爱自由
Answer #6 · 2020-06-21 04:56

We can use the IN clause as follows:

A = FILTER alias_name BY col_name IN (val1, val2,...,valn);

DUMP A;
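Filled in with concrete values, the template might look like this. The file `users.txt`, its comma-separated layout, and the field names are hypothetical:

```pig
-- Hypothetical comma-separated input: name,age
users = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Literal types should match the column type: quoted strings for chararray
selected = FILTER users BY name IN ('alice', 'bob', 'carol');

DUMP selected;
```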
不美不萌又怎样
Answer #7 · 2020-06-21 04:57

Alternatively, you can use the UDF below from Apache DataFu. It saves you from writing a long chain of ORs.

https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/util/InUDF.java
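A usage sketch, under stated assumptions: the jar path is hypothetical, the class name `datafu.pig.util.InUDF` is taken from the linked source path, and the argument order (value to test first, candidate values after) is assumed rather than confirmed here. The alias `InList` is also hypothetical; it is chosen so it cannot be confused with the IN keyword that Pig 0.12 added.

```pig
-- Hypothetical jar location; adjust to wherever DataFu is installed
REGISTER 'datafu.jar';

-- Hypothetical alias for the DataFu filter function
DEFINE InList datafu.pig.util.InUDF();

bba = LOAD 'bba.txt' AS (reason:chararray);

-- Assumed semantics: true when reason equals any of the listed values
filtered = FILTER bba BY NOT InList(reason, 'a', 'b', 'c', 'd');

DUMP filtered;
```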
