DataStax Cassandra: Pig runs only one map task

Posted 2019-08-07 01:20

I am using DataStax Cassandra 3.1.4 with two nodes. I am running Pig with CqlStorage() against a table of 12 million rows, but only one map task runs for a simple Pig command.

I tried changing split_size in my Pig relation, but it didn't work.

Here is my sample query.

x = load 'cql://Mykeyspace/MyCF?split_size=1000' using CqlStorage();
y = limit x 500;
dump y;

I couldn't find the input.split.size property in my mapred-site.xml, so I am assuming the default split size is 64*1024.

I tried set pig.splitCombination false;

Now it takes 513 maps regardless of the number of records. I tried the same thing from Hive.

I connected to Cassandra from Hive and ran a simple select-all query with where col1 > value. This table has only 10 records, but the query still runs 513 maps.

Please help me with this.

Thanks

1 answer
我只想做你的唯一
#2 · 2019-08-07 01:38

Try this setting:

set pig.splitCombination false;

By default, Pig combines what it considers small splits into a single map.
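If turning combination off completely produces too many maps, a middle ground may be to leave it on and cap how much data Pig packs into one combined split. The sketch below reuses the script from the question and adds pig.maxCombinedSplitSize, a standard Pig property measured in bytes. The 128 MB value is only an assumed starting point, and whether the splits produced by CqlStorage report sizes that this cap acts on usefully is not confirmed here, so treat it as something to experiment with rather than a known fix.

```pig
-- Leave split combination on (the Pig default) instead of disabling it.
set pig.splitCombination true;

-- Assumed value: cap each combined split at 128 MB (the property is in bytes).
set pig.maxCombinedSplitSize 134217728;

-- Same load as in the question; split_size is passed through the cql:// URL.
x = load 'cql://Mykeyspace/MyCF?split_size=1000' using CqlStorage();
y = limit x 500;
dump y;
```

Comparing the number of map tasks with and without the cap should show whether combination is what collapses the job into a single map.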
