I am using Datastax Cassandra 3.1.4 with two nodes. I am running Pig with CqlStorage() against a table with 12 million rows, but only one map task runs for a simple Pig command.
I tried changing split_size in my Pig relation, but it did not work.
Here is my sample query:
x = load 'cql://Mykeyspace/MyCF?split_size=1000' using CqlStorage();
y = limit x 500;
dump y;
I did not find an input.split.size property in my mapred-site.xml, so I am assuming the default split size is 64*1024.
I tried set pig.splitCombination false;
Now it runs 513 maps regardless of the number of records. I tried the same thing from Hive.
I connected to Cassandra from Hive and ran a simple select-all query with where col1>value. This table has only 10 records, but it still runs 513 maps.
Please help me with this.
Thanks
Try this setting:
By default, Pig combines what it considers small splits into a single map task.
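As a rough sketch of how split combination can be controlled from a Pig script, here is the query from the question with the relevant properties set. pig.splitCombination and pig.maxCombinedSplitSize are standard Pig properties. The 128 MB value is only an illustrative assumption, and how CqlStorage itself computes splits is not covered here.

-- Sketch only: control how Pig combines input splits before the load.
-- Disable combination so every input split gets its own map task.
set pig.splitCombination false;
-- Alternative (assumed value): keep combination on, but cap the size of a
-- combined split in bytes. 134217728 = 128 MB, chosen only for illustration.
-- set pig.maxCombinedSplitSize 134217728;

x = load 'cql://Mykeyspace/MyCF?split_size=1000' using CqlStorage();
y = limit x 500;
dump y;

With combination disabled, the number of map tasks follows the number of splits the input format reports, which is consistent with the 513 maps seen in the question no matter how many rows the table holds.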