I'm using MRv1 from CDH4 (4.5) and facing a problem with CompositeInputFormat. It doesn't matter how many inputs I try to join. For the sake of simplicity, here's an example with just one input:
Configuration conf = new Configuration();
Job job = new Job(conf, "Blah");
job.setJarByClass(Blah.class);
job.setMapperClass(Blah.BlahMapper.class);
job.setReducerClass(Blah.BlahReducer.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(BlahElement.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(BlahElement.class);
job.setInputFormatClass(CompositeInputFormat.class);
String joinStatement = CompositeInputFormat.compose("inner", SequenceFileInputFormat.class, "/someinput");
System.out.println(joinStatement);
conf.set("mapred.join.expr", joinStatement);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(newoutput));
return job.waitForCompletion(true) ? 0 : 1;
Here's the output + stacktrace:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 03:27:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
inner(tbl(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat,"/someinput"))
14/01/31 03:27:48 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/31 03:27:51 INFO mapred.JobClient: Cleaning up the staging area hdfs://archangel-desktop:54310/tmp/hadoop/mapred/staging/hadoop/.staging/job_201401302213_0013
14/01/31 03:27:51 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Expression is null
Exception in thread "main" java.io.IOException: Expression is null
at org.apache.hadoop.mapreduce.lib.join.Parser.parse(Parser.java:542)
at org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat.setFormat(CompositeInputFormat.java:85)
at org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat.getSplits(CompositeInputFormat.java:127)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at com.nileshc.graphfu.pagerank.BlockMatVec.run(BlockMatVec.java:79)
at com.nileshc.graphfu.Main.main(Main.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Has anyone faced this before? Any ideas on how to solve it?
My bad.
The above should be:
And:
^^That should be:
The first change is what made all the difference.
In the code above, the line conf.set("mapred.join.expr", joinStatement); runs after the Job object is created. The Job constructor takes its own copy of the Configuration it is given, so the Job object never sees this setting.
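To make the ordering issue concrete, here is a minimal, self-contained sketch that does not use Hadoop at all. SimpleConf and SimpleJob are hypothetical stand-ins (not Hadoop classes) that only model the one behaviour that matters here: the job copies the configuration when it is constructed. The value "join-expression" is a placeholder, not a real join expression.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Configuration: a plain key/value store.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf() {}

    // Copy constructor: snapshots the entries present at the time of the call.
    SimpleConf(SimpleConf other) {
        props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical stand-in for Job: keeps a private copy of the configuration.
class SimpleJob {
    private final SimpleConf conf;

    SimpleJob(SimpleConf conf) { this.conf = new SimpleConf(conf); }

    SimpleConf getConfiguration() { return conf; }
}

class ConfigCopyDemo {
    static final String KEY = "mapred.join.expr";
    static final String EXPR = "join-expression"; // placeholder value

    // Mirrors the question: the value is set on the original object
    // after the job has already copied it, so the job never sees it.
    static String setAfterJobCreated() {
        SimpleConf conf = new SimpleConf();
        SimpleJob job = new SimpleJob(conf);
        conf.set(KEY, EXPR);
        return job.getConfiguration().get(KEY);
    }

    // Option 1: set the value before the job is constructed.
    static String setBeforeJobCreated() {
        SimpleConf conf = new SimpleConf();
        conf.set(KEY, EXPR);
        SimpleJob job = new SimpleJob(conf);
        return job.getConfiguration().get(KEY);
    }

    // Option 2: set the value on the job's own copy.
    static String setOnJobConfiguration() {
        SimpleConf conf = new SimpleConf();
        SimpleJob job = new SimpleJob(conf);
        job.getConfiguration().set(KEY, EXPR);
        return job.getConfiguration().get(KEY);
    }

    public static void main(String[] args) {
        System.out.println(setAfterJobCreated());    // prints "null"
        System.out.println(setBeforeJobCreated());   // prints "join-expression"
        System.out.println(setOnJobConfiguration()); // prints "join-expression"
    }
}
```

In Hadoop terms, the two working variants would correspond to calling conf.set(...) before new Job(conf, "Blah"), or calling job.getConfiguration().set(...) after it.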
See the revised code below:-
The following is the other way around:-
Use the above code instead of