I'm using:
- hadoop-client 2.2.0
- mrunit 1.0.0
- avro 1.7.6
- avro-mrunit 1.7.6
... and the entire thing is being built and tested using Maven.
I was getting a NullPointerException until I followed the instructions in "MRUnit with Avro NullPointerException in Serialization".
Now I am getting an InstantiationException:
    Running mypackage.MyTest
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    2014-03-23 20:49:21.463 java[27994:1003] Unable to load realm info from SCDynamicStore
    Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.945 sec <<< FAILURE!
    process(mypackage.MyTest) Time elapsed: 0.909 sec <<< ERROR!
    java.lang.RuntimeException: java.lang.InstantiationException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
        at org.apache.hadoop.io.serializer.SerializationFactory.add(SerializationFactory.java:72)
        at org.apache.hadoop.io.serializer.SerializationFactory.<init>(SerializationFactory.java:63)
        at org.apache.hadoop.mrunit.internal.io.Serialization.<init>(Serialization.java:37)
        at org.apache.hadoop.mrunit.TestDriver.getSerialization(TestDriver.java:464)
        at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:608)
        at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:612)
        at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:118)
        at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:207)
        at mypackage.MyTest.process(MyTest.java:92)
        ...
The Avro model looks like this:
    {
      "namespace": "model",
      "type": "record",
      "name": "Blob",
      "fields": [
        { "name": "value", "type": "string" }
      ]
    }
The mapper looks like this:
    public class MyMapper
        extends Mapper<AvroKey<Blob>, NullWritable, LongWritable, NullWritable>
    {
        @Override
        public void map(AvroKey<Blob> key, NullWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(new LongWritable(0), NullWritable.get());
        }
    }
The test that is failing (the only test I have at the moment) looks like this:
    @Test
    public void process() throws IOException {
        mapper = new MyMapper();
        job = Job.getInstance();
        mapDriver = MapDriver.newMapDriver(mapper);

        Configuration configuration = mapDriver.getConfiguration();

        // Copy over the default io.serializations. If you don't do this then you will
        // not be able to deserialize the inputs to the mapper
        String[] serializations = configuration.getStrings("io.serializations");
        serializations = Arrays.copyOf(serializations, serializations.length + 1);
        serializations[serializations.length - 1] = AvroSerialization.class.getName();
        configuration.setStrings("io.serializations", serializations);

        // Configure AvroSerialization by specifying the key writer and value writer schemas
        configuration.setStrings("avro.serialization.key.writer.schema", Schema.create(Schema.Type.LONG).toString(true));
        configuration.setStrings("avro.serialization.value.writer.schema", Schema.create(Schema.Type.NULL).toString(true));

        job.setMapperClass(MyMapper.class);
        job.setInputFormatClass(AvroKeyInputFormat.class);
        AvroJob.setInputKeySchema(job, Blob.SCHEMA$);
        job.setOutputKeyClass(LongWritable.class);

        input = Blob.newBuilder()
            .setValue("abc")
            .build();

        mapDriver
            .withInput(new AvroKey<Blob>(input), NullWritable.get())
            .withOutput(new LongWritable(0), NullWritable.get())
            .runTest();
    }
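To isolate the step in the test that extends `io.serializations`, here is a dependency-free sketch of the same array handling. It is only an illustration: `AppendSerialization` and `append` are hypothetical names of mine, the plain `String[]` stands in for what `configuration.getStrings(...)` returns, and the null check assumes that `getStrings` can return null when the property is unset.

```java
import java.util.Arrays;

public class AppendSerialization {

    /**
     * Returns a new array holding the existing serialization class names
     * followed by className. A null input is treated as an empty list
     * (assumption: this mirrors an unset configuration property).
     */
    public static String[] append(String[] existing, String className) {
        String[] base = (existing == null) ? new String[0] : existing;
        String[] result = Arrays.copyOf(base, base.length + 1);
        result[result.length - 1] = className;
        return result;
    }

    public static void main(String[] args) {
        // Placeholder class names, used only to show the ordering of the result.
        String[] defaults = { "example.DefaultSerialization" };
        String[] extended = append(defaults, "example.ExtraSerialization");
        System.out.println(Arrays.toString(extended));
    }
}
```

In the real test the returned array would be passed to `configuration.setStrings("io.serializations", ...)`, exactly as in the code above.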
I'm pretty new to both Avro and MRUnit, so I am still trying to fully understand how they work together. In the unit test output I see warnings about log4j, and I don't know for certain that they aren't part of the problem (though I doubt it).
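One thing I can check independently of Hadoop is what plain Java reflection does in this situation. As far as I understand, creating an instance reflectively throws `InstantiationException` when the target class is abstract (or an interface). Below is a minimal sketch of that behavior. The nested classes are hypothetical stand-ins and not real Hadoop or Avro types, and I am only assuming that `ReflectionUtils.newInstance` ends up doing something comparable.

```java
import java.lang.reflect.Modifier;

public class InstantiationCheck {

    // Hypothetical stand-ins for serialization classes; not part of Hadoop or Avro.
    public abstract static class AbstractSerialization { }
    public static class ConcreteSerialization extends AbstractSerialization { }

    /** Tries to create cls through its no-arg constructor and reports the outcome. */
    public static String tryInstantiate(Class<?> cls) {
        try {
            cls.getDeclaredConstructor().newInstance();
            return "ok";
        } catch (ReflectiveOperationException e) {
            // Report only the exception type, e.g. "InstantiationException".
            return e.getClass().getSimpleName();
        }
    }

    /** True if cls is declared abstract, which rules out direct instantiation. */
    public static boolean isAbstract(Class<?> cls) {
        return Modifier.isAbstract(cls.getModifiers());
    }

    public static void main(String[] args) {
        Class<?>[] candidates = { AbstractSerialization.class, ConcreteSerialization.class };
        for (Class<?> cls : candidates) {
            System.out.println(cls.getSimpleName()
                + ": abstract=" + isAbstract(cls)
                + ", result=" + tryInstantiate(cls));
        }
    }
}
```

Running the same `isAbstract` check against each class name listed in `io.serializations` (loaded with `Class.forName`) might show which entry cannot be instantiated.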