I thought that job.setOutputKeyClass and job.setOutputValueClass refer to the Reducer's output, but in my program I have
public static class MyMapper extends
Mapper< LongWritable, Text, Text, Text >
and
public static class MyReducer extends
Reducer< Text, Text, NullWritable, Text >
so if I have
job.setOutputKeyClass( NullWritable.class );
job.setOutputValueClass( Text.class );
I get the following Exception
Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.Text
but if I have
job.setOutputKeyClass( Text.class );
there is no problem.
Is there something wrong with my code, or does this happen because of NullWritable, or is it something else?
Also, do I have to use job.setInputFormatClass and job.setOutputFormatClass? My program runs correctly without them.
Calling
job.setOutputKeyClass( NullWritable.class );
will set the types expected as output from both the map and reduce phases. If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods. These implicitly set the input types expected by the Reducer. (source: Yahoo Developer Tutorial)
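In your case the Mapper emits (Text, Text) while the Reducer emits (NullWritable, Text), so both pairs of setters are needed. Here is a minimal driver sketch, assuming the Hadoop 2.x mapreduce API (where the same setters live on Job rather than JobConf); the enclosing class name MyJob and the argument-based paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJob {

    // MyMapper and MyReducer from your question go here as nested classes.

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my job");
        job.setJarByClass(MyJob.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Types emitted by the Mapper (and read by the Reducer):
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Types emitted by the Reducer (the job's final output):
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If you omit setMapOutputKeyClass/setMapOutputValueClass, the framework assumes the map output types are the same as the job output types, which is exactly why you saw the NullWritable/Text mismatch exception.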
Regarding your second question, the default InputFormat is TextInputFormat. This treats each line of each input file as a separate record and performs no parsing. You only need to call these methods if you want to process your input or output in a different format.
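For example, a minimal sketch (assuming the Hadoop 2.x mapreduce API; KeyValueTextInputFormat is just one illustrative alternative) of switching the input format:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class InputFormatExample {
    // Call this from your driver after creating the Job.
    static void useKeyValueInput(Job job) {
        // KeyValueTextInputFormat splits each line at the first tab: the left part
        // becomes the Text key and the rest the Text value, so the Mapper would be
        // declared as Mapper<Text, Text, ...> instead of Mapper<LongWritable, Text, ...>.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
    }
}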
The default OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. (source: another Yahoo Developer Tutorial)
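Similarly, a minimal sketch (same assumptions) of switching the output format, using SequenceFileOutputFormat as one common alternative:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class OutputFormatExample {
    // Call this from your driver after creating the Job.
    static void useSequenceFileOutput(Job job) {
        // The reducer's (key, value) pairs are written to Hadoop's binary
        // SequenceFile container instead of plain text lines, which is convenient
        // when another MapReduce job will read the output.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
    }
}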