I am writing a mapper whose output keys are user IDs and whose output values are also of Text type. Here is how I do it:
public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;
        userid.set(itr.nextToken());
        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            } else {
                itr.nextToken();
            }
        }
    }
}
Then, in the main program, I set the mapper's output classes as follows:
Job job = new Job(conf, "Customer Analyzer");
job.setJarByClass(popularCategories.class);
job.setMapperClass(UserMapper.class);
job.setCombinerClass(UserReducer.class);
job.setReducerClass(UserReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
So even though I have set the map output value class to Text.class, I still get the following error when I compile:
popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
org.apache.hadoop.io.IntWritable>
cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
context.write(userid, catid);
^
According to this error, the compiler still expects the context's write method to have the signature write(org.apache.hadoop.io.Text, org.apache.hadoop.io.IntWritable).
So, when I change the class definition as follows, the problem is solved.
public static class UserMapper extends Mapper<Object, Text, Text, Text> {
    /* ... same body as before ... */
}
So, I want to understand: what is the difference between the generic types in the class definition and setting the mapper output value class in the driver?
From the Apache documentation page, the Mapper class is declared as

Class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

Where

KEYIN, VALUEIN => the types of the input (key, value) pairs the map method receives
KEYOUT, VALUEOUT => the types of the output (key, value) pairs the map method emits

Your problem has been solved after you corrected the output value type in your Mapper definition from

Mapper<Object, Text, Text, IntWritable>

to

Mapper<Object, Text, Text, Text>
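For reference, a minimal corrected version of the whole mapper might look like the sketch below. It keeps your parsing logic, drops the now-unused IntWritable field, and assumes (as in your code) that the class is nested inside popularCategories, with java.io.IOException, java.util.StringTokenizer, org.apache.hadoop.io.Text, and org.apache.hadoop.mapreduce.Mapper imported:

public static class UserMapper extends Mapper<Object, Text, Text, Text> {
    private final Text userid = new Text();
    private final Text catid = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ",");
        int count = 0;
        userid.set(itr.nextToken());          // first field: user ID
        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());   // third field after the ID: category
                context.write(userid, catid); // now matches (KEYOUT, VALUEOUT) = (Text, Text)
            } else {
                itr.nextToken();              // skip the other fields
            }
        }
    }
}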
Have a look at this related SE question:
Why LongWritable (key) has not been used in Mapper class?
I have also found this article useful for understanding the concepts clearly.
In your mapper class definition, you are setting the output value class to IntWritable.
However, inside the mapper, you are instantiating catid as Text.
Even though you have set the MapOutputValueClass to Text in the driver, you still need to change the mapper class definition so that it is in sync with the key and value output classes set there.
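Both places describe the same pair of types and must agree. A consistent pairing, using your class names and the corrected Text value type, would look like this sketch:

// Mapper declaration: Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
public static class UserMapper extends Mapper<Object, Text, Text, Text> { ... }

// Driver: these calls must mirror KEYOUT / VALUEOUT above
job.setMapOutputKeyClass(Text.class);   // matches KEYOUT   = Text
job.setMapOutputValueClass(Text.class); // matches VALUEOUT = Text

The generic parameters are checked by javac at compile time, while the setMapOutput*Class calls are only consulted by the framework at runtime (for serialization), which is why fixing the driver alone did not make the compile error go away.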
The class definition has both the input and output types. For instance, your Mapper is taking in Object, Text and emitting Text, Text. In your driver class you have set the expected output of the Mapper class to Text for both the key and value, therefore the Hadoop framework expects your Mapper class definition to have these output types and your class to emit Text for both the key and value when you call context.write(Text, Text).
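To see why the compiler enforces this, note that the Context type nested in Mapper is parameterized with the same KEYOUT / VALUEOUT as the class itself, so the signature of write is fixed the moment you write the class declaration. A simplified sketch of the relevant Hadoop types (not the real source) looks like this:

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    public abstract class Context
            implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        // inherited from TaskInputOutputContext: write accepts exactly
        // the declared output types
        public abstract void write(KEYOUT key, VALUEOUT value)
                throws IOException, InterruptedException;
    }
}

With Mapper<Object, Text, Text, IntWritable>, Context.write resolves to write(Text, IntWritable), so context.write(userid, catid) with a Text catid cannot compile, regardless of what job.setMapOutputValueClass(...) says.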