Hadoop (Java): changing the type of a Mapper's output value

Posted 2019-07-16 03:52

I am writing a mapper function that emits keys of type Text (a user_id) and values that are also of type Text. Here is how I do it:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;

        userid.set(itr.nextToken());

        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
        } else {
                itr.nextToken();
            }
        }
    }
}

And then, in the main program, I set the output class of the mapper as follows:

    Job job = new Job(conf, "Customer Analyzer");
    job.setJarByClass(popularCategories.class);
    job.setMapperClass(UserMapper.class);
    job.setCombinerClass(UserReducer.class);
    job.setReducerClass(UserReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

So even though I have set the class of the output values to Text.class, I still get the following error when I compile it:

popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
 in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
 org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
 org.apache.hadoop.io.IntWritable> 
 cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
 context.write(userid, catid);
                           ^

According to this error, the compiler still expects a mapper whose write signature is write(org.apache.hadoop.io.Text, org.apache.hadoop.io.IntWritable).

So, when I change the class definition as follows, the problem is solved.

 public static class UserMapper extends Mapper<Object, Text, Text, Text> {

 }

So, I want to understand the difference between the type parameters in the class definition and setting the mapper output value class in the driver.

3 Answers
我欲成王,谁敢阻挡
#2 · 2019-07-16 04:45

From the Apache documentation:

Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

java.lang.Object
org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

Where

KEYIN    = offset of the record (input to the Mapper)
VALUEIN  = value of the line in the record (input to the Mapper)
KEYOUT   = Mapper output key (output of the Mapper, input to the Reducer)
VALUEOUT = Mapper output value (output of the Mapper, input to the Reducer)
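The four type parameters above can be illustrated without Hadoop at all; the following is a minimal sketch in plain Java, where SimpleMapper and the chosen types (Long offset, String line, String user id, String category id) are hypothetical stand-ins, not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>,
// just to show how the four type parameters line up.
class SimpleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // Plays the role of Context.write(): collects (KEYOUT, VALUEOUT) pairs.
    final List<String> output = new ArrayList<>();

    void write(KEYOUT key, VALUEOUT value) {
        output.add(key + "\t" + value);
    }
}

public class MapperTypesDemo {
    public static void main(String[] args) {
        // KEYIN = Long (record offset), VALUEIN = String (the input line),
        // KEYOUT = String (user id), VALUEOUT = String (category id).
        SimpleMapper<Long, String, String, String> mapper = new SimpleMapper<>();
        mapper.write("user42", "cat7");   // compiles: both arguments are String
        // mapper.write("user42", 1);     // would NOT compile: VALUEOUT is String
        System.out.println(mapper.output.get(0));
    }
}
```

The key point is that write() only accepts the types fixed by the declaration, which is exactly what the Hadoop compiler error is enforcing.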

Your problem was solved once you corrected the Mapper's value type parameter in the definition from

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {

to

public static class UserMapper extends Mapper<Object, Text, Text, Text> {

Have a look at this related SE question:

Why LongWritable (key) has not been used in Mapper class?

I found this article useful for understanding the concepts clearly as well.

趁早两清
#3 · 2019-07-16 04:50

In your mapper class definition, you are setting the output value class to IntWritable:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable>

However, inside the mapper class, you are declaring catid as Text:

private Text catid = new Text();

Even though you have set the MapOutputValueClass to Text in the driver, you still need to change your mapper class definition so that its type parameters match the key and value output classes set there.

Summer. ? 凉城
#4 · 2019-07-16 04:55

The class definition declares both the input and output types. For instance, your Mapper takes in (Object, Text) and emits (Text, Text). In your driver class you set the expected output of the Mapper to Text for both the key and the value, so the Hadoop framework expects your Mapper class definition to declare those output types, and expects your class to emit Text for both the key and the value when you call context.write(Text, Text).
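The compile error itself can be reproduced without Hadoop; in this sketch, FakeContext is a hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper.Context, showing that once the generic value type is fixed by the declaration, write() accepts only that type:

```java
// Hypothetical stand-in for Mapper.Context<KEYOUT, VALUEOUT>.
class FakeContext<KEYOUT, VALUEOUT> {
    String last;
    void write(KEYOUT key, VALUEOUT value) { last = key + "\t" + value; }
}

public class MismatchDemo {
    public static void main(String[] args) {
        // Like Mapper<..., Text, IntWritable>: the value type here is Integer.
        FakeContext<String, Integer> intContext = new FakeContext<>();
        intContext.write("user42", 1);         // fine: Integer matches VALUEOUT
        // intContext.write("user42", "cat7"); // compile error, analogous to
        //                                     // context.write(userid, catid)

        // Like Mapper<..., Text, Text>: now a String value is accepted.
        FakeContext<String, String> textContext = new FakeContext<>();
        textContext.write("user42", "cat7");
        System.out.println(textContext.last);
    }
}
```

This is why only the class declaration fixes the accepted argument types; setMapOutputValueClass tells the framework what to expect at runtime but cannot change the compile-time signature.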
