I am trying to run two independent mappers on the same input file in a Hadoop program, using a single job, with the output of both mappers going to a single reducer. I am using the MultipleInputs class. It was working fine, running both mappers, but yesterday I noticed that only one map function runs: the second MultipleInputs statement seems to overwrite the first one. I can't find any change to the code that would explain this sudden difference in behavior. Please help me with this. The main function is:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "mapper accepting whole file at once");
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);
    job.setJarByClass(TestMultipleInputs.class);
    job.setMapperClass(Map2.class);
    job.setMapperClass(Map1.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(NLinesInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setMapOutputKeyClass(IntWritable.class);
    MultipleInputs.addInputPath(job, new Path("Rec"), NLinesInputFormat.class, Map1.class);
    MultipleInputs.addInputPath(job, new Path("Rec"), NLinesInputFormat.class, Map2.class);
    FileOutputFormat.setOutputPath(job, new Path("testMulinput"));
    job.waitForCompletion(true);
}
Whichever Map class appears in the last MultipleInputs statement is the one that gets executed; here, only Map2.class runs.
Both mappers can't read the same input file at the same time within one job.
Solution (workaround): create a duplicate of the input file (here, let the duplicate of Rec be Rec1), then feed Map1 with Rec and Map2 with Rec1.
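A minimal sketch of the driver with the duplicated input (untested; class names such as TestMultipleInputs, NLinesInputFormat, Map1, Map2, and Reduce are taken from the question, and Rec1 is assumed to be the duplicated copy of Rec):
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "two mappers, one reducer");
    job.setJarByClass(TestMultipleInputs.class);
    // No setMapperClass or setInputFormatClass here: MultipleInputs
    // registers an input format and a mapper per input path, and each
    // path must be distinct or the later registration wins.
    MultipleInputs.addInputPath(job, new Path("Rec"), NLinesInputFormat.class, Map1.class);
    MultipleInputs.addInputPath(job, new Path("Rec1"), NLinesInputFormat.class, Map2.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(IntWritable.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("testMulinput"));
    job.waitForCompletion(true);
}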
Both mappers are executed in parallel, and you don't need to worry about the reducer side: the outputs of both mappers are shuffled so that equal keys from both files go to the same reducer, so the output is what you want.
Hope this helps others who are facing a similar issue.
You won't be able to read from the same file at the same time with two separate `Mapper`s (at least not without some devilishly hackish trickery, which you should probably avoid). In any case, you can't have two `Mapper` classes set for the same job: the latter call to `setMapperClass(Class)` will always overwrite the former. If you need two `Mapper`s to run simultaneously, you'll need to make two separate jobs and ensure that there are enough map slots available on your cluster to run them both at once (if there aren't any available after the first job starts, the second job will have to wait for it to finish, running sequentially rather than simultaneously). However, since there is no guarantee that the `Mapper`s will run concurrently, make sure the functionality of your MapReduce jobs does not rely on their concurrent execution.
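For reference, a rough sketch of the two-job approach (my own illustration, not tested; `out1` and `out2` are hypothetical output directories, which must differ, and note you give up the single shared reducer, so a follow-up job would be needed if you still want the outputs merged):
Configuration conf = new Configuration();

Job job1 = new Job(conf, "pass with Map1");
job1.setJarByClass(TestMultipleInputs.class);
job1.setMapperClass(Map1.class);
job1.setReducerClass(Reduce.class);
job1.setOutputKeyClass(IntWritable.class);
job1.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job1, new Path("Rec"));
FileOutputFormat.setOutputPath(job1, new Path("out1"));

Job job2 = new Job(conf, "pass with Map2");
job2.setJarByClass(TestMultipleInputs.class);
job2.setMapperClass(Map2.class);
job2.setReducerClass(Reduce.class);
job2.setOutputKeyClass(IntWritable.class);
job2.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job2, new Path("Rec"));
FileOutputFormat.setOutputPath(job2, new Path("out2"));

// submit() returns immediately, unlike waitForCompletion(), so both
// jobs sit in the cluster's queue at the same time and can run
// concurrently if there is capacity.
job1.submit();
job2.submit();
while (!job1.isComplete() || !job2.isComplete()) {
    Thread.sleep(5000);
}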