Can the combiner and reducer be different?

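In many MapReduce programs, I see the reducer being used as the combiner as well. I understand that this is due to the specific nature of those programs, but I would like to know whether the two can be different.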

Answer 1:

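Yes, a combiner can be different from the reducer, although your combiner will still implement the Reducer interface. Combiners can only be used in specific cases, which depend on the job. The combiner operates like a reducer, but only on the subset of key/value pairs output by each mapper.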

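One constraint your combiner has, unlike a reducer, is that its input and output key and value types must match the output types of your mapper.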
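The type constraint can be illustrated without Hadoop. The sketch below is plain Java and uses a hypothetical `Stage<IN, OUT>` interface (an assumption for illustration, not a Hadoop type). The combiner has to be a `Stage<Long, Long>` because its output re-enters the same stream that feeds the reducer, while the reducer is free to emit a different type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerTypes {
    /** Hypothetical stand-in for a reduce-style function: folds many IN values into one OUT. */
    public interface Stage<IN, OUT> {
        OUT apply(List<IN> values);
    }

    /** Combiner: input and output are both the mapper's output value type (Long here). */
    public static final Stage<Long, Long> COMBINER = values -> {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    };

    /** Reducer: consumes the mapper's output type but may emit another type (String here). */
    public static final Stage<Long, String> REDUCER = values -> {
        long sum = 0;
        for (long v : values) sum += v;
        return "total=" + sum;
    };

    /** Feeds each mapper's output to the reducer, optionally pre-aggregating it per mapper. */
    public static String run(List<List<Long>> mapperOutputs, boolean useCombiner) {
        List<Long> shuffled = new ArrayList<>();
        for (List<Long> split : mapperOutputs) {
            if (useCombiner) {
                shuffled.add(COMBINER.apply(split)); // same type in and out, so this fits
            } else {
                shuffled.addAll(split);
            }
        }
        return REDUCER.apply(shuffled);
    }

    public static void main(String[] args) {
        List<List<Long>> data = Arrays.asList(Arrays.asList(1L, 2L), Arrays.asList(3L, 4L));
        System.out.println(run(data, false)); // prints total=10
        System.out.println(run(data, true));  // prints total=10
    }
}
```

Swapping `REDUCER` in as the combiner would not compile here, since a `String` cannot be added to the `List<Long>` that the reducer consumes.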

Answer 2:

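Yes, they can certainly be different, but I don't think you would want to use a different class, because in most cases you will get unexpected results.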

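Combiners can only be used with functions that are commutative (a.b = b.a) and associative (a.(b.c) = (a.b).c). This also means that the combiner may operate on only a subset of your keys and values, or may not execute at all, and you still want the output of the program to remain the same.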
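A small plain-Java check makes the requirement concrete. This is only a sketch with made-up data (the class and method names are hypothetical): summing is commutative and associative, so partial sums can be merged safely, whereas a plain mean is not, so averaging partial means gives a different answer.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafety {
    /** Sum of all values; safe to apply to partial results. */
    public static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    /** Arithmetic mean; NOT safe to apply to partial results of unequal size. */
    public static double mean(List<? extends Number> values) {
        double total = 0;
        for (Number v : values) total += v.doubleValue();
        return total / values.size();
    }

    public static void main(String[] args) {
        // Two hypothetical mapper outputs for the same key.
        List<Long> split1 = Arrays.asList(1L, 2L);
        List<Long> split2 = Arrays.asList(3L, 4L, 10L);
        List<Long> all = Arrays.asList(1L, 2L, 3L, 4L, 10L);

        // Sum: merging partial sums matches a single pass over everything.
        System.out.println(sum(all));                                      // prints 20
        System.out.println(sum(Arrays.asList(sum(split1), sum(split2)))); // prints 20

        // Mean: the mean of partial means differs from the true mean.
        System.out.println(mean(all));                                       // prints 4.0
        System.out.println(mean(Arrays.asList(mean(split1), mean(split2)))); // about 3.58
    }
}
```

This is why a reducer that computes a mean cannot simply be reused as the combiner: the partial results have to carry enough information (for example a count) to be merged correctly.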

Choosing a different class with different logic might not give you a logically correct output.

Answer 3:

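Here is an implementation that you can run both without the combiner and with the combiner; both runs give exactly the same answer. Here the reducer and the combiner have different purposes and different implementations.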

package combiner;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Parses lines of the form "<name> <number>" and emits (name, Average(number, 1)).
public class Map extends Mapper<LongWritable, Text, Text, Average> {

    private final Text name = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] row = line.toString().split(" ");
        System.out.println("Key " + row[0] + " Value " + row[1]); // debug output in the task log
        name.set(row[0]);
        // Each input record counts as a single observation.
        context.write(name, new Average(Long.parseLong(row[1]), 1));
    }
}

Reduce class

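package combiner;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Merges the (average, count) records for a key into one overall average.
public class Reduce extends Reducer<Text, Average, Text, LongWritable> {

    private final LongWritable avg = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<Average> val, Context context)
            throws IOException, InterruptedException {
        long total = 0; // long avoids overflow on large inputs
        long count = 0;
        for (Average value : val) {
            total += value.number * value.count; // weight each partial average by its count
            count += value.count;
        }
        avg.set(total / count); // integer division, computed once after the loop
        context.write(key, avg);
    }
}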

MapObject class

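package combiner;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

// Map output value: a (partial) average together with the number of records it covers.
public class Average implements Writable {

    long number; // the average, or a single raw value when count == 1
    int count;   // how many input records the average covers

    // Hadoop needs a no-argument constructor to deserialize instances.
    public Average() {}

    public Average(long number, int count) {
        this.number = number;
        this.count = count;
    }

    public long getNumber() { return number; }
    public void setNumber(long number) { this.number = number; }
    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }

    // Fields must be read in the same order in which write() emits them.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        number = WritableUtils.readVLong(dataInput);
        count = WritableUtils.readVInt(dataInput);
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        WritableUtils.writeVLong(dataOutput, number);
        WritableUtils.writeVInt(dataOutput, count);
    }
}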

Combine class

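package combiner;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Pre-aggregates one mapper's records; input and output types both match the map output.
public class Combine extends Reducer<Text, Average, Text, Average> {

    @Override
    protected void reduce(Text name, Iterable<Average> val, Context context)
            throws IOException, InterruptedException {
        long total = 0;
        int count = 0;
        for (Average value : val) {
            // Weight by count so the result stays correct if the combiner runs more than once.
            total += value.number * value.count;
            count += value.count;
        }
        // Integer division truncates the partial average.
        context.write(name, new Average(total / count, count));
    }
}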
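The interplay between the Combine and Reduce classes above can be checked without a cluster. The following is a rough plain-Java simulation, not the answer's code: the class `AverageSimulation`, its methods, and the sample data are all hypothetical, and each (average, count) record is modelled as a `long[]` of length two. It suggests that the two runs agree when the per-mapper averages divide evenly, and that integer truncation in the combiner can otherwise shift the final result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AverageSimulation {
    /** Mirrors the combiner: collapses (average, count) records into a single record. */
    public static long[] combine(List<long[]> records) {
        long total = 0;
        long count = 0;
        for (long[] r : records) {
            total += r[0] * r[1]; // weight each partial average by its count
            count += r[1];
        }
        return new long[] {total / count, count}; // integer division truncates
    }

    /** Mirrors the reducer: weighted average over all records of one key. */
    public static long reduce(List<long[]> records) {
        return combine(records)[0];
    }

    /** Simulates the job for one key; each inner list is one mapper's input values. */
    public static long run(List<List<Long>> splits, boolean useCombiner) {
        List<long[]> shuffled = new ArrayList<>();
        for (List<Long> split : splits) {
            List<long[]> mapped = new ArrayList<>();
            for (long v : split) {
                mapped.add(new long[] {v, 1}); // the mapper emits (value, 1)
            }
            if (useCombiner) {
                shuffled.add(combine(mapped));
            } else {
                shuffled.addAll(mapped);
            }
        }
        return reduce(shuffled);
    }

    public static void main(String[] args) {
        // Partial averages divide evenly: both runs agree.
        List<List<Long>> exact =
                Arrays.asList(Arrays.asList(10L, 20L), Arrays.asList(30L, 40L, 50L));
        System.out.println(run(exact, false) + " " + run(exact, true)); // prints 30 30

        // The partial average 3/2 truncates to 1, so the combined run drifts.
        List<List<Long>> lossy = Arrays.asList(Arrays.asList(1L, 2L), Arrays.asList(3L));
        System.out.println(run(lossy, false) + " " + run(lossy, true)); // prints 2 1
    }
}
```

One way to avoid the drift, offered here only as a suggestion, is to carry the running sum rather than the average in each record and to divide once in the reducer.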

Driver class

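package combiner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver1 {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: Driver1 <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "CustomCombiner");
        job.setJarByClass(Driver1.class);
        job.setMapperClass(Map.class);
        // Remove the next line to run the job without a combiner.
        job.setCombinerClass(Combine.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Average.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class); // matches Reduce's output value type
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}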

Get the code from Git here

Leave your suggestions..