How to pass a variable between two MapReduce jobs

Posted 2019-07-04 01:35

I have chained two MapReduce jobs. Job1 has only one reducer, which computes a single float value. I want to use this value in the reducer of Job2. This is my main method setup:

    public static String GlobalVriable;
    public static void main(String[] args) throws Exception {

        int runs = 0;
        for (; runs < 10; runs++) {
            String inputPath = "part-r-000" + nf.format(runs);
            String outputPath = "part-r-000" + nf.format(runs + 1);
            MyProgram.MR1(inputPath);
            MyProgram.MR2(inputPath, outputPath);
        }
    }

    public static void MR1(String inputPath)
            throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        conf.set("var1","");
        Job job = new Job(conf, "This is job1");
        job.setJarByClass(MyProgram.class);
        job.setMapperClass(MyMapper1.class);
        job.setReducerClass(MyReduce1.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        job.waitForCompletion(true);
        GlobalVriable = conf.get("var1"); // I am getting NULL here
    }

    public static void MR2(String inputPath, String outputPath)
            throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = new Job(conf, "This is job2");
        ...
    }

    public static class MyReduce1 extends
            Reducer<Text, FloatWritable, Text, FloatWritable> {

        public void reduce(Text key, Iterable<FloatWritable> values, Context context)
                throws IOException, InterruptedException {

            float s = 0;
            for (FloatWritable val : values) {
                s += val.get();
            }

            String sum = Float.toString(s);
            context.getConfiguration().set("var1", sum);
        }
    }

As you can see, I need to run the entire program multiple times. Job1 computes a single number from the input. Since it is just one number and there are many iterations, I don't want to write it to HDFS and read it back. Is there a way to share the value computed in MyReduce1 and use it in the reducer of Job2?

UPDATE: I have tried passing the value using conf.set and conf.get, but the value is not passed.

3 Answers
成全新的幸福
#2 · 2019-07-04 02:04

Can't you just change the return type of MR1 to int (or whatever data type is appropriate) and return the number you computed:

    int myNumber = MyProgram.MR1(inputPath);

Then add a parameter to MR2 and call it with your computed number:

    MyProgram.MR2(inputPath, outputPath, myNumber);
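
To reach Job2's reducer, the number then has to go into Job2's configuration before the job is submitted, because each task works from its own copy of that configuration. The same copy behaviour is why a conf.set call made inside a reducer never shows up in the driver. Here is a minimal sketch of that behaviour in plain Java. It is only a model: java.util.Properties stands in for Hadoop's Configuration, and ConfSnapshotDemo, runTask and the "threshold" key are made-up names for illustration.

```java
import java.util.Properties;

public class ConfSnapshotDemo {

    // Stand-in for a task: it works on its own copy of the job settings,
    // modeling how a Hadoop task loads a copy of the submitted configuration.
    static String runTask(Properties submitted) {
        Properties taskCopy = new Properties();
        taskCopy.putAll(submitted);               // snapshot taken at "submission"
        taskCopy.setProperty("var1", "42.5");     // write made inside the task
        return taskCopy.getProperty("threshold"); // read of a driver-set value
    }

    public static void main(String[] args) {
        Properties driverConf = new Properties();
        driverConf.setProperty("threshold", "3.5"); // set before submission

        String seenByTask = runTask(driverConf);

        // The task sees what the driver set before submission...
        System.out.println("task saw threshold = " + seenByTask);
        // ...but the task's own write never reaches the driver's object.
        System.out.println("driver sees var1 = " + driverConf.getProperty("var1"));
    }
}
```

In a real driver, the equivalent would likely be conf.setFloat("var1", myNumber) on the Configuration before constructing Job2, and context.getConfiguration().getFloat("var1", 0f) in the setup method of Job2's reducer.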
做自己的国王
#3 · 2019-07-04 02:12

You could use ZooKeeper for this. It's great for any inter-job coordination or message passing like this.

Emotional °昔
#4 · 2019-07-04 02:13

Here's how to pass back a float value via a counter ...

First, in the first reducer, transform the float value into a long by multiplying it by 1000 (to keep 3 decimal digits of precision, for example) and put the result into a counter:

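// floatValue is assumed to be a reducer field that reduce() has filled in
@Override
protected void cleanup(Context context) {
    // Counters hold long values, so scale the float before storing it
    long result = (long) (floatValue * 1000);
    context.getCounter("Result", "Result").increment(result);
}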

In the driver class, retrieve the long value and transform it back to a float:

public static void MR1(String inputPath)
        throws IOException, InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    Job job = new Job(conf, "This is job1");
    job.setJarByClass(MyProgram.class);
    job.setMapperClass(MyMapper1.class);
    job.setReducerClass(MyReduce1.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(FloatWritable.class);
    FileInputFormat.addInputPath(job, new Path(inputPath));
    job.waitForCompletion(true);

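    // Read the counter back and undo the scaling applied in the reducer
    long result = job.getCounters().findCounter("Result", "Result").getValue();
    float value = ((float) result) / 1000;
    // value is local here; return it or store it so the Job2 setup can use it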

}
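
The scale-and-restore round trip can be tried outside Hadoop. Below is a small sketch in plain Java with hypothetical names (FixedPointCounter, encode, decode, SCALE). It uses Math.round rather than the plain cast in the reducer snippet, since a cast truncates toward zero; that substitution is my assumption, not part of the original answer.

```java
public class FixedPointCounter {

    // Hypothetical scale factor: 1000 keeps three decimal digits.
    static final long SCALE = 1000L;

    // Encode a float as a long so it can be stored in a counter.
    // Math.round rounds to the nearest long instead of truncating.
    static long encode(float value) {
        return Math.round((double) value * SCALE);
    }

    // Decode the counter value back into a float in the driver.
    static float decode(long counterValue) {
        return (float) counterValue / SCALE;
    }

    public static void main(String[] args) {
        long stored = encode(2.5f);
        System.out.println(stored);         // prints 2500
        System.out.println(decode(stored)); // prints 2.5
    }
}
```

Digits beyond the third decimal are lost in the round trip. Counters are also summed across tasks, so the decoded number is only meaningful when a single task increments the counter once, as with the single reducer of Job1.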