I need to access the mapper's counters from the reducer. I tried to implement this solution; my WordCount code is below.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;
import java.util.StringTokenizer;
public class WordCount {

    static enum TestCounters { TEST }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
                // Count every token the mapper emits.
                context.getCounter(TestCounters.TEST).increment(1);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private long mapperCounter;

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Look up the running job through a Cluster handle and read the
            // counter that the mappers incremented.
            Configuration conf = context.getConfiguration();
            Cluster cluster = new Cluster(conf);
            Job currentJob = cluster.getJob(context.getJobID());
            mapperCounter = currentJob.getCounters().findCounter(TestCounters.TEST).getValue();
            System.out.println(mapperCounter);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
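For comparison, reading the counter in the driver after the job finishes would be straightforward, but I need the value inside the reducer. A minimal sketch of the driver-side read (my illustration, not part of the original program; it reuses the job object and TestCounters enum from above):

if (job.waitForCompletion(true)) {
    // Counters are aggregated across all map tasks once the job completes.
    long total = job.getCounters().findCounter(TestCounters.TEST).getValue();
    System.out.println("TestCounters.TEST = " + total);
}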
I run this code in IntelliJ with the following dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
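Note that these two artifacts come from different major releases: hadoop-core 1.2.1 belongs to the old MapReduce 1 line, while hadoop-client 2.7.1 is Hadoop 2. A POM pinned to a single release would look like the snippet below (the 2.6.0 version is an assumption, chosen only to match the cluster I use later):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- assumption: one consistent version, matching the 2.6.0 cluster below -->
    <version>2.6.0</version>
</dependency>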
But I encountered a NoSuchFieldError: SEPERATOR that I could not resolve. The error occurs when the line Cluster cluster = new Cluster(conf); runs.
15/10/01 19:55:29 WARN mapred.LocalJobRunner: job_local482979212_0001
java.lang.NoSuchFieldError: SEPERATOR
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapreduce.Cluster.<clinit>(Cluster.java:71)
at WordCount$Reduce.setup(WordCount.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
15/10/01 19:55:30 INFO mapred.JobClient: map 100% reduce 0%
15/10/01 19:55:30 INFO mapred.JobClient: Job complete: job_local482979212_0001
15/10/01 19:55:30 INFO mapred.JobClient: Counters: 20
15/10/01 19:55:30 INFO mapred.JobClient: Map-Reduce Framework
15/10/01 19:55:30 INFO mapred.JobClient: Spilled Records=16
15/10/01 19:55:30 INFO mapred.JobClient: Map output materialized bytes=410
15/10/01 19:55:30 INFO mapred.JobClient: Reduce input records=0
15/10/01 19:55:30 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/10/01 19:55:30 INFO mapred.JobClient: Map input records=8
15/10/01 19:55:30 INFO mapred.JobClient: SPLIT_RAW_BYTES=103
15/10/01 19:55:30 INFO mapred.JobClient: Map output bytes=372
15/10/01 19:55:30 INFO mapred.JobClient: Reduce shuffle bytes=0
15/10/01 19:55:30 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/10/01 19:55:30 INFO mapred.JobClient: Reduce input groups=0
15/10/01 19:55:30 INFO mapred.JobClient: Combine output records=0
15/10/01 19:55:30 INFO mapred.JobClient: Reduce output records=0
15/10/01 19:55:30 INFO mapred.JobClient: Map output records=16
15/10/01 19:55:30 INFO mapred.JobClient: Combine input records=0
15/10/01 19:55:30 INFO mapred.JobClient: CPU time spent (ms)=0
15/10/01 19:55:30 INFO mapred.JobClient: Total committed heap usage (bytes)=160432128
15/10/01 19:55:30 INFO mapred.JobClient: WordCount$TestCounters
15/10/01 19:55:30 INFO mapred.JobClient: TEST=16
15/10/01 19:55:30 INFO mapred.JobClient: File Input Format Counters
15/10/01 19:55:30 INFO mapred.JobClient: Bytes Read=313
15/10/01 19:55:30 INFO mapred.JobClient: FileSystemCounters
15/10/01 19:55:30 INFO mapred.JobClient: FILE_BYTES_WRITTEN=51594
15/10/01 19:55:30 INFO mapred.JobClient: FILE_BYTES_READ=472
After that, I built a jar file and ran it on a single-node Hadoop 2.6.0 cluster. There, I received the following error.
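(I submitted the job with the usual hadoop jar command; the jar name and HDFS paths here are placeholders, not my actual values:)

hadoop jar WordCount.jar WordCount <input-path> <output-path>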
15/10/01 20:58:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/01 20:58:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/01 20:58:17 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/10/01 20:58:31 INFO input.FileInputFormat: Total input paths to process : 1
15/10/01 20:58:33 INFO mapreduce.JobSubmitter: number of splits:1
15/10/01 20:58:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443718874432_0002
15/10/01 20:58:36 INFO impl.YarnClientImpl: Submitted application application_1443718874432_0002
15/10/01 20:58:36 INFO mapreduce.Job: The url to track the job: http://tolga-Aspire-5741G:8088/proxy/application_1443718874432_0002/
15/10/01 20:58:36 INFO mapreduce.Job: Running job: job_1443718874432_0002
15/10/01 20:59:22 INFO mapreduce.Job: Job job_1443718874432_0002 running in uber mode : false
15/10/01 20:59:22 INFO mapreduce.Job: map 0% reduce 0%
15/10/01 21:00:17 INFO mapreduce.Job: map 100% reduce 0%
15/10/01 21:00:20 INFO mapreduce.Job: map 0% reduce 0%
15/10/01 21:00:20 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
15/10/01 21:00:47 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
15/10/01 21:00:55 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
15/10/01 21:01:03 INFO mapreduce.Job: map 100% reduce 100%
15/10/01 21:01:07 INFO mapreduce.Job: Job job_1443718874432_0002 failed with state FAILED due to: Task failed task_1443718874432_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/10/01 21:01:08 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=78203
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=78203
Total vcore-seconds taken by all map tasks=78203
Total megabyte-seconds taken by all map tasks=80079872
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
How can I solve this problem?
Thanks for your interest.
Note: The input files used in IntelliJ and on the single-node Hadoop cluster are different.