I want to change the output separator to ; instead of a tab. I already tried the suggestion from:
Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?
but my output is still
key (tab) value
I'm using the Cloudera Demo (CDH 4.1.3).
Here is my code:
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: Driver <in> <out>");
System.exit(2);
}
conf.set("mapreduce.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
Job job= new Job(getConf());
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
job.waitForCompletion(true) ? 0 : 1;
I want
key;value
as my output.
The property is called mapreduce.output.textoutputformat.separator. So you are basically missing the "output" there.
You can see that in the newest trunk source code found in the Apache SVN.
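A misspelled property name produces no error, which is why the output silently stays tab-separated. The sketch below illustrates this in plain Java. It is not Hadoop code: SeparatorLookupDemo and writeLine are hypothetical names, java.util.Properties stands in for the job configuration, and it assumes the output format looks up the separator by exact key with a tab as the default. Which key is actually read depends on the Hadoop version.

```java
import java.util.Properties;

// Plain-Java stand-in for a job configuration; not Hadoop's Configuration class.
class SeparatorLookupDemo {
    // The key this answer names as the one that is actually read (assumption:
    // your Hadoop version uses this key).
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Formats one record the way a text output writer might: look up the
    // separator by exact key and fall back to a tab when the key is absent.
    static String writeLine(Properties conf, String key, String value) {
        String separator = conf.getProperty(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }

    public static void main(String[] args) {
        Properties wrong = new Properties();
        // Key from the question: the "output" segment is missing, so it is never read.
        wrong.setProperty("mapreduce.textoutputformat.separator", ";");
        System.out.println(writeLine(wrong, "key", "value"));

        Properties right = new Properties();
        right.setProperty(SEPARATOR_KEY, ";");
        System.out.println(writeLine(right, "key", "value"));
    }
}
```

The first call falls back to the tab default because the lookup never sees the misspelled key; the second uses the semicolon.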
You should use conf.set("mapred.textoutputformat.separator", ";"); instead of conf.set("mapreduce.textoutputformat.separator", ";");
The prefix is mapred, not mapreduce.
Full code (this is working):
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: Driver <in> <out>");
System.exit(2);
}
conf.set("mapred.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
Job job = new Job(conf); // build the job from the conf that holds the separator setting
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
return job.waitForCompletion(true) ? 0 : 1;
In 2017, it's getConf().set(TextOutputFormat.SEPERATOR, ";");
Using the built-in constant gives better maintainability and fewer surprises, I believe.
Important: this property must be set before calling Job.getInstance(getConf()) or new Job(getConf()), because the job copies the configuration and ignores later modifications to it.
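The ordering issue can be illustrated with a small plain-Java simulation. This is a sketch, not Hadoop code: FakeJob is a hypothetical stand-in that copies its settings at construction time, which is the behavior described above, and the "separator" key is a made-up short name.

```java
import java.util.HashMap;
import java.util.Map;

class ConfSnapshotDemo {
    // Hypothetical stand-in for a job: takes a private copy of the settings
    // when it is constructed, so later changes to the caller's map are not seen.
    static final class FakeJob {
        private final Map<String, String> settings;

        FakeJob(Map<String, String> conf) {
            this.settings = new HashMap<>(conf);
        }

        String separator() {
            return settings.getOrDefault("separator", "\t");
        }
    }

    // Setting the value first: the copy made by the job contains it.
    static String separatorSetBefore() {
        Map<String, String> conf = new HashMap<>();
        conf.put("separator", ";");
        return new FakeJob(conf).separator();
    }

    // Setting the value afterwards: the job already holds its own copy.
    static String separatorSetAfter() {
        Map<String, String> conf = new HashMap<>();
        FakeJob job = new FakeJob(conf);
        conf.put("separator", ";");
        return job.separator();
    }

    public static void main(String[] args) {
        System.out.println("set before: [" + separatorSetBefore() + "]");
        System.out.println("set after:  [" + separatorSetAfter() + "]");
    }
}
```

In the simulation, only the value set before construction reaches the job; the later one leaves the tab default in place.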