I am writing Hadoop programs, and I really don't want to play with deprecated classes. Nowhere online can I find programs using the updated
org.apache.hadoop.conf.Configuration
class instead of the
org.apache.hadoop.mapred.JobConf
class.
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Test.class);
    conf.setJobName("TESST");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
}
This is what my main() looks like. Can anyone please provide me with an updated version of this function?
Also, take the classic WordCount as an example:
org.apache.hadoop.mapred.JobConf
is old; in the new version we use Configuration and Job to achieve this. Please use
org.apache.hadoop.mapreduce.lib.*
(it is the new API) instead of
org.apache.hadoop.mapred.TextInputFormat
(it is the old one). The Mapper and Reducer are nothing new; please see the main function, it includes relatively complete configuration, feel free to change it according to your specific requirements. Here is the classic WordCount example. You'll notice a ton of other imports that may not be necessary; reading the code you'll figure out which is which.
What's different? I'm using the Tool interface and the GenericOptionsParser to parse the job command, a.k.a.: hadoop jar ....
In the mapper you'll notice a run method. You can get rid of it; it's usually called by default when you supply the code for the map method. I put it there to show that you can further control the mapping stage. This is all using the new API. I hope you find it useful. Any other questions, let me know!
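The setup described above can be sketched roughly as follows. This is a minimal sketch of a new-API WordCount, not the answer's exact code: class names like TokenizerMapper and IntSumReducer are my own, and Job.getInstance assumes Hadoop 2+ (on older releases you would use new Job(conf, "word count") instead).

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the line
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }

        // Not required: the default run() already does setup/map/cleanup.
        // Overridden here only to show you can control the map stage.
        @Override
        public void run(Context context)
                throws IOException, InterruptedException {
            setup(context);
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the 1s emitted for this word
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already holds options parsed by ToolRunner/GenericOptionsParser
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
    }
}
```

You would run it as hadoop jar wordcount.jar WordCount <input> <output>; because the driver goes through ToolRunner, generic options such as -D key=value are parsed for you before run() is called.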