The log shows that the cached files were created, but when I look at that location the file is not there, and when I try to read it from my mapper I get a FileNotFoundException.
This is the code I am trying to run:
```java
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

JobConf conf2 = new JobConf(getConf(), CorpusCalculator.class);
conf2.setJobName("CorpusCalculator2");
// Load the cluster configuration files
conf2.addResource(new Path("/opt/hadoop1/conf/core-site.xml"));
conf2.addResource(new Path("/opt/hadoop1/conf/hdfs-site.xml"));
//cacheFile(conf2, new Path(outputPathofReducer2));
conf2.setNumReduceTasks(1);
//conf2.setOutputKeyComparatorClass()
conf2.setMapOutputKeyClass(FloatWritable.class);
conf2.setMapOutputValueClass(Text.class);
conf2.setOutputKeyClass(Text.class);
conf2.setOutputValueClass(Text.class);
conf2.setMapperClass(MapClass2.class);
conf2.setReducerClass(Reduce2.class);
FileInputFormat.setInputPaths(conf2, new Path(inputPathForMapper1));
FileOutputFormat.setOutputPath(conf2, new Path(outputPathofReducer3));
// Distributed caching of the file emitted by reducer2 is done here
DistributedCache.addCacheFile(new Path("/sunilFiles/M51.txt").toUri(), conf2);
JobClient.runJob(conf2);
```
Logs:
```
13/04/27 04:43:40 INFO filecache.TrackerDistributedCacheManager: Creating M51.txt in /tmp1/mapred/local/archive/-1731849462204707023_-2090562221_1263420527/localhost/sunilFiles-work-2204204368663038938 with rwxr-xr-x
13/04/27 04:43:40 INFO filecache.TrackerDistributedCacheManager: Cached /sunilFiles/M51.txt as /tmp1/mapred/local/archive/-1731849462204707023_-2090562221_1263420527/localhost/sunilFiles/M51.txt
13/04/27 04:43:40 INFO filecache.TrackerDistributedCacheManager: Cached /sunilFiles/M51.txt as /tmp1/mapred/local/archive/-1731849462204707023_-2090562221_1263420527/localhost/sunilFiles/M51.txt
13/04/27 04:43:40 INFO mapred.JobClient: Running job: job_local_0003
13/04/27 04:43:40 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@8c2df1
13/04/27 04:43:40 INFO mapred.MapTask: numReduceTasks: 1
13/04/27 04:43:40 INFO mapred.MapTask: io.sort.mb = 100
13/04/27 04:43:40 INFO mapred.MapTask: data buffer = 79691776/99614720
13/04/27 04:43:40 INFO mapred.MapTask: record buffer = 262144/327680
inside configure()
:
Exception reading DistribtuedCache: java.io.FileNotFoundException: /tmp1/mapred/local/archive/-1731849462204707023_-2090562221_1263420527/localhost/sunilFiles/M51.txt (Is a directory)
Inside setup(): /tmp1/mapred/local/archive/-1731849462204707023_-2090562221_1263420527/localhost/sunilFiles/M51.txt
13/04/27 04:43:41 WARN mapred.LocalJobRunner: job_local_0003
```
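The "(Is a directory)" at the end of the exception is the key clue: the localized copy of /sunilFiles/M51.txt is a directory, which is what you get when the cached HDFS path is itself a directory (for example, a job output directory holding part-NNNNN files). A FileReader cannot open a directory. Below is a minimal plain-Java sketch of a workaround on the reading side, assuming Java 7+ (java.nio.file) and assuming the directory contains part-* files; CachedPathReader and readAllLines are hypothetical names, not Hadoop API. It reads the path directly if it is a file, otherwise it concatenates the part-* files inside it in name order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: reads lines from a localized cache path that may be
// either a single file or a directory of reducer output (part-* files).
final class CachedPathReader {
    private CachedPathReader() {}

    static List<String> readAllLines(Path localized) throws IOException {
        List<String> lines = new ArrayList<>();
        if (!Files.isDirectory(localized)) {
            // Plain file: read it directly.
            lines.addAll(Files.readAllLines(localized, StandardCharsets.UTF_8));
            return lines;
        }
        // Directory: collect only regular files matching part-*, which skips
        // entries such as _SUCCESS or _logs.
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(localized, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        // Directory listing order is unspecified, so sort by name
        // (part-00000, part-00001, ...).
        Collections.sort(parts);
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
        }
        return lines;
    }
}
```

In configure() you would pass it the local path from DistributedCache.getLocalCacheFiles(job), e.g. via Paths.get(localPath.toString()), assuming that path's string form is a plain local path as the log output suggests.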
Please help me out. I have been searching for a solution for the last 6 hours, and my assignment is due tomorrow. Thank you very much.
You might want to try the -files option, which is simpler. To use it, your driver class needs to extend Configured, implement Tool, and be launched through ToolRunner so that the generic options are parsed.
For example:
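A minimal driver sketch using the old mapred API from the question. CorpusDriver is a hypothetical name, and the mapper/reducer setup is left as a comment because MapClass2 and Reduce2 are not shown. The important part is building the JobConf from getConf(), because that is the Configuration into which ToolRunner has already applied -files.

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver. Because it implements Tool and is started through
// ToolRunner, generic options such as -files are stripped from args and
// applied to getConf() before run() is called.
public class CorpusDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: CorpusDriver [-files <file>] <input> <output>");
            return -1;
        }
        // Build the JobConf from getConf() so the -files setting is carried over.
        JobConf conf = new JobConf(getConf(), CorpusDriver.class);
        conf.setJobName("CorpusCalculator2");
        // Set mapper/reducer and key/value classes here, e.g. MapClass2 and Reduce2.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner creates a Configuration, parses generic options, then calls run().
        System.exit(ToolRunner.run(new CorpusDriver(), args));
    }
}
```

It would then be launched along the lines of hadoop jar <your-jar> CorpusDriver -files <file> <input> <output> (jar name and paths are placeholders). As far as I know, GenericOptionsParser resolves a -files path without a scheme against the local filesystem, so a file that lives in HDFS needs an explicit hdfs:// URI.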
In the mapper or reducer:
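With -files, Hadoop places the file in each task's working directory under its base name (as a symlink, as far as I know; behaviour under the local job runner may differ), so the mapper or reducer can open it with a plain relative path such as "M51.txt". A minimal sketch, assuming the file contains key<TAB>value lines as written by the default TextOutputFormat; SideFileLoader and the lookup field are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for a mapper or reducer that loads a side file
// distributed with -files.
final class SideFileLoader {
    private SideFileLoader() {}

    // Reads "key<TAB>value" lines into a map; a line without a tab is
    // stored as a key with an empty value.
    static Map<String, String> load(File file) throws IOException {
        Map<String, String> entries = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    entries.put(line, "");
                } else {
                    entries.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
        return entries;
    }

    // Example use inside an old-API mapper such as MapClass2:
    //
    //   private Map<String, String> lookup;
    //
    //   public void configure(JobConf job) {
    //       try {
    //           lookup = SideFileLoader.load(new File("M51.txt")); // relative to the task's working dir
    //       } catch (IOException e) {
    //           throw new RuntimeException(e);
    //       }
    //   }
}
```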
I solved this problem by using the FileUtil.copyMerge() method, which merges all the files in an output directory (whose blocks are spread across the machines) into a single file, and I was able to use that merged file successfully. If I use the normal, unmerged output, it fails. Thanks for your replies, guys.
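A sketch of the copyMerge approach. FileUtil.copyMerge(srcFS, srcDir, dstFS, dstFile, deleteSource, conf, addString) exists in Hadoop 1.x and 2.x (it was removed in Hadoop 3). OutputMerger and the merged-file path in the comment are hypothetical; the idea is to merge reducer2's output directory into one file and put that file, rather than the directory, into the DistributedCache.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical helper to be called in the driver between the two jobs.
final class OutputMerger {
    private OutputMerger() {}

    // Concatenates the files directly under outputDir (part-00000, part-00001, ...)
    // into mergedFile on the same filesystem; subdirectories are skipped.
    // Returns false if outputDir is not a directory. copyMerge does not
    // overwrite, so mergedFile must not already exist.
    static boolean mergeOutput(Configuration conf, Path outputDir, Path mergedFile) throws IOException {
        FileSystem fs = outputDir.getFileSystem(conf);
        // deleteSource=false keeps the original output; addString=null inserts nothing between files.
        return FileUtil.copyMerge(fs, outputDir, fs, mergedFile, false, conf, null);
    }

    // Example use in the driver (paths are placeholders):
    //   Path merged = new Path("/sunilFiles/M51-merged.txt");
    //   OutputMerger.mergeOutput(conf2, new Path(outputPathofReducer2), merged);
    //   DistributedCache.addCacheFile(merged.toUri(), conf2);
}
```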