I have a 2-node cluster (v1.04): a master and a slave. On the master, in Tool.run(),
we add two files to the DistributedCache using addCacheFile(). The files do exist in HDFS.
In Mapper.setup() we want to retrieve those files from the cache using
FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path ).
The problem is that a FileNotFoundException is thrown for one of the files, although that file exists on the slave node:
attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException: File does not exist: /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
ls -l on the slave:
[hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
-rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
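To cross-check the ls output from Java rather than from the shell, the same path can be probed with plain java.io, bypassing Hadoop's FileSystem abstraction entirely. This is only a minimal diagnostic sketch, not our job code: the class name LocalCacheProbe and the method describe are made up for illustration, and the path is passed in as an argument.

```java
import java.io.File;

// Hypothetical helper (not part of Hadoop): checks a path on the local disk only.
public class LocalCacheProbe {

    // Reports whether the given local path exists and is readable by the current user.
    public static String describe(File f) {
        if (!f.exists()) {
            return "missing: " + f.getPath();
        }
        if (!f.canRead()) {
            return "not readable: " + f.getPath();
        }
        return "readable, " + f.length() + " bytes: " + f.getPath();
    }

    public static void main(String[] args) {
        // Expects the localized path from the exception message as the only argument.
        if (args.length != 1) {
            System.err.println("usage: java LocalCacheProbe <local-path>");
            return;
        }
        System.out.println(describe(new File(args[0])));
    }
}
```

If this reports the file as readable when run as the task user on the slave, while the FileSystem.open( path ) call still fails, then presumably the two are not looking at the same filesystem, which would narrow the problem down.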
My questions are:
- Shouldn't all files exist on all nodes?
- What should be done to fix that?
Thanks.