Is the following code, for a Mapper reading a text file from HDFS, right? And if it is:
- What happens if two mappers on different nodes try to open the file at almost the same time?
- Isn't there a need to close the InputStreamReader? If so, how do I do it without closing the filesystem?
My code is:
Path pt=new Path("hdfs://pathTofile");
FileSystem fs = FileSystem.get(context.getConfiguration());
BufferedReader br=new BufferedReader(new InputStreamReader(fs.open(pt)));
String line;
line=br.readLine();
while (line != null){
System.out.println(line);
This will work, with some amendments - I assume the code you've pasted is just truncated:
Path pt = new Path("hdfs://pathTofile");
FileSystem fs = FileSystem.get(context.getConfiguration());
BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
try {
    String line = br.readLine();
    while (line != null) {
        System.out.println(line);
        // be sure to read the next line, otherwise you'll get an infinite loop
        line = br.readLine();
    }
} finally {
    // you should close out the BufferedReader
    br.close();
}
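As for your third question: closing the BufferedReader also closes the underlying stream from fs.open(pt), but it does not close the FileSystem itself. You shouldn't call fs.close() here anyway - FileSystem.get() returns a cached instance that is shared across the JVM, so closing it can break other code that is still using it.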
You can have more than one mapper reading the same file, but at some scale it makes more sense to use the Distributed Cache: it not only reduces the load on the data nodes that host the file's blocks, but is also more efficient if your job has more tasks than task nodes.
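For reference, here's a minimal sketch of the Distributed Cache approach, assuming the Hadoop 2.x mapreduce API (the conf variable, the #myfile symlink name, and reading in setup() are illustrative placeholders, not taken from your code):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.mapreduce.Job;

// Driver side: register the file with the Distributed Cache. The framework
// copies it to each task node once, rather than every mapper pulling it
// over HDFS. The "#myfile" fragment creates a local symlink with that name
// in each task's working directory.
Job job = Job.getInstance(conf);
job.addCacheFile(new URI("hdfs://pathTofile#myfile"));

// Mapper side: read the local copy once per task, e.g. in setup().
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    BufferedReader br = new BufferedReader(new FileReader("myfile"));
    try {
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            System.out.println(line);
        }
    } finally {
        br.close();
    }
}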