I want to read files from paths regardless of whether they are on HDFS or local. Currently, I pass local paths with the prefix file:// and HDFS paths with the prefix hdfs://, and write code like the following:
Configuration configuration = new Configuration();
FileSystem fileSystem = null;
if (filePath.startsWith("hdfs://")) {
    fileSystem = FileSystem.get(configuration);
} else if (filePath.startsWith("file://")) {
    fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
}
From here I use the FileSystem APIs to read the file.
Can you please let me know if there is a better way than this?
You can get the FileSystem without checking the prefix yourself: you do not need to test whether the path starts with hdfs:// or file://. The API will do the work.

The same FileSystem-based code lists files from an HDFS path, namely a path string that starts with hdfs://. If you provide the Hadoop configuration and a local path, it will also list files from the local file system, namely a path string that starts with file://.
If you really want to work with the java.io.File API, then you can list files only from the local file system, namely a path string that starts with file://.
Does this make sense?
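For the local-only java.io.File route mentioned above, a minimal sketch might look like the following. The class and method names (LocalFileLister, listLocalFiles) are made up for illustration and do not come from the original answer. It assumes the path is a well-formed file URI such as file:///tmp/data (three slashes); a string like file://tmp/data would be parsed with "tmp" as a host and rejected by the File(URI) constructor.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
public class LocalFileLister {

    /**
     * Returns the sorted names of the regular files directly under a local
     * directory given as a file:// URI string. Subdirectories are skipped.
     */
    public static List<String> listLocalFiles(String dirUri) {
        URI uri = URI.create(dirUri);
        // java.io.File only understands the local file system.
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a local file URI: " + dirUri);
        }
        File dir = new File(uri);
        // listFiles() returns null if dir is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        List<String> names = new ArrayList<>();
        if (entries != null) {
            for (File entry : entries) {
                if (entry.isFile()) {
                    names.add(entry.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

This only covers file:// paths; an hdfs:// path is rejected with an IllegalArgumentException, so it is not a replacement for the FileSystem-based approach.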
You don't have to put in that check if you get the FileSystem directly from the Path. From there you can do whatever you need with it.
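A minimal sketch of that idea, assuming the Hadoop client libraries are on the classpath, could look like this. The class and method names (AnyFsReader, readLines) are hypothetical and not from the original answers. Path.getFileSystem(Configuration) picks the FileSystem implementation from the URI scheme, so the same code handles hdfs:// and file:// paths.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, for illustration only.
public class AnyFsReader {

    /** Reads all lines of a text file from whichever file system the path's scheme names. */
    public static List<String> readLines(String filePath, Configuration conf) throws IOException {
        Path path = new Path(filePath);
        // Resolved from the scheme (hdfs://, file://, ...); no startsWith check needed.
        // The returned instance is typically cached by Hadoop, so it is not closed here.
        FileSystem fs = path.getFileSystem(conf);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Two things differ from the code in the question. For file:// paths this returns the checksummed LocalFileSystem rather than the raw one, so call getRawFileSystem() on it if you need the same behavior as before. A path with no scheme at all falls back to the default file system from the configuration (fs.defaultFS).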