In a Java app running on an edge node, I need to delete an HDFS folder, if it exists. I need to do that before running a MapReduce job (with Spark) that outputs to the folder.
I found I could use the method
org.apache.hadoop.fs.FileUtil.fullyDelete(new File(url))
However, I can only make it work with a local folder (i.e., a file URL on the running machine). I tried to use something like:
url = "hdfs://hdfshost:port/the/folder/to/delete";
with hdfs://hdfshost:port being the HDFS NameNode IPC address. I use it for the MapReduce job, so it is correct.
However, it doesn't do anything. So, what URL should I use, or is there another method?
Note: here is the simple project in question.
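For reference, a minimal sketch of what I tried (the paths are just examples): fullyDelete works on a local folder, but with an HDFS URL it silently does nothing, since java.io.File always denotes a path on the local filesystem.

import java.io.File;
import org.apache.hadoop.fs.FileUtil;

// works: deletes a folder on the local filesystem of the edge node
FileUtil.fullyDelete(new File("/local/folder/to/delete"));

// does nothing: java.io.File cannot point into HDFS, so the URL below
// is treated as a (non-existent) local path and nothing is deleted
FileUtil.fullyDelete(new File("hdfs://hdfshost:port/the/folder/to/delete"));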
I do it this way:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
FileSystem hdfs = FileSystem.get(URI.create("hdfs://<namenode-hostname>:<port>"), conf);
// delete(Path, boolean): pass true to delete the folder and its contents recursively
hdfs.delete(new Path("/path/to/your/file"), true);
You don't need hdfs://hdfshost:port/ in your file path.
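As a variant, if the cluster's core-site.xml (with fs.defaultFS pointing at the NameNode) is already on your classpath, you can skip the explicit URI and impl settings and let the Path resolve its own FileSystem. A sketch, assuming that configuration is present:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// getFileSystem resolves the implementation from the path's scheme,
// or from fs.defaultFS when the path has no scheme
Path target = new Path("/the/folder/to/delete");
FileSystem fs = target.getFileSystem(conf);
fs.delete(target, true); // true = recursive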
This works for me.
Just adding the following code to my WordCount program does the job:
import org.apache.hadoop.fs.*;
...
Configuration conf = new Configuration();
Path output = new Path("/the/folder/to/delete");
FileSystem hdfs = FileSystem.get(conf);
// delete existing directory
if (hdfs.exists(output)) {
    hdfs.delete(output, true); // true = delete recursively
}
Job job = Job.getInstance(conf, "word count");
...
You do not need to add hdfs://hdfshost:port explicitly.
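Since the question mentions Spark: the same pre-delete can be done from a Spark driver by reusing the context's Hadoop configuration. A minimal sketch (the class and method names here are mine, for illustration only):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaSparkContext;

public class OutputCleaner {
    // Deletes the given output folder, if present, before the job writes to it.
    public static void deleteIfExists(JavaSparkContext sc, String folder) throws IOException {
        Configuration hadoopConf = sc.hadoopConfiguration();
        Path output = new Path(folder);
        FileSystem fs = output.getFileSystem(hadoopConf);
        if (fs.exists(output)) {
            fs.delete(output, true); // recursive
        }
    }
}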