HDFS: move multiple files using Java / Scala API

Posted 2019-03-03 16:00

Question:

I need to move multiple files in HDFS whose names match a given regular expression, using a Java / Scala program. For example, I have to move all files named *.xml from folder a to folder b.

From the shell I can use the following command:

bin/hdfs dfs -mv a/*.xml b/

I can move a single file through the Java API with the following code (written in Scala), which uses the rename method of the FileSystem class:

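import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Prepare initial configuration
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root")
val fs = FileSystem.get(conf)
// Move a single file; rename returns true on success
val ok = fs.rename(new Path("a/file.xml"), new Path("b/file.xml"));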

As far as I know, the Path class represents a URI, so I can't use it in the following way:

val ok = fs.rename(new Path("a/*.xml"), new Path("b/"));

Is there a way to move a set of files in HDFS via the Java / Scala API?

Answer 1:

To move the whole folder, you can use fs.rename(new Path("a"), new Path("b")).

But if you only want the *.xml files, there are file filters such as GlobFilter, or you can pass a glob pattern to fs.globStatus and rename each match:

// arg0[0] is the base path
// arg0[1] is a glob pattern (glob syntax, not a regular expression), e.g. NYSE_201[2-3]
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(arg0[0]), conf);
Path path = new Path(arg0[0] + arg0[1]);

// Expand the glob into the statuses of all matching paths
FileStatus[] status = fs.globStatus(path);
Path[] paths = FileUtil.stat2Paths(status);
for (Path p : paths) {
    // p is one matching source path
    // <need to implement logic to rename the paths using fs.rename>
}
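
The loop above leaves the rename step open. One way to fill it in, as a minimal sketch rather than a definitive implementation: expand the glob with globStatus, then rename each match to a path with the same file name under the target folder. The class name GlobMove, the method name moveMatching, and the folders a and b in main are illustrative assumptions, and the sketch assumes fs.defaultFS is already set in the Hadoop configuration found on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of the Hadoop API.
public class GlobMove {

    /**
     * Moves every path matching globPattern into targetDir, keeping file names.
     * Returns the number of paths for which rename reported success.
     */
    public static int moveMatching(FileSystem fs, Path globPattern, Path targetDir)
            throws IOException {
        FileStatus[] matches = fs.globStatus(globPattern);
        if (matches == null) {
            // globStatus returns null when the pattern contains no glob
            // characters and the path does not exist.
            return 0;
        }
        // Make sure the target folder exists before moving anything into it.
        fs.mkdirs(targetDir);

        int moved = 0;
        for (Path source : FileUtil.stat2Paths(matches)) {
            // Same file name, new parent folder.
            Path target = new Path(targetDir, source.getName());
            // rename signals most failures through its boolean result.
            if (fs.rename(source, target)) {
                moved++;
            } else {
                System.err.println("Could not move " + source + " to " + target);
            }
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Assumes fs.defaultFS comes from the configuration on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Rough equivalent of: bin/hdfs dfs -mv a/*.xml b/
        int moved = moveMatching(fs, new Path("a/*.xml"), new Path("b"));
        System.out.println("Moved " + moved + " file(s)");
    }
}
```

Each rename is a separate call, so the move as a whole is not atomic: if one rename fails, the files moved before it stay in the target folder. Counting the successful renames makes a partial move visible to the caller.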