可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

In Java code, I want to connect to a directory in HDFS, learn the number of files in that directory, get their names and want to read them. I can already read the files but I couldn't figure out how to count files in a directory and get file names like an ordinary directory.

In order to read I use DFSClient and open files into InputStream.

回答1:

count

Usage: hadoop fs -count [-q] <paths>

Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE FILE_NAME.

The output columns with -q are: QUOTA, REMAINING_QUATA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.

Example:

hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -count -q hdfs://nn1.example.com/file1

Exit Code:

Returns 0 on success and -1 on error.

You can just use the FileSystem and iterate over the files inside the path. Here is some example code

int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false;
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()){
    count++;
    ri.next();
}

回答2:

FileSystem fs = FileSystem.get(conf);
Path pt = new Path("/path");
ContentSummary cs = fs.getContentSummary(pt);
long fileCount = cs.getFileCount();

回答3:

you can also try:

hdfs dfs -ls -R /path/to/your/directory/ | grep -E '^-' | wc -l

回答4:

On command line, you can do it as below.

 hdfs dfs -ls $parentdirectory | awk '{system("hdfs dfs -count " $6) }'

File count in an HDFS directory

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

File count in an HDFS directory

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮