How do I find the size of an HDFS file?
Question:
What command should be used to find the size of any file in HDFS?
Answer 1:
You can use the hadoop fs -ls command to list the files in the current directory along with their details. The fifth column of the output contains the file size in bytes.
For example, the command hadoop fs -ls input gives the following output:
Found 1 items
-rw-r--r-- 1 hduser supergroup 45956 2012-07-19 20:57 /user/hduser/input/sou
The size of the file sou is 45956 bytes.
Answer 2:
I also find myself using hadoop fs -dus <path> a great deal. For example, if a directory on HDFS named "/user/frylock/input" contains 100 files and you need the total size for all of those files you could run:
hadoop fs -dus /user/frylock/input
and you would get back the total size (in bytes) of all of the files in the "/user/frylock/input" directory.
Also, keep in mind that HDFS stores data redundantly, so the actual physical storage used up by a file might be 3x or more than what is reported by hadoop fs -ls and hadoop fs -dus.
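As a side note, on newer Hadoop releases hadoop fs -dus is deprecated in favour of hadoop fs -du -s (the form used in a later answer); if your version supports the -h flag you can also get a human-readable figure, for example:
hadoop fs -du -s -h /user/frylock/input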
Answer 3:
I used the function below, which helped me get the file size.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetflStatus
{
    // Returns the total length in bytes of the file (or of all files under a directory) at the given HDFS path.
    public long getflSize(String args) throws IOException
    {
        Configuration config = new Configuration();
        Path path = new Path(args);
        FileSystem hdfs = path.getFileSystem(config);            // resolve the FileSystem for this path
        ContentSummary cSummary = hdfs.getContentSummary(path);  // aggregates length, file and directory counts
        long length = cSummary.getLength();
        return length;
    }
}
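For completeness, this is roughly how the helper above would be called; the path here is only an illustration:
long size = new GetflStatus().getflSize("/user/hduser/input");
System.out.println("Size in bytes: " + size);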
Answer 4:
See the commands below, which use a small awk script to report the size (in GB) of filtered paths in HDFS:
hadoop fs -du -s /data/ClientDataNew/*A* | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'
output ---> 2.089GB
hadoop fs -du -s /data/ClientDataNew/*B* | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'
output ---> 1.724GB
hadoop fs -du -s /data/ClientDataNew/*C* | awk '{s+=$1} END {printf "%.3fGB\n", s/1000000000}'
output ---> 0.986GB
Answer 5:
If you want to do it through the API, you can use the getFileStatus() method.
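A minimal sketch of that approach: FileSystem.getFileStatus() returns a FileStatus object whose getLen() gives the file length in bytes. The class name and the path below are just placeholders for illustration:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileSizeViaStatus
{
    public static void main(String[] args) throws IOException
    {
        Configuration conf = new Configuration();
        Path path = new Path("/user/hduser/input/sample.txt");  // example path, replace with your own
        FileSystem fs = path.getFileSystem(conf);
        FileStatus status = fs.getFileStatus(path);             // describes a single file
        System.out.println(status.getLen() + " bytes");         // getLen() is the length in bytes
    }
}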