Get the last updated file in HDFS

Posted 2020-02-12 05:08

I want the most recently updated file from one of my HDFS directories. The code should loop through the directory and its subdirectories and return the full path, including the file name, of the latest file. I was able to do this on the local file system, but I am not sure how to do it for HDFS.

find /tmp/sdsa -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

The command above works for the local file system. On HDFS I can get the date, time, and file name, but how do I pick the latest file using these three values?

This is the code I tried:

hadoop fs -ls -R /tmp/apps | awk -F" " '{print $6" "$7" "$8}'

Any help will be appreciated.

Thanks in advance.

Tags: bash shell unix hadoop

2 answers

太酷不给撩

#2 · 2020-02-12 05:20

This one worked for me:

hadoop fs -ls -R /tmp/app | awk -F" " '{print $6" "$7" "$8}' | sort -nr | head -1 | cut -d" " -f3

The output is the entire file path.

贼婆χ

#3 · 2020-02-12 05:34

Here is the command:

hadoop fs -ls -R /user| awk -F" " '{print $6" "$7" "$8}'|sort -nr|head|cut -d" " -f3-

Your script itself is good enough. Hadoop prints modification times in YYYY-MM-DD HH:MM format, so you can simply sort them lexicographically.
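A slightly more defensive sketch of the same idea, in case it helps. It assumes the usual eight-column `hadoop fs -ls -R` layout (permissions, replication, owner, group, size, date, time, path); the function name `latest_hdfs_file` is made up for illustration. It skips directory entries, which the listing also includes, and relies on the timestamp sorting correctly as plain text:

```shell
# latest_hdfs_file (hypothetical helper name): reads `hadoop fs -ls -R`
# output on stdin and prints the path of the most recently modified file.
# Assumes the date is column 6, the time column 7, and the path column 8.
latest_hdfs_file() {
  awk '$1 ~ /^-/ {                      # regular files only; directories start with "d"
         path = $8
         for (i = 9; i <= NF; i++)      # rejoin paths containing single spaces
           path = path " " $i
         print $6 " " $7 "\t" path      # "YYYY-MM-DD HH:MM<TAB>path"
       }' |
    LC_ALL=C sort -r |                  # text order equals time order for this format
    head -n 1 |
    cut -f2-                            # drop the timestamp, keep the path
}

# Usage (assumes the hadoop CLI is on PATH; /tmp/apps is the directory from the question):
#   hadoop fs -ls -R /tmp/apps | latest_hdfs_file
```

Since the listing only shows minutes, two files modified within the same minute cannot be told apart this way; the tie is broken by path name, not by actual modification time.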
