How To Automate Hadoop Trash Cleanup

Posted 2019-06-26 09:05

Question:

I can clear the trash under my user folder by running hadoop fs -expunge. This gets rid of files that are older than the fs.trash.interval value. Is there a way for expunge to happen automatically to recover disk space?

Also, I see the following output when I run expunge:

[cloudera@localhost conf]$ hadoop fs -expunge
14/07/17 15:43:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1 minutes, Emptier interval = 0 minutes.

The emptier interval is 0, which suggests that automated cleanup is turned off. Where is this value configured?

Answer 1:

The code suggests it is called fs.trash.interval.

EDIT: Sorry, I misunderstood the question.

The emptier implementation itself is here, where we can see the relevant constant seems to be FS_TRASH_CHECKPOINT_INTERVAL_KEY.

Looking here reveals that key to be fs.trash.checkpoint.interval.

EDIT: Finally found the XML configuration entry here.
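A minimal sketch of how the two properties might be set in core-site.xml (their defaults and descriptions live in core-default.xml). The property names are the ones identified above; the values 1440 and 60 are illustrative assumptions, not recommendations. According to the core-default.xml description, a checkpoint interval of 0 means "use the value of fs.trash.interval", so "Emptier interval = 0" in the log line does not by itself mean the emptier is disabled. The emptier runs inside the NameNode, so these values need to be in the NameNode's configuration rather than only on the client.

```xml
<!-- core-site.xml on the NameNode (sketch; values are illustrative assumptions) -->
<configuration>
  <!-- Minutes after which a trash checkpoint is deleted. 0 disables trash entirely. -->
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <!-- Minutes between runs of the trash emptier (checkpointer).
       Should be less than or equal to fs.trash.interval;
       0 means the value of fs.trash.interval is used instead. -->
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>60</value>
  </property>
</configuration>
```

With these example values the emptier would wake up roughly every hour, roll the current trash into a new checkpoint, and delete checkpoints older than one day, so no manual hadoop fs -expunge is needed.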