Remove all files older than X days, but keep at least the youngest Y files

Posted 2020-03-13 08:33

I have a script that removes DB dumps that are older than, say, X=21 days from a backup dir:

DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60))  # 3 weeks, in minutes

find "${DB_DUMP_DIR}" -type f -mmin +${RETENTION} -delete

But if for whatever reason the DB dump job fails to complete for a while, all dumps will eventually be thrown away. So as a safeguard I want to keep at least the youngest Y=7 dumps, even if all or some of them are older than 21 days.

I'm looking for something more elegant than this spaghetti:

DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60))  # 3 weeks, in minutes
KEEP=7

find "${DB_DUMP_DIR}" -type f -printf '%T@ %p\n' |  # list all dumps with epoch mtime
sort -n |                                           # sort by epoch, oldest first
head --lines=-${KEEP} |                             # drop the youngest/bottom 7 dumps
while read -r epoch filename ; do                   # loop through the rest
    find "$filename" -mmin +${RETENTION} -delete    # delete if older than 21 days
done

(This snippet might have minor bugs; ignore them. It's only meant to illustrate what I can come up with myself, and why I don't like it.)
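For reference, the one non-obvious piece in the snippet is GNU head's negative count, which prints everything except the last N lines (GNU coreutils assumed; BSD head doesn't support this):

```shell
# GNU head with --lines=-N drops the last N lines of its input;
# with input sorted oldest-first, this spares the N newest entries
kept=$(printf '%s\n' oldest older newer newest | head --lines=-2)
echo "$kept"   # prints: oldest, older (the two newest lines are dropped)
```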

Edit: The find test "-mtime" is off by one: "-mtime +21" actually means "at least 22 days old". That always confused me, so I use -mmin instead. Still off by one, but only by a minute.
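The rounding can be seen in a throwaway sandbox (GNU touch/find assumed; the directory and file names are made up). find computes the file's age in days and discards the fractional part, so "-mtime +21" only matches once that truncated age exceeds 21, i.e. at 22 full days:

```shell
# Sandbox showing find's day-rounding: -mtime +21 really means ">= 22 days old"
dir=$(mktemp -d)
now=$(date +%s)
touch -d "@$(( now - 21*86400 - 3600 ))" "$dir/just_over_21d"   # 21 days + 1 hour old
touch -d "@$(( now - 23*86400 ))"        "$dir/well_over_21d"   # 23 days old

by_days=$(find "$dir" -type f -mtime +21 | wc -l)            # fractional day discarded
by_mins=$(find "$dir" -type f -mmin +$((21*24*60)) | wc -l)  # minute granularity

echo "matched by -mtime +21: $by_days"   # only well_over_21d
echo "matched by -mmin:      $by_mins"   # both files
rm -r "$dir"
```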

7 answers
兄弟一词,经得起流年.
Answer #2 · 2020-03-13 09:23

Experimenting with the solutions given in the other answers, I found many bugs or unwanted behaviours.

Here is the solution I finally came up with:

  # Sample variable values
  BACKUP_PATH='/data/backup'
  DUMP_PATTERN='dump_*.tar.gz'
  NB_RETENTION_DAYS=10
  NB_KEEP=2                    # keep at least the 2 most recent files in all cases

  find "${BACKUP_PATH}" -name "${DUMP_PATTERN}" \
    -mtime +${NB_RETENTION_DAYS} > /tmp/obsolete_files

  find "${BACKUP_PATH}" -name "${DUMP_PATTERN}" \
    -printf '%T@ %p\n' | \
    sort -n            | \
    tail -n ${NB_KEEP} | \
    awk '{ print $2 }'   > /tmp/files_to_keep

  grep -v -F -f /tmp/files_to_keep /tmp/obsolete_files > /tmp/files_to_delete

  xargs -r rm < /tmp/files_to_delete
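A quick way to sanity-check the safeguard is to run the same steps against a throwaway directory where every dump is past retention (the file names and ages below are made up; GNU touch/find assumed):

```shell
# Sandbox: all four dumps are older than retention, yet the
# NB_KEEP newest must survive the cleanup
BACKUP_PATH=$(mktemp -d)
DUMP_PATTERN='dump_*.tar.gz'
NB_RETENTION_DAYS=10
NB_KEEP=2

now=$(date +%s)
for age in 20 30 40 50; do   # ages in days; all exceed retention
  touch -d "@$(( now - age*86400 ))" "${BACKUP_PATH}/dump_${age}.tar.gz"
done

find "${BACKUP_PATH}" -name "${DUMP_PATTERN}" \
  -mtime +${NB_RETENTION_DAYS} > /tmp/obsolete_files
find "${BACKUP_PATH}" -name "${DUMP_PATTERN}" -printf '%T@ %p\n' | \
  sort -n | tail -n ${NB_KEEP} | awk '{ print $2 }' > /tmp/files_to_keep
grep -v -F -f /tmp/files_to_keep /tmp/obsolete_files | xargs -r rm

ls "${BACKUP_PATH}"   # only dump_20.tar.gz and dump_30.tar.gz remain
```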

The ideas are :

  • Most of the time, I just want to keep files that are no older than NB_RETENTION_DAYS.
  • However, shit happens, and when for some reason there are no recent files anymore (backup scripts are broken), I don't want to remove the NB_KEEP most recent ones, as a safety net (NB_KEEP should be at least 1).
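The set difference (obsolete files minus files to keep) is done by grep, reading the keep-list as fixed-string patterns; a minimal illustration with made-up paths:

```shell
# grep -v -F -f: print the lines of the second file that match none of the
# fixed-string patterns in the first file (i.e. obsolete minus keep)
printf '%s\n' /data/backup/dump_a.tar.gz /data/backup/dump_b.tar.gz > /tmp/obsolete_files
printf '%s\n' /data/backup/dump_b.tar.gz > /tmp/files_to_keep

to_delete=$(grep -v -F -f /tmp/files_to_keep /tmp/obsolete_files)
echo "$to_delete"   # /data/backup/dump_a.tar.gz
```

One caveat: -F matches substrings, not whole lines, so a doomed path that happens to contain a kept path as a substring would also be spared; with uniform dump names that is harmless here.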

In my case, I have 2 backups a day and set NB_RETENTION_DAYS to 10 (so I normally have 20 files around). One could think I would thus set NB_KEEP=20, but in fact I chose NB_KEEP=2, and here is why:

Let's imagine my backup scripts are broken and I haven't had a backup for a month. I really don't care about keeping my 20 latest files if they are all more than 30 days old; having at least one is what I want. However, being able to easily spot that there is a problem is very important (obviously my monitoring system is really blind, but that's another point), and a backup folder holding 10 times fewer files than usual might ring a bell...
