Remove all files older than X days, but keep at least the Y youngest

Posted 2020-03-13 08:33

I have a script that removes DB dumps that are older than say X=21 days from a backup dir:

DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60))  # 3 weeks

find ${DB_DUMP_DIR} -type f -mmin +${RETENTION} -delete

But if, for whatever reason, the DB dump job fails to complete for a while, all dumps will eventually be thrown away. So as a safeguard I want to keep at least the youngest Y=7 dumps, even if all or some of them are older than 21 days.

I'm looking for something more elegant than this spaghetti:

DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60))  # 3 weeks
KEEP=7

find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' |  # list all dumps with epoch
sort -n |                                         # sort by epoch, oldest 1st
head --lines=-${KEEP} |                           # drop the youngest/bottom 7 dumps
while read date filename ; do                     # loop through the rest
    find "$filename" -mmin +${RETENTION} -delete  # delete if older than 21 days
done

(This snippet might have minor bugs; ignore them. It's just to illustrate what I can come up with myself, and why I don't like it.)

Edit: The find option "-mtime" is off by one: "-mtime +21" actually means "at least 22 days old". That always confused me, so I use -mmin instead. It's still off by one, but only by a minute.
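The rounding behaviour behind that off-by-one is easy to demonstrate in a scratch directory (a sketch using a hypothetical mktemp location; GNU find/touch assumed):

```shell
# find rounds a file's age DOWN to whole 24-hour units before comparing,
# so a file that is 21.5 days old has an age of "21 days" and is NOT
# matched by -mtime +21 (which requires strictly more than 21).
dir=$(mktemp -d)
touch -d "@$(( $(date +%s) - (21*86400 + 43200) ))" "$dir/dump.sql"  # ~21.5 days old

find "$dir" -type f -mtime +21   # prints nothing
find "$dir" -type f -mtime +20   # prints .../dump.sql

rm -r "$dir"
```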

7 Answers
Aperson · 2020-03-13 08:59

Use find to get all files that are old enough to delete, filter out the $KEEP youngest with tail, then pass the rest to xargs.

find ${DB_DUMP_DIR} -type f -mmin +$RETENTION -printf '%T@ %p\n' |
  sort -nr | tail -n +$((KEEP + 1)) |
  cut -d' ' -f2- | xargs -r echo

Replace echo with rm if the reported list of files is the list you want to remove.

(I assume none of the dump files have newlines in their names.)
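If file names might contain newlines, a NUL-separated variant of the same pipeline works; this is a sketch assuming GNU find, sort, tail, cut, and xargs, all of which support NUL-separated records:

```shell
find "${DB_DUMP_DIR}" -type f -mmin +"$RETENTION" -printf '%T@ %p\0' |
  sort -zrn |                  # newest first, NUL-separated records
  tail -zn +"$((KEEP + 1))" |  # skip the $KEEP newest
  cut -zd' ' -f2- |            # strip the timestamp field
  xargs -0r echo               # replace echo with rm once verified
```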

萌系小妹纸 · 2020-03-13 09:01

Here is a BASH function that should do the trick. I couldn't avoid two invocations of find easily, but other than that, it was a relative success:

#  A "safe" function for removing backups older than REMOVE_AGE + 1 day(s), always keeping at least the ALWAYS_KEEP youngest
remove_old_backups() {
    local file_prefix="${backup_file_prefix:-$1}"
    local temp=$(( REMOVE_AGE+1 ))  # for inverting the mtime argument: it's quirky ;)
    # We consider backups made on the same day to be one (commonly these are temporary backups in manual intervention scenarios)
    local keeping_n=`/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime -"$temp" -printf '%Td-%Tm-%TY\n' | sort -d | uniq | wc -l`
    local extra_keep=$(( ALWAYS_KEEP - keeping_n )); (( extra_keep < 0 )) && extra_keep=0  # head -n -N needs N >= 0

    /usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime +$REMOVE_AGE -printf '%T@ %p\n' |  sort -n | head -n -$extra_keep | cut -d ' ' -f2 | xargs -r rm
}

It takes a backup_file_prefix env variable, or the prefix can be passed as the first argument, and it expects the environment variables ALWAYS_KEEP (minimum number of files to keep) and REMOVE_AGE (number of days to pass to -mtime). It expects a gz or tgz extension. There are a few other assumptions, as you can see in the comments, mostly in the name of safety.

Thanks to ireardon and his answer (which doesn't quite answer the question) for the inspiration!

Happy safe backup management :)

Ridiculous、 · 2020-03-13 09:05

You can use -mtime instead of -mmin which means you don't have to calculate the number of minutes in a day:

find $DB_DUMP_DIR -type f -mtime +21

Instead of deleting them, you could use stat command to sort the files in order:

find $DB_DUMP_DIR -type f -mtime +21 | while read file
do
    stat -f "%-10m %40N" "$file"
done | sort -rn | awk 'NR > 7 {print $2}'

This will list all files older than 21 days, but not the seven youngest that are older than 21 days.

From there, you could feed this into xargs to do the remove:

find $DB_DUMP_DIR -type f -mtime +21 | while read file
do
    stat -f "%-10m %40N" "$file"
done | sort -rn | awk 'NR > 7 {print $2}' | xargs rm

Of course, this is all assuming that you don't have spaces in your file names. If you do, you'll have to take a slightly different tack.

This will also keep the seven youngest files over 21 days old. You might have files younger than that, and not really want to keep all of those either. However, you could simply run the same sequence again (except remove the -mtime parameter):

find $DB_DUMP_DIR -type f | while read file
do
    stat -f "%-10m %40N" "$file"
done | sort -rn | awk 'NR > 7 {print $2}' | xargs rm

You need to look at your stat command to see what the options are for the format. This varies from system to system. The one I used is for OS X. Linux is different.
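For the record, GNU coreutils stat on Linux uses -c with a different format language than BSD's -f; a Linux rendering of the same idea (sorting newest first so the seven youngest survive) might look like this:

```shell
# %Y = mtime in seconds since the epoch, %n = file name (GNU stat)
find "$DB_DUMP_DIR" -type f -mtime +21 | while read -r file
do
    stat -c '%Y %n' "$file"
done | sort -rn | awk 'NR > 7 {print $2}'
```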


Let's take a slightly different approach. I haven't thoroughly tested this, but:

If all of the files are in the same directory, and none of the file names have whitespace in them:

ls -t | awk 'NR > 7 {print $0}'

Will print out all of the files except for the seven youngest files. Maybe we can go with that?

current_seconds=$(date +%s)   # Seconds since the epoch
((days = 60 * 60 * 24 * 21))  # Number of seconds in 21 days
((oldest_allowed = current_seconds - days)) # Oldest allowed timestamp
ls -t | awk 'NR > 7 {print $0}' | while read file
do
    date=$(stat -f "%Dm" "$file")   # file's mtime, seconds since the epoch
    (( date < oldest_allowed )) && rm "$file"
done

The ls ... | awk will shave off the seven youngest. After that, we use stat to get each file's modification date. Since the date is in seconds since the epoch, we have to calculate what 21 days before the current time is, also in seconds since the epoch.

After that, it's pretty simple. We look at the date of the file. If it's older than the 21-day cutoff (i.e., its timestamp is lower), we can delete it.

As I said, I haven't thoroughly tested this, but it will delete all files over 21 days old, and only files over 21 days old, while always keeping the seven youngest.
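On Linux, the same approach can be sketched with GNU stat and date (echo left in as a dry run; the backup directory is assumed to be the current one):

```shell
oldest_allowed=$(( $(date +%s) - 21*24*60*60 ))  # 21 days ago, epoch seconds

ls -t | awk 'NR > 7' | while read -r file
do
    mtime=$(stat -c %Y "$file")   # GNU stat; the BSD spelling is: stat -f %m
    if (( mtime < oldest_allowed )); then
        echo rm "$file"           # drop the echo once the list looks right
    fi
done
```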

再贱就再见 · 2020-03-13 09:06

I'm opening a second answer because I have a different solution, one using awk: add the 21-day retention period (in seconds) to each file's timestamp, subtract the current time, and remove the negative ones (after sorting and dropping the newest 7 from the list):

DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60*60))  # 3 weeks, in seconds
CURR_TIME=`date +%s`

find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' | \
  awk '{ print int($1) -'${CURR_TIME}' + '${RETENTION}' ":" $2}' | \
  sort -n | head -n -7 | grep '^-' | cut -d ':' -f 2- | xargs -r rm -f
太酷不给撩 · 2020-03-13 09:16

You could do the loop yourself:

t21=$(date -d "21 days ago" +%s)
cd "$DB_DUMP_DIR"
for f in *; do
    if (( $(stat -c %Y "$f") <= $t21 )); then
        echo rm "$f"
    fi
done

I'm assuming you have GNU date.
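Where GNU date isn't available (e.g. stock macOS), the same cutoff can be computed with plain arithmetic, since date +%s is portable (a sketch; the -v-21d spelling is the native BSD alternative):

```shell
# GNU date:
t21=$(date -d "21 days ago" +%s)
# Portable fallback: subtract 21 days' worth of seconds directly
# (ignores DST shifts, which is fine for a retention cutoff):
t21=$(( $(date +%s) - 21*24*60*60 ))
# BSD/macOS date also has a native spelling: t21=$(date -v-21d +%s)
```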

孤傲高冷的网名 · 2020-03-13 09:23

None of these answers quite worked for me, so I adapted chepner's answer and came to this, which simply retains the last $KEEP backups:

find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' | # print entries with modification time
  sort -n |                              # sort in date-ascending order
  head -n -$KEEP |                       # drop the $KEEP most recent entries
  awk '{ print $2 }' |                   # select the file paths
  xargs -r rm                            # remove the file paths

I believe chepner's code retains the $KEEP oldest, rather than the youngest.
