Grepping dates that do not satisfy year-pattern YY

2019-09-06 21:32发布

I need to anonymize birth dates in metadata files and redact the month and day fields, e.g., I need to convert 1976-05-25 into 1976-01-01. For backup purposes, I first need to test whether a file contains a non-redacted birth date. I ususally use grep for these tests, like this

if grep -E PATTERN $file > /dev/null; then cp $file /backups/; fi 

However, I struggle to find a nice and elegant pattern for this task. I've tried

grep -E '([12][09][0-9][0-9])-(^(01))-(^(01))'

but it does not accept, e.g., 2001-10-11 or any other date.

I could of course also do something along the lines of

([12][09][0-9][0-9]-0[0-9]-0[^1]|[12][09][0-9][0-9]-0[0-9]-1[0-9]|...)

but this is too complicated and error prone.

Of course, I do not want it to accept dates of the form YYYY-01-01 to avoid a double-backup.

What is a simple (read: elegant) way to grep these dates in a single pattern?

标签: regex grep
1条回答
再贱就再见
2楼-- · 2019-09-06 22:11

Well, I would probably just back it up regardless of content but that's because I have more disk space than time to worry about things like this :-)

However, one approach could be to look at it in reverse. Count the lines in the full file then count the lines containing just the pattern with -01-01.

If they're the same then all the dates are of the -01-01 variety and no backup is needed.

Just be aware you need to watch out if there are multiple dates per line but, in that case, you could use other filters to get just the data you're interested in.

As an example, consider the file infile:

2009-01-01 A very good year
2010-02-01 A moderately good year
2011-01-01 A better year
2012-12-31 Not so good
2013-01-01 Back to normal

You can detect dates at the start of the line of the format you want and count them, comparing that to the full file:

if [[ $(wc -l <infile) -ne $(grep -E '^[0-9]{4}-01-01' infile | wc -l) ]]
then
    echo File needs backing up
fi

One other possibility would be to exclude the 01-01 patterns with the -v option:

pax> grep -Ev '[0-9]{4}-01-01' infile
2010-02-01 A moderately good year
2012-12-31 Not so good

This is relatively easy to detect from an if statement:

if [[ ! -z "$(grep -Ev '^[0-9]{4}-01-01' infile)" ]] ; then
    echo File needs backing up
fi
查看更多
登录 后发表回答