I need to anonymize birth dates in metadata files by redacting the month and day fields, e.g., I need to convert 1976-05-25 into 1976-01-01. For backup purposes, I first need to test whether a file contains a non-redacted birth date. I usually use grep for these tests, like this:
if grep -E PATTERN "$file" > /dev/null; then cp "$file" /backups/; fi
However, I struggle to find a nice and elegant pattern for this task. I've tried
grep -E '([12][09][0-9][0-9])-(^(01))-(^(01))'
but it does not accept, e.g., 2001-10-11 or any other date.
I could of course also do something along the lines of
([12][09][0-9][0-9]-0[0-9]-0[^1]|[12][09][0-9][0-9]-0[0-9]-1[0-9]|...)
but this is too complicated and error-prone.
Of course, I do not want it to accept dates of the form YYYY-01-01, to avoid a double backup.
What is a simple (read: elegant) way to grep these dates in a single pattern?
Well, I would probably just back it up regardless of content but that's because I have more disk space than time to worry about things like this :-)
However, one approach could be to look at it in reverse. Count the lines in the full file, then count the lines that contain the pattern -01-01. If the two counts are the same, then all the dates are of the -01-01 variety and no backup is needed. Just be aware that you need to watch out if there are multiple dates per line but, in that case, you could use other filters to get just the data you're interested in.
As an example, consider the file infile:
You can detect dates at the start of the line of the format you want and count them, comparing that to the full file:
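A minimal sketch of that count comparison. The file contents are made up for illustration, the function name needs_backup is hypothetical, and it assumes each line starts with exactly one date:

```shell
# Succeeds (exit status 0) when at least one line does NOT start with an
# already-redacted date, i.e. when the file still needs a backup.
needs_backup() {
    # Total number of lines in the file.
    total=$(wc -l < "$1")
    # Lines starting with a YYYY-01-01 date. grep -c prints 0 but exits
    # non-zero when nothing matches, hence the "|| true".
    redacted=$(grep -cE '^[0-9]{4}-01-01' "$1" || true)
    [ "$total" -ne "$redacted" ]
}

# Hypothetical sample data: one date at the start of each line.
infile=$(mktemp)
cat > "$infile" <<'EOF'
1976-01-01 alice
1980-05-25 bob
2001-01-01 carol
EOF

if needs_backup "$infile"; then
    echo "backup needed"
else
    echo "all dates already redacted"
fi
rm -f "$infile"
```

With this sample, the second line keeps its month and day, so the two counts differ and the file is reported as needing a backup.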
One other possibility would be to exclude the 01-01 patterns with the -v option:
This is relatively easy to detect from an if statement:
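A sketch of how the -v approach might look inside an if statement. The function names and the backup directory argument are hypothetical, and the first grep relies on -o, which GNU and BSD grep provide but POSIX does not require:

```shell
# Succeeds (exit status 0) if the file contains at least one date whose
# month-day part is not 01-01.
# First grep: -o prints every matched date on its own line, so several
#   dates on one input line are checked individually.
# Second grep: -v keeps only dates that do NOT end in -01-01, -q suppresses
#   output, and -e is needed because the pattern starts with a dash.
has_unredacted_date() {
    grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$1" | grep -qv -e '-01-01$'
}

# Hypothetical usage: copy the file into a backup directory only if needed.
backup_if_needed() {
    if has_unredacted_date "$1"; then
        cp -- "$1" "$2"
    fi
}
```

Calling backup_if_needed "$file" /backups/ would then replace the original one-liner. A file with only YYYY-01-01 dates, or with no dates at all, is skipped, which avoids the double backup.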