I have a script which looks for files matching a regular expression. The code was the following:
find "$dir" | grep "$regex"
The script runs a bit too slowly and I want to optimise it, so I tried this instead:
find "$dir" -regex ".*${regex}.*"
I was expecting slightly faster results as no extra process is created to parse the regular expression.
However, the result was different: to my astonishment, "find | grep" is faster than "find -regex" (although it takes more system time, as one would have expected).
I've timed this behaviour:
find | grep:

real    0m12.467s
user    0m2.568s
sys     0m7.260s

find -regex:

real    0m16.778s
user    0m6.772s
sys     0m6.380s
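For reference, a minimal sketch of how such a comparison can be made fair (assuming bash, with $dir and $regex set as above; each command is run once untimed first, so that both are measured against a warm filesystem cache rather than one being penalised for populating it):

# warm the cache, then time the warm run; output is discarded
find "$dir" | grep "$regex" > /dev/null
time find "$dir" | grep "$regex" > /dev/null

find "$dir" -regex ".*${regex}.*" > /dev/null
time find "$dir" -regex ".*${regex}.*" > /dev/null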
Do you have any idea why the find -regex solution is slower?
Most likely because grep and its regex engine have been highly optimized over many years, since that's its only purpose ("do one thing and do it well"). I don't know what regex engine find uses, but it's evidently not as highly refined as grep's, probably because it's a less-often-used secondary feature.

Also, if you are doing anything with this file list, you should really handle it in a whitespace-safe way. GNU grep can take null-delimited input with -z (--null-data), but that's a GNU extension; if your grep lacks it, you should use find [...] -regex [...] -print0 even though it's slower. A sketch of both variants follows.
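A minimal sketch of both options, assuming bash and GNU find/grep (grep's -z and read -d '' are not POSIX):

# Option 1: keep grep's fast engine but stay null-safe.
# -print0 emits null-delimited paths; GNU grep's -z (--null-data)
# reads and writes null-delimited records, so filenames containing
# spaces or even newlines survive intact.
find "$dir" -print0 | grep -z "$regex" |
while IFS= read -r -d '' file; do
    printf 'found: %s\n' "$file"    # placeholder for real processing
done

# Option 2: let find match the regex itself and emit null-delimited paths.
find "$dir" -regex ".*${regex}.*" -print0 |
while IFS= read -r -d '' file; do
    printf 'found: %s\n' "$file"
done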