Find -regex is slower than find | grep

Posted 2019-07-20 22:21

Question:

I have a script which looks for a file using a regular expression. The code was the following:

find "$dir" | grep "$regex"

The script runs a bit too slowly and I want to optimise it. I tried this instead:

find "$dir" -regex ".*${regex}.*"

I was expecting slightly faster results as no extra process is created to parse the regular expression.

However, to my astonishment, "find | grep" turned out to be faster than "find -regex" (although it takes more system time, as one would expect).

I've timed this behaviour:

Find | grep result

real   0m12.467s
user   0m2.568s
sys    0m7.260s

Find -regex result

real   0m16.778s
user   0m6.772s
sys    0m6.380s

Do you have any idea why the find -regex solution is slower?

Answer 1:

Most likely because grep and its regex engine have been highly optimized over many years, since matching is grep's only purpose ("do one thing and do it well"). I don't know what regex engine find uses, but it's evidently not as highly refined as grep's, probably because -regex is a less-often-used secondary feature.
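Before comparing timings, it is worth confirming that both forms really select the same paths. The following is a minimal sketch, assuming GNU find (for -regextype); the demo tree and the pattern are made up for illustration and stand in for the real $dir and $regex.

```shell
# Hypothetical demo tree; replace with the real $dir and $regex.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/docs"
touch "$dir/src/main.c" "$dir/src/util.c" "$dir/docs/readme.txt"

# The demo pattern has no ^ or $ anchors, since wrapping an anchored
# pattern in ".*" would change its meaning.
regex='src/.*\.c'

# Form 1: find lists every path, grep filters (basic regex, may match
# anywhere in the line).
via_grep=$(find "$dir" | grep -- "$regex" | sort)

# Form 2: find filters by itself. -regex must match the whole path, hence
# the surrounding ".*"; posix-basic is chosen here to stay close to grep's
# default syntax (GNU find assumed).
via_find=$(find "$dir" -regextype posix-basic -regex ".*${regex}.*" | sort)

if [ "$via_grep" = "$via_find" ]; then
    echo "same results"
else
    echo "results differ"
fi
```

Once the two lists agree, each form can be prefixed with time (in bash, the time keyword covers the whole pipeline) to reproduce the comparison on a real tree.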

Also, if you are doing anything with this file list, you should really use a whitespace-safe way of doing it. GNU grep can read null-delimited input with -z (--null-data), so find [...] -print0 | grep -z [...] keeps the faster pipeline. Where that option is not available, use find [...] -regex [...] -print0 even though it's slower.
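A null-delimited version of the pipeline could look like the sketch below. It assumes GNU find, GNU grep and an xargs that supports -0; find_matching is a hypothetical helper name, and the demo directory exists only to show a file name containing a space.

```shell
# find_matching DIR REGEX
# Hypothetical helper: prints matching paths, each terminated by NUL.
find_matching() {
    # -print0 ends every path with a NUL byte; grep -z reads and writes
    # NUL-terminated records instead of newline-terminated lines.
    find "$1" -print0 | grep -z -- "$2"
}

# Demo on a throwaway directory containing an awkward file name.
demo_dir=$(mktemp -d)
touch "$demo_dir/my report.txt" "$demo_dir/notes.md"

# xargs -0 splits on NUL, so the space in the name stays inside one argument.
find_matching "$demo_dir" 'report' | xargs -0 printf 'found: %s\n'
```

Any consumer that understands NUL-separated input can replace the final printf, which is what keeps the list safe for names containing spaces or newlines.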