Why doesn't this FINDSTR example with multiple

Posted 2019-01-01 09:30

Question:

The following FINDSTR example fails to find a match.

echo ffffaaa|findstr /l "ffffaaa faffaffddd"

Why?

Answer 1:

Apparently this is a long-standing FINDSTR bug. I think it can be a crippling bug, depending on the circumstances.

I have confirmed the command fails on two different Vista machines, a Windows 7 machine, and an XP machine. I also found a post titled "findstr - broken ???" reporting that a similar search fails on Windows Server 2003 but succeeds on Windows 2000.

I've done a number of experiments, and it seems all of the following conditions must be met for a failure to be possible:

  • The search is using multiple literal search strings
  • The search strings are of different lengths
  • A short search string has some amount of overlap with a longer search string
  • The search is case sensitive (no /I option)

In every failure I have seen, it is always one of the shorter search strings that fails.

It does not matter how the search strings are specified. The same faulty result is achieved using multiple /C:"search" options and also with the /G:file option.
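For illustration, the failing search from the question can be written in each of the forms mentioned above. This is only a sketch: strings.txt is a hypothetical file name introduced here, and according to the report above none of the three commands is expected to print a match on affected Windows versions.

```batch
rem Space-separated literal strings, as in the question
echo ffffaaa|findstr /L "ffffaaa faffaffddd"

rem The same search with one /C: option per string
echo ffffaaa|findstr /L /C:"ffffaaa" /C:"faffaffddd"

rem The same search with the strings read from a hypothetical file
> "strings.txt" (
    echo ffffaaa
    echo faffaffddd
)
echo ffffaaa|findstr /L /G:"strings.txt"
```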

The only 3 workarounds I have been able to come up with are:

  • Use the /I option if you don't care about case. Obviously this might not meet your needs.

  • Use the /R regular expression option. If you do, you must escape any meta-characters in the search strings so that they match what a literal search would match. This can be problematic as well.

  • If you are using the /V option, then use multiple piped FINDSTR commands with one search string each instead of one FINDSTR with multiple searches. This also can be a problem if you have a lot of search strings for which you want to use the /G:file option.
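As a rough sketch of the first and third workarounds, using the search strings from the question (input.txt is a hypothetical file name, not part of the original report):

```batch
rem Workaround 1: case-insensitive literal search
echo ffffaaa|findstr /L /I "ffffaaa faffaffddd"

rem Workaround 3: with /V, chain one FINDSTR per search string
rem input.txt is a hypothetical file to be filtered
findstr /V /L /C:"ffffaaa" "input.txt" | findstr /V /L /C:"faffaffddd"
```

The chained form only suits /V, because each stage removes the lines containing its own string; chaining without /V would keep only lines that contain every string.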

I hate this bug!!!!

Note - See "What are the undocumented features and limitations of the Windows FINDSTR command?" for a comprehensive list of FINDSTR idiosyncrasies.



Answer 2:

I cannot tell why findstr may fail with multiple literal strings. However, I can provide a method to work around that annoying bug.

Given that the literal search strings are listed in a text file called search_strings.txt...:

ffffaaa
faffaffddd

..., you can convert it to regular expressions by inserting a backslash in front of every single character:

@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Write one escaped regular expression per input line to the output file
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        rem Split the string into single characters, one per line
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            rem Prefix the character with a backslash, except for the angle brackets
            set "ESCCHR=\%%T"
            if "%%T"=="<" (set "ESCCHR=%%T") else if "%%T"==">" (set "ESCCHR=%%T")
            setlocal EnableDelayedExpansion
            for /F "delims=" %%U in ("REGEX=!REGEX!!ESCCHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal

Then use the converted file regular_expressions.txt...:

\f\f\f\f\a\a\a
\f\a\f\f\a\f\f\d\d\d

...to do a regular expression search, which seems to work fine also with multiple search strings:

echo ffffaaa| findstr /R /G:"regular_expressions.txt"

The preceding backslashes simply escape every character including those that have a particular meaning in regular expression searches.

The characters < and > are excluded from being escaped in order to avoid conflicts with word boundaries, which are expressed by \< and \> when appearing at the beginning and at the end of a search string, respectively.

Since regular expressions are limited to 254 characters for findstr versions past Windows XP (as opposed to literal strings, which are limited to 511 characters), the length of the original search strings is limited to 127 characters, because every character is expressed by two characters after escaping.


Here is an alternative approach that only escapes the meta-characters ., *, ^, $, [, ], \, ":

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "_META=.*^$[]\"^" & rem (including `"`)
> "regular_expressions.txt" (
    for /F usebackq^ delims^=^ eol^= %%S in ("search_strings.txt") do (
        set "REGEX=" & set "STRING=%%S"
        for /F delims^=^ eol^= %%T in ('
            cmd /U /V /C echo(!STRING!^| find /V ""
        ') do (
            set "CHR=%%T"
            setlocal EnableDelayedExpansion
            if not "!_META!"=="!_META:*%%T=!" set "CHR=\!CHR!"
            for /F "delims=" %%U in ("REGEX=!REGEX!!CHR!") do (
                endlocal & set "%%U"
            )
        )
        setlocal EnableDelayedExpansion
        echo(!REGEX!
        endlocal
    )
)
endlocal

The advantage of this method is that the length of the search strings is no longer limited to 127 characters but to 254 characters minus 1 for every occurrence of one of the aforementioned meta-characters. This applies to findstr versions past Windows XP.


Here is another work-around: do a case-insensitive search with findstr first, then post-filter the result with case-sensitive comparisons:

echo ffffaaa|findstr /L /I "ffffaaa faffaffddd"|cmd /V /C set /P STR=""^&if @^^!STR^^!==@^^!STR:ffffaaa=ffffaaa^^! (echo(^^!STR^^!) else if @^^!STR^^!==@^^!STR:faffaffddd=faffaffddd^^! (echo(^^!STR^^!)

The double-escaped exclamation marks ensure the variable STR is expanded in the explicitly invoked cmd instance even in case delayed expansion is enabled in the hosting cmd instance.


By the way, due to what I call a design flaw, literal searches with findstr never work reliably as soon as the search strings contain backslashes, because a backslash may still be consumed to escape a following meta-character even though that is not necessary. For example, the search string \. actually matches .; to truly match \. literally, you must specify \\. as the search string. I do not understand why meta-characters are still recognised in literal searches; that is not what I call literal.
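A small sketch of the backslash behaviour described above; the input strings a.b and a\.b are made up for illustration, and the expected results simply follow that description:

```batch
rem Literal search for \. : per the description above, the backslash is
rem consumed as an escape, so this is expected to match a line that
rem contains a dot but no backslash
echo a.b|findstr /L /C:"\."

rem Doubling the backslash searches for a backslash followed by a dot
echo a\.b|findstr /L /C:"\\."
```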