How to use FINDSTR in PowerShell to find lines whe

Posted 2020-07-18 06:43

Question:

The following findstr.exe command almost does what I want, but not quite:

findstr /s /i /c:"word1 word2 word3" *.abc

I have used:

  • /s for searching all subfolders.
  • /c: Uses the specified text as a literal search string.

  • /i Specifies that the search is not to be case-sensitive.
  • *.abc Files of type abc.

The above looks for word1 word2 word3 as a literal, and therefore only finds the words in that exact order.

By contrast, I want all words to match individually, in any order (AND logic, conjunction).

If I remove /c: from the command above, then lines matching any of the words are returned (OR logic, disjunction), which is not what I want.

Can this be done in PowerShell?

Answer 1:

You can use Select-String to do a regex based search through multiple files.

To match all of multiple search terms in a single string with regular expressions, you'll have to use a lookaround assertion:

Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

In the above example, this is what's happening with the first command:

Get-ChildItem -Filter *.abc -Recurse

Get-ChildItem searches for files in the current directory
-Filter *.abc shows us only files ending in .abc
-Recurse searches all subfolders

We then pipe the resulting FileInfo objects to Select-String and use the following regex pattern:

^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$
^              # start of string  
 (?=           # open positive lookahead assertion containing
    .*         # any number of any characters (like * in wildcard matching)
      \b       # word boundary
        word1  # the literal string "word1"
      \b       # word boundary
 )             # close positive lookahead assertion
 ...           # repeat for remaining words
 .*            # any number of any characters
$              # end of string

Since each lookahead only checks that its word occurs somewhere in the string without consuming any characters, the search position stays at the start of the string for every assertion, so the order of the words doesn't matter.
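You can check the order-independence claim by trying the pattern against a few made-up sample lines with PowerShell's -match operator, which uses the same .NET regex engine as Select-String:

```powershell
# The conjunctive pattern from above: each lookahead checks one word, all anchored at ^
$pattern = '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

'word1 word2 word3'       -match $pattern   # True
'word3 then word1, word2' -match $pattern   # True: the order is irrelevant
'word1 word2'             -match $pattern   # False: word3 is missing
'word1 word2 word30'      -match $pattern   # False: \b rejects word30 as word3
```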


If you want it to match strings that contain any of the words, you can use a simple non-capturing group:

Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '\b(?:word1|word2|word3)\b'
\b(?:word1|word2|word3)\b
\b          # word boundary
  (?:       # open non-capturing group
     word1  # the literal string "word1"
     |      # or
     word2  # the literal string "word2"
     |      # or
     word3  # the literal string "word3"
  )         # close non-capturing group
\b          # word boundary
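The role of the \b anchors is easiest to see with a few made-up inputs. A line matches as soon as one of the words appears as a whole word. The same letters embedded in a longer word do not count:

```powershell
# The disjunctive pattern from above
$pattern = '\b(?:word1|word2|word3)\b'

'only word2 here' -match $pattern   # True: one whole word is enough
'password1'       -match $pattern   # False: no word boundary before "word1"
'word12'          -match $pattern   # False: no word boundary after "word1"
```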

These can of course be abstracted away in a simple proxy function.

I generated the param block and most of the body of the Select-Match function definition below with:

$slsmeta = [System.Management.Automation.CommandMetadata]::new((Get-Command Select-String))
[System.Management.Automation.ProxyCommand]::Create($slsmeta)

I then removed the unnecessary parameters (including -AllMatches and -Pattern) and added the pattern generator (see inline comments):

function Select-Match
{
    [CmdletBinding(DefaultParameterSetName='Any', HelpUri='http://go.microsoft.com/fwlink/?LinkID=113388')]
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [string[]]
        ${Substring},

        [Parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias('PSPath')]
        [string[]]
        ${LiteralPath},

        [Parameter(ParameterSetName='Any')]
        [switch]
        ${Any},

        [Parameter(ParameterSetName='All')]
        [switch]
        ${All},

        [switch]
        ${CaseSensitive},

        [switch]
        ${NotMatch},

        [ValidateNotNullOrEmpty()]
        [ValidateSet('unicode','utf7','utf8','utf32','ascii','bigendianunicode','default','oem')]
        [string]
        ${Encoding},

        [ValidateNotNullOrEmpty()]
        [ValidateCount(1, 2)]
        [ValidateRange(0, 2147483647)]
        [int[]]
        ${Context}
    )

    begin
    {
        try {
            $outBuffer = $null
            if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
            {
                $PSBoundParameters['OutBuffer'] = 1
            }

            # Escape literal input strings
            $EscapedStrings = foreach($term in $PSBoundParameters['Substring']){
                [regex]::Escape($term)
            }

            # Construct pattern based on whether -Any or -All was specified 
            if($PSCmdlet.ParameterSetName -eq 'Any'){
                $Pattern = '\b(?:{0})\b' -f ($EscapedStrings -join '|')
            } else {
                $Clauses = foreach($EscapedString in $EscapedStrings){
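                    '(?=.*\b{0}\b)' -f $EscapedString
                }
                $Pattern = '^{0}.*$' -f ($Clauses -join '')
            }

            # Remove the Substring, Any and All parameter arguments from PSBoundParameters,
            # since Select-String has no parameters with those names
            $PSBoundParameters.Remove('Substring') |Out-Null
            $PSBoundParameters.Remove('Any') |Out-Null
            $PSBoundParameters.Remove('All') |Out-Null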

            # Add the Pattern parameter argument
            $PSBoundParameters['Pattern'] = $Pattern

            $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Microsoft.PowerShell.Utility\Select-String', [System.Management.Automation.CommandTypes]::Cmdlet)
            $scriptCmd = {& $wrappedCmd @PSBoundParameters }
            $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
            $steppablePipeline.Begin($PSCmdlet)
        } catch {
            throw
        }
    }

    process
    {
        try {
            $steppablePipeline.Process($_)
        } catch {
            throw
        }
    }

    end
    {
        try {
            $steppablePipeline.End()
        } catch {
            throw
        }
    }
    <#

    .ForwardHelpTargetName Microsoft.PowerShell.Utility\Select-String
    .ForwardHelpCategory Cmdlet

    #>

}

Now you can use it like this, and it'll behave almost like Select-String:

Get-ChildItem -Filter *.abc -Recurse |Select-Match word1,word2,word3 -All
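You can run the pattern generator on its own to see what Select-Match actually passes to Select-String. The sketch below mirrors the function's escaping and its -Any/-All pattern construction for two sample terms. 'a.b' is a made-up term, chosen because it contains a regex metacharacter that [regex]::Escape has to neutralize:

```powershell
# Terms as a user would pass them to -Substring
$terms = 'word1', 'a.b'

# Escape each term so it is matched literally
$escaped = foreach ($term in $terms) { [regex]::Escape($term) }

# -Any: one alternation between word boundaries
$anyPattern = '\b(?:{0})\b' -f ($escaped -join '|')

# -All: one lookahead per term, all anchored at the start of the line
$clauses = foreach ($e in $escaped) { '(?=.*\b{0}\b)' -f $e }
$allPattern = '^{0}.*$' -f ($clauses -join '')

$anyPattern   # \b(?:word1|a\.b)\b
$allPattern   # ^(?=.*\bword1\b)(?=.*\ba\.b\b).*$

'a.b and word1' -match $allPattern   # True
'axb and word1' -match $allPattern   # False: the escaped dot only matches a literal '.'
```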


Answer 2:

Another (admittedly less sophisticated) approach would be to simply daisy-chain filters, since the order of the words doesn't matter. Filter your files for one word first, then filter the output for lines that also contain the second word, then filter that output for lines that also contain the third word.

findstr /s /i "word1" *.abc | findstr /i "word2" | findstr /i "word3"

Using PowerShell cmdlets the above would look like this:

Get-ChildItem -Filter '*.abc' -Recurse | Get-Content | Where-Object {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}

or (using aliases):

ls '*.abc' -r | cat | ? {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}

Note that aliases are just to save typing on the command line, so I do not recommend using them in scripts.
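Piping through Get-Content has one drawback: the output is bare strings, so the file name and line number are lost. In the following variant (my own sketch, not part of the answer), Select-String does the first filtering step, so each result is a MatchInfo object that still carries Filename and LineNumber. Where-Object then checks the remaining words. The sample files are throwaway demo data created in a temporary folder:

```powershell
# Throwaway demo data: a temp folder with one .abc file at the top and one in a subfolder
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$sub  = Join-Path $root 'sub'
$null = New-Item -ItemType Directory -Path $sub
Set-Content -Path (Join-Path $root 'a.abc') -Value 'word3 word1 word2', 'word1 only'
Set-Content -Path (Join-Path $sub 'b.abc') -Value 'word2 and word1', 'word1 word2 word3'

# Select-String does the first filtering step, so every result is a MatchInfo
# object that still knows its file and line number; Where-Object checks the other words
$hits = Get-ChildItem -Path $root -Filter '*.abc' -Recurse |
    Select-String -SimpleMatch -Pattern 'word1' |
    Where-Object { $_.Line -like '*word2*' -and $_.Line -like '*word3*' }

# Print each hit as file:line-number: text
$hits | ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }

# Clean up the demo folder
Remove-Item -Path $root -Recurse -Force
```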



Answer 3:

Note:

  • The first part of this answer does not solve the OP's problem. For solutions, see Mathias R. Jessen's helpful answer and Ansgar Wiechers' helpful answer, or see the bottom of this answer, which offers a generic solution adapted from Mathias' code.

    • Due to an initial misreading of the question, this part of the answer uses disjunctive logic (matching lines that have at least one matching search term), which is the only logic that findstr.exe and PowerShell's Select-String (directly) support.

    • By contrast, the OP is asking for conjunctive logic, which requires additional work.

  • This part of the answer may still be of interest with respect to translating findstr.exe commands to PowerShell, using Select-String.


The PowerShell equivalent of the findstr command from the question without /c:, i.e. of

FINDSTR /s /i "word1 word2 word3" *.abc

is:

(Get-ChildItem -File -Filter *.abc -Recurse |
  Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3').Count

  • /s -> Get-ChildItem -File -Filter *.abc -Recurse outputs all files in the current directory subtree matching *.abc

    • Note that while Select-String is capable of accepting a filename pattern (wildcard expression) such as *.abc, it doesn't support recursion, so a separate Get-ChildItem call is needed, and its output is piped to Select-String.
  • findstr -> Select-String, PowerShell's more flexible counterpart:

    • -SimpleMatch specifies that the -Pattern argument(s) be interpreted as literals rather than as regexes (regular expressions). Note how the defaults differ:

      • findstr expects literals by default (you can switch to regexes with /R).
      • Select-String expects regexes by default (you can switch to literals with -SimpleMatch).
    • /i -> (default behavior): like most of PowerShell, Select-String is case-insensitive by default; add -CaseSensitive to change that.

    • "word1 word2 word3" -> -Pattern 'word1', 'word2', 'word3'; specifying an array of patterns looks for a match for at least one of the patterns on each line (disjunctive logic).

      • That is, all of the following lines would match: ... word1 ..., ... word2 ..., ... word2 word1 ..., ... word3 word1 word2 ...
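  • (...).Count: Select-String outputs a collection of objects representing the matching lines, which this expression simply counts. The objects output are [Microsoft.PowerShell.Commands.MatchInfo] instances, which include not only the matching line but also metadata about the input and the specifics of what matched. Note that the count has no counterpart in the findstr command: findstr has no counting option (its /c: switch specifies a literal search string; counting matching lines is what find.exe's /c does), so omit (...).Count to get the matching lines themselves, as findstr outputs them.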
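You can see the disjunctive behavior and the case-insensitive default described above without any files by piping strings straight into Select-String. The sample lines are made up:

```powershell
# Made-up sample lines standing in for file content
$lines = 'word1 alone', 'word3 word1 word2', 'none of them', 'WORD2 in caps'

# Several -Pattern values: a line matches if it contains at least one of them (OR)
$any = @($lines | Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3')
$any.Count   # 3
$any.Line    # the three matching lines, in input order

# With -CaseSensitive (findstr's behavior without /i), the all-caps line drops out
$cs = @($lines | Select-String -SimpleMatch -CaseSensitive -Pattern 'word1', 'word2', 'word3')
$cs.Count    # 2
```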


A solution, building on Mathias R. Jessen's elegant wrapper function:

Select-AllStrings is a conjunctive-only companion function to the disjunctive-only Select-String cmdlet that uses the exact same syntax as the latter, with the exception of not supporting the -AllMatches switch.

That is, Select-AllStrings requires that all patterns passed to it - whether they're regexes (by default) or literals (with -SimpleMatch) - match the line.
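The following is a much-simplified, hypothetical stand-in, not the answer's actual definition. It supports only -Pattern, -SimpleMatch and -CaseSensitive plus pipeline input (files or strings). Like Select-Match, it combines the patterns into a single regex of start-anchored lookaheads and hands that regex to Select-String:

```powershell
# Hypothetical, simplified stand-in for Select-AllStrings: supports only
# -Pattern, -SimpleMatch, -CaseSensitive, and pipeline input (files or strings)
function Select-AllStrings {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, Position = 0)]
        [string[]] $Pattern,

        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject,

        [switch] $SimpleMatch,
        [switch] $CaseSensitive
    )
    begin {
        # One lookahead per pattern; literals are escaped when -SimpleMatch is given
        $clauses = foreach ($p in $Pattern) {
            if ($SimpleMatch) { $p = [regex]::Escape($p) }
            '(?=.*(?:{0}))' -f $p
        }
        # Every lookahead starts at ^, so the patterns may occur in any order
        $combined = '^' + ($clauses -join '') + '.*'
    }
    process {
        # Delegate the actual searching (files or strings) to Select-String
        $InputObject | Select-String -Pattern $combined -CaseSensitive:$CaseSensitive
    }
}
```

Because it delegates to Select-String, the output consists of the usual MatchInfo objects. Options such as -Context or -NotMatch are simply not implemented in this sketch.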

Applied to the OP's problem, we get:

(Get-ChildItem -File -Filter *.abc -Recurse |
  Select-AllStrings -SimpleMatch word1, word2, word3).Count

Note the variations compared to the command at the top:
  • The -Pattern parameter is implicitly bound, by argument position.
  • The patterns are specified as barewords (unquoted) for convenience, though it's generally safer to quote them, because it's not easy to remember what needs quoting.



Answer 4:

The following works as long as none of the words is repeated within the same line. Otherwise, a line such as word1 hello word1 bye word1 also matches, even though word2 and word3 are absent:

findstr /i /r /c:"word[1-3].*word[1-3].*word[1-3]" *.abc

If your lines never repeat word1/word2/word3, or you do want such lines in your result, you can use it.
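The caveat is easy to reproduce with PowerShell's -match operator. This assumes that the findstr pattern means the same thing in .NET regex syntax, which holds for the simple character class and .* used here. The lookahead pattern from the first answer is included for contrast:

```powershell
# Answer 4's findstr regex, tried with .NET regex via -match
$ordered   = 'word[1-3].*word[1-3].*word[1-3]'
# The lookahead pattern from answer 1, for contrast
$lookahead = '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

$line = 'word1 hello word1 bye word1'
$line -match $ordered     # True: three occurrences, but all of them are word1
$line -match $lookahead   # False: word2 and word3 are missing
```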