The following findstr.exe
command almost does what I want, but not quite:
findstr /s /i /c:"word1 word2 word3" *.abc
I have used:
/s
for searching all subfolders.
/c:
Uses specified text as a literal search string
/i
Specifies that the search is not to be case-sensitive.
*.abc
Files of type abc.
The above looks for word1 word2 word3
as a literal, and therefore only finds the words in that exact order.
By contrast, I want all words to match individually, in any order (AND logic, conjunction).
If I remove /c:
from the command above, then lines matching any of the words are returned (OR logic, disjunction), which is not what I want.
Can this be done in PowerShell?
You can use Select-String
to do a regex based search through multiple files.
To match all of multiple search terms in a single string with regular expressions, you'll have to use a lookaround assertion:
Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'
In the above example, this is what's happening with the first command:
Get-ChildItem -Filter *.abc -Recurse
Get-ChildItem
searches for files in the current directory
-Filter *.abc
limits the results to files whose names end in .abc
-Recurse
searches all subfolders
We then pipe the resulting FileInfo objects to Select-String
and use the following regex pattern:
^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$
^ # start of string
(?= # open positive lookahead assertion containing
.* # any number of any characters (like * in wildcard matching)
\b # word boundary
word1 # the literal string "word1"
\b # word boundary
) # close positive lookahead assertion
... # repeat for remaining words
.* # any number of any characters
$ # end of string
Since each lookahead group is just being asserted for correctness and the search position within the string never changes, the order doesn't matter.
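To check that word order really does not matter, the lookahead pattern can be tried against a few strings with the -match operator. This is only a minimal sketch; the sample lines and variable names are made up for illustration:

```powershell
# Pattern from above: every lookahead must succeed, each starting at the beginning of the line
$pattern = '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

# Hypothetical sample lines
$samples = @(
    'word1 word2 word3'       # all three words, in order
    'word3 foo word1 word2'   # all three words, shuffled
    'word1 word2'             # word3 is missing
    'word1 word2 word33'      # "word33" is not "word3" because of \b
)

$results = foreach ($line in $samples) {
    [pscustomobject]@{ Line = $line; IsMatch = ($line -match $pattern) }
}
$results
```

Only the first two lines should be reported as matches; the last one fails because the word boundary after word3 is not satisfied.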
If you want it to match strings that contain any of the words, you can use a simple non-capturing group:
Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '\b(?:word1|word2|word3)\b'
\b(?:word1|word2|word3)\b
\b # word boundary
(?: # open non-capturing group
word1 # the literal string "word1"
| # or
word2 # the literal string "word2"
| # or
word3 # the literal string "word3"
) # close non-capturing group
\b # word boundary
These can of course be abstracted away in a simple proxy function.
I generated the param
block and most of the body of the Select-Match
function definition below with:
$slsmeta = [System.Management.Automation.CommandMetadata]::new((Get-Command Select-String))
[System.Management.Automation.ProxyCommand]::Create($slsmeta)
Then removed unnecessary parameters (including -AllMatches
and -Pattern
), then added the pattern generator (see inline comments):
function Select-Match
{
[CmdletBinding(DefaultParameterSetName='Any', HelpUri='http://go.microsoft.com/fwlink/?LinkID=113388')]
param(
[Parameter(Mandatory=$true, Position=0)]
[string[]]
${Substring},
[Parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
[Alias('PSPath')]
[string[]]
${LiteralPath},
[Parameter(ParameterSetName='Any')]
[switch]
${Any},
[Parameter(ParameterSetName='All')]
[switch]
${All},
[switch]
${CaseSensitive},
[switch]
${NotMatch},
[ValidateNotNullOrEmpty()]
[ValidateSet('unicode','utf7','utf8','utf32','ascii','bigendianunicode','default','oem')]
[string]
${Encoding},
[ValidateNotNullOrEmpty()]
[ValidateCount(1, 2)]
[ValidateRange(0, 2147483647)]
[int[]]
${Context}
)
begin
{
try {
$outBuffer = $null
if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
{
$PSBoundParameters['OutBuffer'] = 1
}
# Escape literal input strings
$EscapedStrings = foreach($term in $PSBoundParameters['Substring']){
[regex]::Escape($term)
}
# Construct pattern based on whether -Any or -All was specified
if($PSCmdlet.ParameterSetName -eq 'Any'){
$Pattern = '\b(?:{0})\b' -f ($EscapedStrings -join '|')
} else {
$Clauses = foreach($EscapedString in $EscapedStrings){
'(?=.*\b{0}\b)' -f $EscapedString
}
$Pattern = '^{0}.*$' -f ($Clauses -join '')
}
# Remove the arguments that Select-String does not understand from PSBoundParameters
'Substring', 'Any', 'All' | ForEach-Object { $PSBoundParameters.Remove($_) | Out-Null }
# Add the Pattern parameter argument
$PSBoundParameters['Pattern'] = $Pattern
$wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Microsoft.PowerShell.Utility\Select-String', [System.Management.Automation.CommandTypes]::Cmdlet)
$scriptCmd = {& $wrappedCmd @PSBoundParameters }
$steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
$steppablePipeline.Begin($PSCmdlet)
} catch {
throw
}
}
process
{
try {
$steppablePipeline.Process($_)
} catch {
throw
}
}
end
{
try {
$steppablePipeline.End()
} catch {
throw
}
}
<#
.ForwardHelpTargetName Microsoft.PowerShell.Utility\Select-String
.ForwardHelpCategory Cmdlet
#>
}
Now you can use it like this, and it'll behave almost like Select-String
:
Get-ChildItem -Filter *.abc -Recurse |Select-Match word1,word2,word3 -All
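The pattern generator inside the function can also be tried on its own, which makes it easier to see what ends up being passed to Select-String. A rough sketch with made-up search terms ('a.b' is included only to show the effect of escaping):

```powershell
# Hypothetical search terms; 'a.b' contains a regex metacharacter
$terms = 'word1', 'a.b'

# Escape each term so it is matched literally
$escaped = foreach ($term in $terms) { [regex]::Escape($term) }

# -Any: one alternation wrapped in word boundaries
$anyPattern = '\b(?:{0})\b' -f ($escaped -join '|')

# -All: one lookahead per term, anchored at the start of the line
$clauses = foreach ($e in $escaped) { '(?=.*\b{0}\b)' -f $e }
$allPattern = '^{0}.*$' -f ($clauses -join '')

$anyPattern   # \b(?:word1|a\.b)\b
$allPattern   # ^(?=.*\bword1\b)(?=.*\ba\.b\b).*$
```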
Another (admittedly less sophisticated) approach would be to simply daisy-chain filters, since the order of the words doesn't matter. Filter your files for one word first, then filter the output for lines that also contain the second word, then filter that output for lines that also contain the third word.
findstr /s /i "word1" *.abc | findstr /i "word2" | findstr /i "word3"
Using PowerShell cmdlets the above would look like this:
Get-ChildItem -Filter '*.abc' -Recurse | Get-Content | Where-Object {
$_ -like '*word1*' -and
$_ -like '*word2*' -and
$_ -like '*word3*'
}
or (using aliases):
ls '*.abc' -r | cat | ? {
$_ -like '*word1*' -and
$_ -like '*word2*' -and
$_ -like '*word3*'
}
Note that aliases are just there to save typing on the command line, so I do not recommend using them in scripts.
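One drawback of piping through Get-Content is that the output no longer says which file a line came from. A possible variation, sketched here on the assumption that plain substring matching is good enough, keeps the daisy-chain idea but uses Select-String for the first word and filters the resulting MatchInfo objects by their Line property. The temporary folder and the demo files are made up so that the example can be run as is:

```powershell
# Hypothetical demo files in a throwaway folder
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.abc') -Value 'word3 then word1 then word2', 'only word1 here'
Set-Content -Path (Join-Path $dir 'b.abc') -Value 'word2 word3'

# First filter with Select-String, then narrow down on the matched line
$hits = Get-ChildItem -Path $dir -Filter '*.abc' -Recurse -File |
    Select-String -SimpleMatch -Pattern 'word1' |
    Where-Object { $_.Line -like '*word2*' -and $_.Line -like '*word3*' }

# File name and line number are still available
$hits | Select-Object Filename, LineNumber, Line

# Clean up the demo folder
Remove-Item -Path $dir -Recurse -Force
```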
Note:
The first part of this answer does not solve the OP's problem - for solutions, see Mathias R. Jessen's helpful answer and Ansgar Wiechers' helpful answer; alternatively, see the bottom of this answer, which offers a generic solution adapted from Mathias' code.
(Due to an initial misreading of the question), this part of the answer uses disjunctive logic - matching lines that have at least one matching search term - which is the only logic that findstr.exe
and PowerShell's Select-String
(directly) support.
By contrast, the OP is asking for conjunctive logic, which requires additional work.
This part of the answer may still be of interest with respect to translating findstr.exe
commands to PowerShell, using Select-String
.
The PowerShell equivalent of the findstr
command from the question, but without /c:
-
FINDSTR /s /i "word1 word2 word3" *.abc
- is:
(Get-ChildItem -File -Filter *.abc -Recurse |
Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3').Count
/s
-> Get-ChildItem -File -Filter *.abc -Recurse
outputs all files in the current directory subtree matching *.abc
- Note that while
Select-String
is capable of accepting a filename pattern (wildcard expression) such as *.abc
, it doesn't support recursion, so the separate Get-ChildItem
call is needed, whose output is piped to Select-String
.
findstr
-> Select-String
, PowerShell's more flexible counterpart:
-SimpleMatch
specifies that the -Pattern
argument(s) be interpreted as literals rather than as regexes (regular expressions). Note how the defaults differ:
findstr
expects literals by default (you can switch to regexes with /R
).
Select-String
expects regexes by default (you can switch to literal with -SimpleMatch
).
/i
-> (default behavior); like most of PowerShell, case-insensitivity is Select-String
's default behavior - add -CaseSensitive
to change that.
"word1 word2 word3"
-> -Pattern 'word1', 'word2', 'word3'
; specifying an array of patterns looks for a match for at least one of the patterns on each line (disjunctive logic).
- That is, all of the following lines would match:
... word1 ...
, ... word2 ...
, ... word2 word1 ...
, ... word3 word1 word2 ...
/c
-> (...).Count
: Select-String
outputs a collection of objects representing the matching lines, which this expression simply counts.
The objects output are [Microsoft.PowerShell.Commands.MatchInfo]
instances, which not only include the matching line, but metadata about the input and the specifics of what matched.
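To make the disjunctive behavior and the shape of the output more concrete, here is a small sketch that pipes strings to Select-String instead of files. The sample lines are made up; Line, LineNumber and Pattern are properties of the MatchInfo objects:

```powershell
# Hypothetical input lines, piped as strings rather than read from files
$lines = 'foo word1 bar', 'nothing here', 'word3 word1 word2'

# A line matches if it contains at least one of the patterns
$found = $lines | Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3'

# Inspect some of the metadata carried by each MatchInfo object
$found | ForEach-Object {
    '{0}: {1} (matched pattern: {2})' -f $_.LineNumber, $_.Line, $_.Pattern
}

# Number of matching lines, as in the (...).Count expression above
@($found).Count
```

The line 'nothing here' is the only one that should be filtered out.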
A solution, building on Mathias R. Jessen's elegant wrapper function:
Select-AllStrings
is a conjunctive-only companion function to the disjunctive-only Select-String
cmdlet that uses the exact same syntax as the latter, with the exception of not supporting the -AllMatches
switch.
That is, Select-AllStrings
requires that all patterns passed to it - whether they're regexes (by default) or literals (with -SimpleMatch
) - match the line.
Applied to the OP's problem, we get:
(Get-ChildItem -File -Filter *.abc -Recurse |
Select-AllStrings -SimpleMatch word1, word2, word3).Count
Note the variations compared to the command at the top:
* The -Pattern
parameter is implicitly bound, by argument position.
* The patterns are specified as barewords (unquoted) for convenience, though it's generally safer to quote, because it's not easy to remember what needs quoting.
The following command works as long as none of the words is repeated within the same line, as in:
word1 hello word1 bye word1
findstr /i /r /c:"word[1-3].*word[1-3].*word[1-3]" *.abc
If your files contain no lines with such repeated words, or if you do want those lines in your result as well, then you can use it.
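The caveat can be illustrated with PowerShell's -match operator. This is only an approximation: -match uses the .NET regex engine rather than findstr's, but the constructs involved here (character class, dot and star) should behave the same way in both. The sample strings are made up:

```powershell
$pattern = 'word[1-3].*word[1-3].*word[1-3]'

# All three different words on one line: matches, as intended
$allThree = 'word2 x word3 y word1' -match $pattern            # True

# Only word1, repeated three times: also matches (a false positive)
$repeated = 'word1 hello word1 bye word1' -match $pattern      # True

# Just two of the words: no match
$twoWords = 'word1 and word2' -match $pattern                  # False

$allThree, $repeated, $twoWords
```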