Match everything between two words in PowerShell

Posted 2019-02-11 07:47

Question:

I have a big text file with SQL queries enclosed in EXEC SQL --- END-EXEC. blocks.

I need everything between the EXEC SQL and END-EXEC. keywords. Sample input is below.

This is what I'm trying. I can get the contents if there is only one EXEC SQL --- END-EXEC. block. However, if there are multiple EXEC SQL --- END-EXEC. blocks, it fails.

a) Read the file as a string.

To read the whole file as a string, I'm using

$content = [io.file]::ReadAllText("$pathtofile")
$content = $content -replace "\s*`n", " " 

(P.S. I'm using PowerShell V2.0, so I cannot use the -Raw option.)

b) Then I do this to match everything between the EXEC keywords.

$result = $content -match "EXEC(.*)EXEC"
$matches[0] > D:\result.txt

Input:

   * ABCD ABCD ABCD BLAH BLAH BLAH - This is some text preceding EXEC SQL

    **EXEC SQL** 
    DECLARE TMPMOTT-CUR CURSOR FOR 
          SELECT KONTYP                     
                ,BFDAT                       
                ,MARK 
                ,MOTT 
                ,AVS     
                ,BEL                      
                ,OKLBE                  
          FROM  ABCBEFGH                      
          ORDER BY MOTT                  

    **END-EXEC**.                               

    * ABCD ABCD ABCD BLAH BLAH BLAH  - This is some text after END-EXEC. 

Answer 1:

$script = Get-Content D:\temp\script.sql

$in = $false

$script | %{
    if ($_.Contains("EXEC SQL"))
        { $in = $true }
    elseif ($_.Contains("END-EXEC"))
        { $in = $false; }
    elseif ($in)
        { Write-Host $_ } # Or Out-File ...
}


Answer 2:

You need to use a lazy quantifier to make sure that your regex matches each EXEC block individually. And you can gather all matches with a single regex operation:

$regex = [regex] '(?is)(?<=\bEXEC SQL\b).*?(?=\bEND-EXEC\b)'
$allmatches = $regex.Matches($content)   # $content is the file text read in the question

Explanation:

(?is)         # Case-insensitive matching, dot matches newline
(?<=          # Assert that this matches before the current position:
 \b           # Start of a word
 EXEC\ SQL    # Literal text "EXEC SQL"
 \b           # End of word
)             # End of lookbehind assertion
.*?           # Match any number of characters, as few as possible
(?=           # until it's possible to match the following at the current position:
 \bEND-EXEC\b # the words "END-EXEC"
)             # End of lookahead assertion
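
As a minimal sketch of how the matches might be consumed, the snippet below runs the same regex over a small made-up sample with two blocks and collects the trimmed SQL text of each. The sample text, the $blocks variable, and the commented-out output path are assumptions for illustration only; it uses nothing beyond what PowerShell V2.0 offers.

```powershell
# Hypothetical sample with two EXEC SQL blocks (not the asker's real file).
$content = @'
* comment before
    EXEC SQL
    SELECT A FROM T1
    END-EXEC.
* comment between
    EXEC SQL
    SELECT B FROM T2
    END-EXEC.
'@

$regex = [regex] '(?is)(?<=\bEXEC SQL\b).*?(?=\bEND-EXEC\b)'
$allmatches = $regex.Matches($content)

# Each Match object holds one block; trim the surrounding whitespace.
$blocks = @($allmatches | ForEach-Object { $_.Value.Trim() })

$blocks.Count   # number of blocks found
$blocks         # the SQL text of each block, one item per block

# To save the result instead (hypothetical path, as in the question):
# $blocks | Out-File D:\result.txt
```

Wrapping the pipeline in @( ) keeps $blocks an array even when only one block is found, so .Count and indexing behave the same in both cases.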


Answer 3:

Or:

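(?<=\*\*EXEC SQL\*\*)[\s\S]*?(?=\*\*END-EXEC\*\*)

This assumes the keywords are literally wrapped in ** as in the sample input. Because [\s\S] matches any character, including newlines, no extra regex option is needed. The quantifier has to be lazy (*?); a greedy * would run from the first EXEC SQL to the last END-EXEC when the file contains several blocks.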
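
The effect of a greedy versus a lazy quantifier on input with more than one block can be checked with a short experiment. This is only a sketch: the one-line sample string and the variable names are made up, and the ** markers are left out to keep the patterns short.

```powershell
# Hypothetical one-line sample containing two blocks.
$text = 'EXEC SQL one END-EXEC. filler EXEC SQL two END-EXEC.'

# Greedy: [\s\S]* grabs as much as possible, so one match spans both blocks.
$greedy = [regex]::Matches($text, '(?<=EXEC SQL)[\s\S]*(?=END-EXEC)')

# Lazy: [\s\S]*? stops at the nearest END-EXEC, giving one match per block.
$lazy = [regex]::Matches($text, '(?<=EXEC SQL)[\s\S]*?(?=END-EXEC)')

$greedy.Count                               # 1
$greedy[0].Value                            # ' one END-EXEC. filler EXEC SQL two '
$lazy.Count                                 # 2
$lazy | ForEach-Object { $_.Value.Trim() }  # 'one' and 'two'
```

The greedy behaviour is the same thing that happens with the "EXEC(.*)EXEC" pattern in the question once the file holds several blocks.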