Extract multiple lines of text between two key wor

2020-02-15 04:07发布

I have a shell command I'd like to extract data from using Powershell. The data I need will always sit between two key words and the number of lines captured can change.

The output can look something like this.

Sites:
System1: 
RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK
System2: 
RPAs: OK
Volumes: 
  WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK

I would like to capture and store into a variable (or text file if easier?) part of this data to be reused later in the script. For example, I would like to capture everything between System1: and System2: which would produce:

RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK

I've been messing with different regex combinations with no success. I've had some moderate success with this code but it doesn't seem to be able to handle the warning lines and I also can't seem to get Out-File to work with it either, only Write-Host which does not help me much.

$RP = plink -l User -pw Password 192.168.1.100 "get_system_status summary=no" #extract from

$script = $RP

$in = $false

$script | %{
if ($_.Contains("System1"))
    { $in = $true }
elseif ($_.Contains("System2"))
    { $in = $false; }
elseif ($in)
    { Write-Host $_ }
}

Ideally I'd like to be able to take this script and use it to parse data from any shell command. I'm currently lost and almost ready to give up on this.

2条回答
forever°为你锁心
2楼-- · 2020-02-15 05:01

One option is to join the text with newlines, then use -split with a multi-line regex:

$text = 
(@'
Sites:
System1: 
RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK
System2: 
RPAs: OK
Volumes: 
  WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK
'@).split("`n") |
foreach {$_.trim()} 

$text -join "`n" -split '(?ms)(?=^System\d+:\s*)' -match '^System\d+:'

System1:
RPAs: OK
Volumes:
WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK

System2:
RPAs: OK
Volumes:
WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK

Edit: a more generic solution to just capturing the output between two specific keywords:

$regex = '(?ms)System1:(.+?)System2:'

$text = $text -join "`n"

$OutputText = 
[regex]::Matches($text,$regex) |
 foreach {$_.groups[1].value -split }
查看更多
Juvenile、少年°
3楼-- · 2020-02-15 05:06

Try this regex:

$result = ($text | Select-String 'System1:\s*\r\n((.*\r\n)*)\s*System2:' -AllMatches)
$result.Matches[0].Groups[1].Value

Where $text is your original input. Note that you might have to adjust your line endings from \r\n to \n depending on your input. You may also have more than one match, I'm not sure from your sample.

The regex starts matching with System1:\s*\r\n which is System1 followed by any number of spaces, followed by a newline. It ends the match with the literal System2:. The inner middle, .*\r\n, matches all characters followed by a newline. The outer middle (.*\r\n)* says to repeatedly match that pattern. Finally that construct is grouped, ((.*\r\n)*) so that all the matching lines can be extracted as the result.

查看更多
登录 后发表回答