How to improve the performance of Write-Progress?

2020-06-03 08:29发布

问题:

I'm writing a script that takes an output file from another platform (that sadly doesn't produce CSV output, instead it's around 7 lines per record), grabbing the lines that have the values I'm interested in (using select-string) and then scanning the MatchInfo array, extracting the exact text and building an array as I go, to export to CSV when finished.

My problem is that the original file has around 94000 lines of text, and the matchinfo object still has around 23500 records in it, so it takes a while, especially building the array, so I thought I'd throw in a Write-Progress but the overhead in doing so is quite horrific, it increases the elapsed time x8 vs not having the progress bar.

Here's an example entry from the original file:

CREATE   TRANCODE   MPF               OF TXOLID
AGENDA                   = T4XCLCSHINAG
,ANY_SC_LIST              = NONE    ,EVERY_SC_LIST            = NONE
,SECURITY_CATEGORY        = NONE    ,FUNCTION                 = 14
,TRANCODE_VALUE           = "MPF"
,TRANCODE_FUNCTION_MNEMONIC = NONE
,INSTALLATION_DATA        = NONE
;

Now, for each of these, I only care about the values of AGENDA and TRANCODE_VALUE, so having read the file in using Get-Content, I then use Select-String as the most efficient way I know to filter out the rest of the lines in the file:

rv Start,Filtered,count,CSV
Write-Host "Reading Mainframe Extract File"
$Start = gc K:\TRANCODES.txt
Write-Host ("Read Complete : " + $Start.Count + " records found")

Write-Host "Filtering records for AGENDA/TRANCODE information"
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")

This leaves me with an object of type Microsoft.PowerShell.Commands.MatchInfo with contents like:

AGENDA                   = T4XCLCSHINAG
,TRANCODE_VALUE           = "MPF"
AGENDA                   = T4XCLCSHINAG
,TRANCODE_VALUE           = "MP"

Now that Select-String only took around 9 seconds, so no real need for a progress bar there.

However, the next step, grabbing the actual values (after the =) and putting in an array takes over 30 seconds, so I figured a Write-Progress is helpful to the user and at least shows that something is actually happening, but, the addition of the progress bar seriously extends the elapsed time, see the following output from Measure-Command:

Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                            $CSV += $obj;
                                            $obj = $null}
                           <#$count++     
                           Write-Progress `
                           -Activity "Building table of values from filter results" `
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                           -Id 1 `
                           -PercentComplete ([int]($count/$Filtered.Count *100))#>
                        }}



TotalSeconds      : 32.7902523   

So that's 717.2308630680085 records/sec

Measure-Command{$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                            $CSV += $obj;
                                            $obj = $null}
                           $count++     
                           Write-Progress `
                           -Activity "Building table of values from filter results" `
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                           -Id 1 `
                           -PercentComplete ([int]($count/$Filtered.Count *100))
                        }}


TotalSeconds      : 261.3469632  

Now only a paltry 89.98660799693897 records/sec

Any ideas how to improve the efficiency?

Here's the full script as-is:

rv Start,Filtered,count,CSV  
Write-Host "Reading Mainframe Extract File"  
$Start = gc K:\TRANCODES.txt  
Write-Host ("Read Complete : " + $Start.Count + " records found")  

Write-Host "Filtering records for AGENDA/TRANCODE information"  
$Filtered = $Start|Select-String -Pattern "AGENDA","TRANCODE_VALUE"  
Write-Host ([String]($Filtered.Count/2) + " AGENDA/TRANCODE pairs found")  

Write-Host "Building table from the filter results"  
[int]$count = 0  
$CSV = @()  
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;  
                                    $obj = New-Object System.Object;  
                                    $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}  
            If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');  
                                            $CSV += $obj;  
                                            $obj = $null}  
                           $count++     
                           Write-Progress `  
                           -Activity "Building table of values from filter results" `  
                           -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `  
                           -Id 1 `  
                           -PercentComplete ([int]($count/$Filtered.Count *100))  
                        }  
     Write-Progress `  
     -Activity "Building table of values from filter results" `  
     -Status ("Table built : " + $CSV.Count + " rows created") `  
     -Id 1 `  
     -Completed  

Write-Host ("Table built : " + $CSV.Count + " rows created")  

Write-Host "Sorting and Exporting table to CSV file"  

$CSV|Select TRANCODE,AGENDA|Sort TRANCODE |Export-CSV -notype K:\TRANCODES.CSV  

Here's output from script with the write-progress commented out:

Reading Mainframe Extract File
Read Complete : 94082 records found
Filtering records for AGENDA/TRANCODE information
11759 AGENDA/TRANCODE pairs found
Building table from the filter results
Table built : 11759 rows created
Sorting and Exporting table to CSV file

TotalSeconds      : 75.2279182  

EDIT: I've adopted a modified version of the answer from @RomanKuzmin, so the appropriate code section now looks like:

Write-Host "Building table from the filter results"
[int]$count = 0
$CSV = @()
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$Filtered|foreach {If ($_.ToString() -Match 'AGENDA'){$obj = $null;
                                        $obj = New-Object System.Object;
                                        $obj | Add-Member -type NoteProperty -name AGENDA -Value $_.ToString().SubString(27)}
                If ($_.ToString() -Match 'TRANCODE_VALUE'){$obj | Add-Member -type NoteProperty -name TRANCODE -Value ($_.ToString().SubString(28)).Replace('"','');
                                                $CSV += $obj;
                                                $obj = $null}
                               $count++     
                               If ($sw.Elapsed.TotalMilliseconds -ge 500) {
                               Write-Progress `
                               -Activity "Building table of values from filter results" `
                               -Status ("Processed " + $count + " of " + $Filtered.Count + " records") `
                               -Id 1 `
                               -PercentComplete ([int]($count/$Filtered.Count *100));
                               $sw.Reset();
                               $sw.Start()}
                            }
         Write-Progress `
         -Activity "Building table of values from filter results" `
         -Status ("Table built : " + $CSV.Count + " rows created") `
         -Id 1 `
         -Completed

And running the entire script through Measure-Command gives elapsed time of 75.2279182 seconds with no write-progress and with the modified write-progress using @RomanKuzmin suggestion, 76.525382 seconds - not bad at all!! :-)

回答1:

In such cases when progress is called too often I use this approach

# fast even with Write-Progress
$sw = [System.Diagnostics.Stopwatch]::StartNew()
for($e = 0; $e -lt 1mb; ++$e) {
    if ($sw.Elapsed.TotalMilliseconds -ge 500) {
        Write-Progress -Activity Test -Status "Done $e"
        $sw.Reset(); $sw.Start()
    }
}

# very slow due to Write-Progress
for($e = 0; $e -lt 1mb; ++$e) {
    Write-Progress -Activity Test -Status "Done $e"
}

Here is the suggestion on Connect....



回答2:

I hope this helps someone else. I spent a day on a similar problem: Progress bar was very very slow.

My problem however was rooted in the fact that I had made the screenbuffer for the powershell console extremely wide (9999 instead of the default 120).

This caused Write-Progress to be slowed to the extreme every time it had to update the gui progress bar.



回答3:

For loops, all I do is something simple as follows

if it's a loop in a loop, you can write progress in the parent loop to save time. Here is a snippet of a time saving write-progress single loop (works on parent loops too)

If (($progressCheck % 500) -eq 0){
write-progress -activity "$totalDone out of $finalCount completed" -PercentComplete $(($totalDone / $finalCount) * 100)
}
$progressCheck = $progressCheck + 1

This will run it every 500 iterations of the loop. And only adds the overhead of adding 1 to progress check each loop and checking the if statement which are both extremely minimal.

Just change the iterations to what you think is reasonable. Or you can get a count divided by 100 instead of 500 Lets say it's lines in an array...

If (($progressCheck % (($array).count / 100)) -eq 0){

Lets say it's lines in a file mind you this cats the file each time and wastes time on large files. Better to get the count variable before the loop and use the variable each check

If (($progressCheck % ((gc filename.txt).count / 100)) -eq 0){

both of these are ways to make it only check every 1% of the completion time. You could change 100 to 500, and it would check every 1/500th of the time. The greater the number the slower it is, but if your doing 10 million checks, 1/500th is very minimal. 1 in 20,000 vs every single loop. it will only run it 500 times instead of 10 million times.

In your script above, you could just replace $progressCheck with your $count variable that you're already using.

Please note* that this won't pop up until the first round of checks is done (500) so to fix that you could do an or statement to get it to check the first loop iteration...

If ((($progressCheck % 500) -eq 0) -or ($progressCheck -eq 0)){
write-progress -activity "$totalDone out of $finalCount completed" -PercentComplete $(($totalDone / $finalCount) * 100)
}
$progressCheck = $progressCheck + 1

Make sure your progressCheck variable is set to 0 at the start of the script

to ensure powershell doesn't do some funky rounding on you, you can math floor it to get a whole integer...

If (($([math]::floor($progressCheck % 500)) -eq 0){

or just set -eq to -lt 0.001



回答4:

I wanted to use write-progress to monitor the piping of get-child-item to file. The solution was to start a new job and then monitor the output of the job for change from another process. Powershell makes this quite easy.

 # start the job to write the file index to the cache
 $job = start-job {
     param($path)
     Get-ChildItem -Name -Attributes !D -Recurse $path > $path/.hscache
 } -arg $(pwd) 

 # Wake every 200 ms and print the progress to the screen until the job is finished
 while( $job.State -ne "Completed") {
        Write-Progress -Activity ".hscache-build " -Status $(get-childitem .hscache).length
        sleep -m 200
 }

# clear the progress bar
Write-Progress -Activity ".hscache-build" -Completed