This article shows how to use Invoke-Async in PowerShell: https://sqljana.wordpress.com/2018/03/16/powershell-sql-server-run-in-parallel-collect-sql-results-with-print-output-from-across-your-sql-farm-fast/
I want to run the Copy-Item cmdlet in PowerShell in parallel, because the alternative is to use FileSystemObject via Excel and copy one file at a time, out of a total of millions of files.
I have cobbled together the following:
<#
.SYNOPSIS
<Brief description>
For examples type:
Get-Help .\<filename>.ps1 -examples
.DESCRIPTION
Copies files from one path to another
.PARAMETER FileList
e.g. C:\path\to\list\of\files\to\copy.txt
.PARAMETER NumCopyThreads
default is 8 (but can be 100 if you want to stress the machine to maximum!)
.EXAMPLE
.\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt
.NOTES
#>
[CmdletBinding()]
Param(
[String] $FileList = "C:\temp\copytest.csv",
[int] $NumCopyThreads = 8
)
$filesToCopy = New-Object "System.Collections.Generic.List[fileToCopy]"
$csv = Import-Csv $FileList
foreach($item in $csv)
{
$file = New-Object fileToCopy
$file.SrcFileName = $item.SrcFileName
$file.DestFileName = $item.DestFileName
$filesToCopy.add($file)
}
$sb = [scriptblock] {
param($file)
Copy-item -Path $file.SrcFileName -Destination $file.DestFileName
}
$results = Invoke-Async -Set $filesToCopy -SetParam file -ScriptBlock $sb -Verbose -Measure:$true -ThreadCount 8
$results | Format-Table
Class fileToCopy {
[String]$SrcFileName = ""
[String]$DestFileName = ""
}
The CSV input for which looks like this:
SrcFileName,DestFileName
C:\Temp\dummy-data\101438\101438-0154723869.zip,\\backupserver\Project Archives\101438\0154723869.zip
C:\Temp\dummy-data\101438\101438-0165498273.xlsx,\\backupserver\Project Archives\101438\0165498273.xlsx
What am I missing to get this working? When I run .\CopyFiles.ps1 -FileList C:\Temp\test.csv, nothing happens. The files exist in the source path, but the file objects aren't being pulled from the -Set collection (unless I have misunderstood how the collection is used?).
No, I can't use robocopy to do this because there are millions of files which resolve to different paths depending upon their original location.
I have no explanation for your symptom based on the code in your question (see the bottom section), but I suggest basing your solution on the (now) standard Start-ThreadJob cmdlet (it comes with PowerShell Core; in Windows PowerShell, install it with Install-Module ThreadJob -Scope CurrentUser, for instance).

Such a solution is more efficient than use of the third-party Invoke-Async function, which, as of this writing, is flawed in that it waits for jobs to finish in a tight loop, which creates unnecessary processing overhead.

Start-ThreadJob jobs are a lightweight, thread-based alternative to the process-based Start-Job background jobs, yet they integrate with the standard job-management cmdlets, such as Wait-Job and Receive-Job.

Here's a self-contained example based on your code that demonstrates its use:
Note: Whether you use Start-ThreadJob or Invoke-Async, you won't be able to explicitly reference custom classes such as [fileToCopy] in the script block that runs in separate threads (runspaces; see the bottom section), so the solution below simply uses [pscustomobject] instances with the properties of interest, for simplicity and brevity.
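A sketch of such an example (the 10 dummy file pairs, the 2-second Start-Sleep that stands in for the real Copy-Item call, and the thread count of 8 are illustrative assumptions, chosen to match the timing discussion below):

```powershell
$NumCopyThreads = 8

# Create 10 sample file pairs as [pscustomobject] instances.
$filesToCopy = 1..10 | ForEach-Object {
    [pscustomobject] @{
        SrcFileName  = "file$($_).txt"
        DestFileName = "file$($_)_copy.txt"
    }
}

$sb = {  # the script block to run in each thread job
    param($file)
    # Simulate a copy operation that takes 2 seconds; for real copying, use:
    #   Copy-Item -Path $file.SrcFileName -Destination $file.DestFileName
    Start-Sleep -Seconds 2
    "Copied $($file.SrcFileName) to $($file.DestFileName)."
}

$start = Get-Date

# Start one thread job per file pair, limiting how many run concurrently,
# then wait for all jobs to finish, collect their output, and clean up.
$filesToCopy |
    ForEach-Object {
        Start-ThreadJob -ScriptBlock $sb -ArgumentList $_ -ThrottleLimit $NumCopyThreads
    } |
    Receive-Job -Wait -AutoRemoveJob

"Total runtime: $([int] ((Get-Date) - $start).TotalSeconds) seconds."
```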
Note that the output received from such a run does not reflect the input order, and that the overall runtime is roughly 2 times the per-thread runtime of 2 seconds (plus overhead), because 2 "batches" have to be run: the input count is 10, whereas only 8 threads were made available.

If you upped the thread count to 10 or more (50 is the default), the overall runtime would drop to 2 seconds plus overhead, because all jobs would then run concurrently.
Caveat: The above numbers stem from running in PowerShell Core on Microsoft Windows 10 Pro (64-bit; Version 1903), using version 2.0.1 of the ThreadJob module.

Inexplicably, the same code is much slower in Windows PowerShell, v5.1.18362.145.
However, for performance and memory consumption it is better to use batching (chunking) in your case, i.e., to process multiple file pairs per thread.

The following solution demonstrates this approach; tweak $chunkSize to find a batch size that works for you.
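A sketch of such a chunked variant (again with the 2-second simulated copy; the batching itself is just array slicing over the input list, with $chunkSize set to 3 here as an illustrative value):

```powershell
$NumCopyThreads = 8
$chunkSize = 3   # how many file pairs each thread job processes

# The same 10 sample file pairs as before.
$filesToCopy = 1..10 | ForEach-Object {
    [pscustomobject] @{
        SrcFileName  = "file$($_).txt"
        DestFileName = "file$($_)_copy.txt"
    }
}

$sb = {  # the script block now receives an *array* of file pairs
    param([object[]] $chunk)
    # Simulate the per-chunk work; for real copying, loop over the chunk:
    #   foreach ($file in $chunk) { Copy-Item -Path $file.SrcFileName -Destination $file.DestFileName }
    Start-Sleep -Seconds 2
    "Copied $($chunk.Count) files: $($chunk.SrcFileName -join ', ')"
}

$start = Get-Date

# Slice the input list into chunks of (up to) $chunkSize file pairs and
# start one thread job per chunk.
$jobs = for ($i = 0; $i -lt $filesToCopy.Count; $i += $chunkSize) {
    $end = [Math]::Min($i + $chunkSize, $filesToCopy.Count) - 1
    Start-ThreadJob -ScriptBlock $sb -ArgumentList (, $filesToCopy[$i..$end]) -ThrottleLimit $NumCopyThreads
}

$jobs | Receive-Job -Wait -AutoRemoveJob

"Total runtime: $([int] ((Get-Date) - $start).TotalSeconds) seconds; $($jobs.Count) jobs were created."
```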
While the output is effectively the same, note how only 4 jobs were created this time, each of which processed (up to) $chunkSize (3) file pairs.

As for what you tried:
The screenshot you show suggests that the problem is that your custom class, [fileToCopy], isn't visible to the script block run by Invoke-Async.

Since Invoke-Async invokes the script block via the PowerShell SDK in separate runspaces that know nothing about the caller's state, it is to be expected that these runspaces don't know your class (this equally applies to Start-ThreadJob).

However, it is unclear why that is a problem in your code, because your script block doesn't make an explicit reference to your class: your script-block parameter $file is not type-constrained (it is implicitly [object]-typed).

Therefore, simply accessing the properties of your custom-class instance inside the script block should work, and indeed does in my tests on Windows PowerShell v5.1.18362.145 on Microsoft Windows 10 Pro (64-bit; Version 1903).
However, if your real script-block code were to explicitly reference the custom class [fileToCopy] - such as by defining the parameter as param([fileToCopy] $file) - you would see the symptom.
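As an illustrative sketch of that distinction (the class and paths below are made up for demonstration): an untyped parameter lets the instance's properties be read inside the thread job, whereas a parameter type-constrained to the custom class is expected to fail there, because the job's runspace cannot resolve the class.

```powershell
# The custom class exists only in the caller's session state.
class fileToCopy {
    [string] $SrcFileName
    [string] $DestFileName
}

$item = [fileToCopy] @{
    SrcFileName  = 'C:\Temp\source.txt'
    DestFileName = 'D:\Backup\source.txt'
}

# Works: $file is not type-constrained, so the instance is passed as-is and
# its properties can be read by name in the job's runspace.
Start-ThreadJob -ArgumentList $item -ScriptBlock {
    param($file)
    "Would copy $($file.SrcFileName) to $($file.DestFileName)"
} | Receive-Job -Wait -AutoRemoveJob

# By contrast, a type-constrained parameter such as
#   param([fileToCopy] $file)
# is expected to fail, because the runspace the job runs in does not know
# the [fileToCopy] class defined above.
```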