I need to split a large file upload into many parallel processes and want to use a single CSV file as input.
Is it possible to access blocks of rows from an Import-Csv object, something like this:
$SODAData = Import-Csv $CSVPath -Delimiter "|" |
Where $_.Rownum == 20,000..29,999 |
Foreach-Object { ... }
What is the syntax for such an extraction?
I'm using PowerShell 5.
Import-Csv imports the file as an array of objects, so you could do something like this (using the range operator; note that array indices are zero-based):
$csv = Import-Csv $CSVPath -Delimiter '|'
$SODAData = $csv[20000..29999] | ForEach-Object { ... }
An alternative would be to use Select-Object:
$offset = 20000
$count = 10000
$csv = Import-Csv $CSVPath -Delimiter '|'
$SODAData = $csv |
Select-Object -Skip $offset -First $count |
ForEach-Object { ... }
If you want to avoid reading the entire file into memory, you can change the above to a single pipeline:
$offset = 20000
$count = 10000
$SODAData = Import-Csv $CSVPath -Delimiter '|' |
Select-Object -Skip $offset -First $count |
ForEach-Object { ... }
Beware, though, that with this approach the file has to be read again from the start for every chunk you process.
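If re-reading the file for each chunk is too costly, one possible workaround is to collect rows into fixed-size batches while streaming the file once. The following is only a rough sketch: $chunkSize, $chunk and $rows are names made up for illustration, and the processing step is left as a placeholder for your own code.

```powershell
# Sketch only: $chunkSize, $chunk and $rows are illustrative names,
# and the processing step is a placeholder.
$chunkSize = 10000
$chunk = New-Object System.Collections.Generic.List[object]

Import-Csv $CSVPath -Delimiter '|' | ForEach-Object {
    $chunk.Add($_)
    if ($chunk.Count -eq $chunkSize) {
        # Copy the full batch so the list can be reused for the next one
        $rows = $chunk.ToArray()
        # ... process $rows here ...
        $chunk.Clear()
    }
}

# Handle the final batch, which may hold fewer than $chunkSize rows
if ($chunk.Count -gt 0) {
    $rows = $chunk.ToArray()
    # ... process $rows here ...
}
```

This keeps at most one batch in memory at a time and reads the file only once, at the cost of processing the chunks in file order rather than picking an arbitrary range.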