I need to split a large file upload into many parallel processes and want to use a single CSV file as input.
Is it possible to access blocks of rows from an Import-Csv object, something like this:
$SODAData = Import-Csv $CSVPath -Delimiter "|" |
Where $_.Rownum == 20,000..29,999 |
Foreach-Object { ... }
What is the syntax for such an extraction?
I'm using PowerShell 5.
Import-Csv imports the file as an array of objects, so you could do something like this (using the range operator):
$csv = Import-Csv $CSVPath -Delimiter '|'
$SODAData = $csv[20000..29999] | ForEach-Object { ... }
An alternative would be to use Select-Object:
$offset = 20000
$count = 10000
$csv = Import-Csv $CSVPath -Delimiter '|'
$SODAData = $csv |
Select-Object -Skip $offset -First $count |
ForEach-Object { ... }
If you want to avoid reading the entire file into memory, you can change the above to a single pipeline:
$offset = 20000
$count = 10000
$SODAData = Import-Csv $CSVPath -Delimiter '|' |
Select-Object -Skip $offset -First $count |
ForEach-Object { ... }
Beware, though, that with this approach every chunk requires reading the file again from the start, because Select-Object -Skip still has to parse the rows it discards.
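If re-reading the file for every chunk is too costly, one option is to read it once and hand off each block of rows as soon as it fills up. The following is only a rough sketch under some assumptions: the temporary sample file, the chunk size of 10, and the $chunkSizes list are made up for illustration, and the line that records the chunk size stands in for whatever per-chunk processing you actually need.

```powershell
# Hypothetical sample data so the sketch is self-contained; use your own $CSVPath instead.
$CSVPath = Join-Path ([System.IO.Path]::GetTempPath()) 'chunk-demo.csv'
1..25 | ForEach-Object { [pscustomobject]@{ Rownum = $_; Value = "item$_" } } |
    Export-Csv -Path $CSVPath -Delimiter '|' -NoTypeInformation

$chunkSize  = 10   # rows per chunk (illustrative value)
$buffer     = New-Object System.Collections.Generic.List[object]
$chunkSizes = New-Object System.Collections.Generic.List[int]   # demo only

Import-Csv $CSVPath -Delimiter '|' | ForEach-Object {
    $buffer.Add($_)
    if ($buffer.Count -eq $chunkSize) {
        $rows = $buffer.ToArray()
        # Placeholder: process or dispatch $rows here (e.g. start a job for this chunk).
        $chunkSizes.Add($rows.Count)
        $buffer.Clear()
    }
}

# Handle the final, possibly shorter chunk.
if ($buffer.Count -gt 0) {
    $rows = $buffer.ToArray()
    $chunkSizes.Add($rows.Count)
    $buffer.Clear()
}
```

With this layout the file is parsed only once and at most one chunk of rows is buffered at a time, at the cost of the chunks becoming available sequentially rather than by arbitrary offset.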