Background Info:
I have an application that makes several SQL connections to multiple databases, and it currently takes a very long time to execute.
PowerShell (.NET) waits for each preceding "SQL-GET" function to finish before it can fire off the next. I am under the impression I can speed this app up dramatically by firing each "SQL-GET" function in its own background job simultaneously! I will then retrieve the data from each job as it finishes, ideally as a DataSet object.
The Issues:
When retrieving the data from the background job, I can ONLY manage to get a System.Array object back. What I am actually after is a System.Data.DataSet object. This is necessary because all the logic within the app is dependent on a DataSet object.
The Code:
Here is a very simple slice of code that creates a SQL connection and fills a newly created DataSet object with the results returned. Works a treat. $results is a DataSet object and I can manipulate it nicely.
$query = "SELECT * FROM [database]..[table] WHERE column = '123456'"
$Connection = New-Object System.Data.SqlClient.SQLConnection
$ConnectionString = "Server='SERVER';Database='DATABASE';User ID='SQL_USER';Password='SQL_PASSWORD'"
$Connection.ConnectionString = $ConnectionString
$Connection.Open()
$Command = New-Object system.Data.SqlClient.SqlCommand($Query,$Connection)
$Adapter = New-Object system.Data.SqlClient.SqlDataAdapter
$Adapter.SelectCommand = $Command
$Connection.Close()
[System.Data.SqlClient.SqlConnection]::ClearAllPools()
$results = New-Object system.Data.DataSet
[void]$Adapter.fill($results)
$results.Tables[0]
And here is that VERY SAME CODE wrapped into the ScriptBlock parameter of a new background job. However, upon calling Receive-Job, I get an array back, not a DataSet.
$test_job = Start-Job -ScriptBlock {
$query = "SELECT * FROM [database]..[table] WHERE column = '123456'"
$Connection = New-Object System.Data.SqlClient.SQLConnection
$ConnectionString = "Server='SERVER';Database='DATABASE';User ID='SQL_USER';Password='SQL_PASSWORD'"
$Connection.ConnectionString = $ConnectionString
$Connection.Open()
$Command = New-Object system.Data.SqlClient.SqlCommand($Query,$Connection)
$Adapter = New-Object system.Data.SqlClient.SqlDataAdapter
$Adapter.SelectCommand = $Command
$Connection.Close()
[System.Data.SqlClient.SqlConnection]::ClearAllPools()
$results = New-Object system.Data.DataSet
[void]$Adapter.fill($results)
return $results.Tables[0]
}
Wait-Job $test_job
$ret_results = Receive-Job $test_job
Any help would be greatly appreciated!!!
Research Thus Far:
I have done the old Google, but all of the posts, blogs and articles I stumble across seem to go into EXTREME depth about managing jobs and all the bells and whistles around this. Is it the underlying nature of PowerShell to ONLY return an array through the Receive-Job cmdlet?
I have read a stack post about the return expression. Thought I was on to something. Attempted:
return $results.Tables[0]
return ,$results.Tables[0]
return ,$results
All still return an array.
I have seen people, rather cumbersomely, manually transform the array back into a dataset object - though this seems very 'dirty' - I am pedantic and live in hope there must be a way for this magical dataset object to traverse through the background job and into my current session! :)
To reiterate:
Basically, all I would like is to have the $ret_results object retrieved from the Receive-Job cmdlet to be a DataSet...or even a DataTable. I'll take either...JUST NOT AN ARRAY :)
When you run a script as a PowerShell job, you are creating a new process (with its own PID), so you can't get the same live object back from that child process. What you receive from the Receive-Job cmdlet is a deserialized copy of the object: its properties are converted to base types (string, number, and so on) and its methods are removed.
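To see the deserialization for yourself, here is a minimal sketch. It assumes an in-memory DataTable (the 'Demo' table and its columns are made up for illustration) rather than a real SQL query, so it can run without a database. The type names printed for the received object should carry a "Deserialized." prefix, and the type check should come back False.

```powershell
# Build a small DataTable inside a background job and send it back
$job = Start-Job -ScriptBlock {
    $table = New-Object System.Data.DataTable 'Demo'   # hypothetical demo table
    [void]$table.Columns.Add('Id', [int])
    [void]$table.Columns.Add('Name', [string])
    [void]$table.Rows.Add(1, 'first')
    [void]$table.Rows.Add(2, 'second')
    $table
}

# Wait for the job, collect its output, and remove the job afterwards
$received = @(Receive-Job -Job $job -Wait -AutoRemoveJob)

# Inspect what actually crossed the process boundary
$received[0].pstypenames
$received[0] -is [System.Data.DataRow]
```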
But there is a solution: runspaces. A runspace is not a separate process; it is an execution environment created inside the same process (same PID) that runs on its own thread. In effect, it gives you asynchronous execution of a function (script block). Check the sample below:
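A minimal sketch of such a sample, under some assumptions: the $script body builds a DataSet in memory (the 'Demo' table is hypothetical) in place of the SQL query from the question, and the call is synchronous.

```powershell
# Script block to execute inside the runspace; stands in for the SQL code
$script = {
    $table = New-Object System.Data.DataTable 'Demo'   # hypothetical demo table
    [void]$table.Columns.Add('Id', [int])
    [void]$table.Columns.Add('Name', [string])
    [void]$table.Rows.Add(1, 'first')
    [void]$table.Rows.Add(2, 'second')

    $set = New-Object System.Data.DataSet
    [void]$set.Tables.Add($table)

    # The leading comma keeps the DataSet from being unrolled on output
    , $set
}

# [powershell]::Create() gives an instance with its own runspace in this process
$ps = [powershell]::Create()
[void]$ps.AddScript($script.ToString())

# Invoke() blocks until the script finishes and returns the output collection
$output = $ps.Invoke()
$ps.Dispose()

$ret_results = $output[0]
$ret_results.GetType().FullName
$ret_results.Tables[0].Rows.Count
```

Because nothing crosses a process boundary, no serialization takes place and $ret_results keeps its methods.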
This code executes the $script script block within a runspace. The sample does not run it asynchronously (that requires BeginInvoke/EndInvoke, which I skipped for simplicity), but as you can see it returns an actual DataSet/DataTable, not a deserialized PSObject.
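For the asynchronous variant, a rough sketch using BeginInvoke/EndInvoke could look like the following. The table names 'A' and 'B', the $work script block and the $pending list are all illustrative assumptions; in the real app each script would run one of the SQL queries.

```powershell
# Work to run in each runspace; takes a table name and returns a DataTable
$work = {
    param($name)
    $table = New-Object System.Data.DataTable $name
    [void]$table.Columns.Add('Id', [int])
    [void]$table.Rows.Add(1)
    , $table   # leading comma: emit the table itself, not its rows
}

# Start every piece of work without waiting for the previous one
$pending = foreach ($name in 'A', 'B') {
    $ps = [powershell]::Create()
    [void]$ps.AddScript($work.ToString()).AddArgument($name)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# Collect results; EndInvoke blocks until that particular run has finished
$tables = New-Object 'System.Collections.Generic.List[System.Data.DataTable]'
foreach ($item in $pending) {
    $out = $item.Shell.EndInvoke($item.Handle)
    $tables.Add($out[0])
    $item.Shell.Dispose()
}

$tables | ForEach-Object { $_.TableName }
```

Each PowerShell instance here gets its own default runspace; for many concurrent queries a RunspacePool would be the usual way to cap the number of threads.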
To learn more, check these posts from the Scripting Guy blog: https://blogs.technet.microsoft.com/heyscriptingguy/2015/11/26/beginning-use-of-powershell-runspaces-part-1/ The author of those posts also created the PoshRSJob module, which mirrors the standard job cmdlets but uses runspaces instead (with async execution).
In PowerShell, it is common for a set of more than one object of an arbitrary type to be returned as a collection. Consider this altered example where I build my own table:
Output received. So what did we get?
An array, as you've described. But that's the whole object. What if we analyze its members individually, by piping them to Get-Member? Consider the following:
In your job, you have specified that $results.Tables[0] should be returned. By specifying a particular Tables index, you're returning the object that describes that table (perhaps a DataTable, or in this case DataRows) instead of a DataSet like you seem to be expecting.
DataTables have rows. If the DataTable has more than one row, PowerShell will return it as a collection of DataRows, as I've demonstrated above. You may be surprised to learn that this is not the case when a single row is returned: you will get just the single DataRow object instead of a collection of DataRow objects.
If this really is the output you are expecting, you may want to force it to always return a collection by specifying the output as @($results.Tables[0]). That way, you always know to expect a collection and can handle the resulting content appropriately (by iterating through the collection to manage individual objects).
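The unrolling behaviour described above can be reproduced in the current session, without a job. This is a small sketch with a hypothetical New-DemoTable helper and made-up columns; it shows the multi-row case, the single-row case, the @() wrapper, and the unary comma that keeps the DataTable intact. Note that it only illustrates in-session output; objects coming back from Start-Job are additionally deserialized.

```powershell
# Hypothetical helper: builds a DataTable with the requested number of rows
function New-DemoTable([int]$RowCount) {
    $t = New-Object System.Data.DataTable 'Demo'
    [void]$t.Columns.Add('Id', [int])
    [void]$t.Columns.Add('Name', [string])
    for ($i = 1; $i -le $RowCount; $i++) {
        [void]$t.Rows.Add($i, "row$i")
    }
    , $t   # leading comma: return the table itself
}

$many = New-DemoTable 3
$one  = New-DemoTable 1

# Emitting a DataTable unrolls it into its rows
$unrolled = & { $many }   # several rows come back as an array of DataRow
$single   = & { $one }    # one row comes back as a bare DataRow

# @() forces a collection even when there is a single row
$wrapped = @(& { $one })

# The unary comma preserves the DataTable object
$kept = & { , $many }

$unrolled.GetType().FullName
$single.GetType().FullName
$wrapped.Count
$kept.GetType().FullName
```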