How do I run multiple data flow tasks in parallel

Posted on 2019-06-26 03:56

Question:

I have a package with 3 parallel data flow tasks inside a sequence container. I need to implement a transaction such that it commits only if all 3 data flow tasks execute successfully. If any of the data flow tasks fails, the transaction should roll back.

I am testing this package, but it's not working.

  • I took one sequence container, dragged and dropped data flow task 1, data flow task 2, and data flow task 3 into it, and configured all three data flow tasks.

  • In the sequence container's properties, I have set the following properties:

    TransactionOption: Supported

    IsolationLevel: ReadCommitted

  • In the three data flow tasks' properties, I have set the following properties:

    TransactionOption: Supported

    IsolationLevel: ReadUncommitted

  • When I run the package, data flow task 1 fails, but the data loaded by data flow task 2 is still inserted and committed.

How can I prevent the data flow tasks from committing their transactions when one of the data flow tasks fails?

Answer 1:

I agree with the other answers: your problem requires that you set the TransactionOption on an enclosing container to Required. Unless you have altered the inner objects, their default TransactionOption is Supported, which means they will enlist in a transaction if one is available. A Required setting starts a transaction and, for completeness, NotSupported indicates that the executable/container will ignore any existing transaction, which may result in deadlocks depending on your design.
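If you want to observe the enlistment from the database side while the package runs, active transactions (including the distributed one started by a Required container) are visible through a dynamic management view; a quick check, assuming you have VIEW SERVER STATE permission:

-- Lists transactions currently active on the instance; while the
-- package runs under a Required container, its transaction appears here.
SELECT transaction_id, name, transaction_begin_time, transaction_type
FROM sys.dm_tran_active_transactions;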

I built out a sample package that drops and recreates a target table to verify that transactions behave as expected. There are 3 data flows in the package, each one adding a unique value to the table (1, 2, 4), so that a query like the following indicates whether values arrived in the destination table.

SELECT count(1) AS rc, sum(T.col1) AS Total FROM dbo.TrxTest T
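For reference, a minimal definition of the target table could look like the following; the single int column is an assumption inferred from the count/sum query above, not the original package's exact DDL.

-- Hypothetical target table; one int column is enough to support
-- the count/sum verification query above.
IF OBJECT_ID('dbo.TrxTest', 'U') IS NOT NULL
    DROP TABLE dbo.TrxTest;
CREATE TABLE dbo.TrxTest (col1 int NOT NULL);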

The package defines 7 variables, 3 in pairs. The ones named FailDataFlow are Booleans that allow each of the data flows to fail or succeed. This is accomplished by causing a divide-by-zero exception in the WHERE clause of the corresponding source query, as sketched below.
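A hedged sketch of what such a source query could look like; the ? parameter mapped to a FailDataFlow variable illustrates the technique and is not the original package's exact query:

-- Hypothetical OLE DB Source query for one data flow. The ? parameter
-- is assumed to be mapped to the corresponding FailDataFlow Boolean.
-- When the variable is True (1), the divisor becomes 0 and the data
-- flow fails with a divide-by-zero error; otherwise one row (1, 2, or
-- 4 depending on the data flow) is returned.
SELECT 1 AS col1
WHERE 1 = 1 / (CASE WHEN ? = 1 THEN 0 ELSE 1 END);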

The Sequence Container has a TransactionOption of Required. The individual data flows retain their default TransactionOption of Supported.

The first execution failed with an "unable to communicate with the distributed transaction coordinator" error, as that service was set to manual startup on this VM. Correcting that problem resulted in the package failing as intended: "DFT Value 2" generated a divide-by-zero exception. Running the query above showed nothing in my table, despite the green (success) box on "DFT Value 1".
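(On Windows, the service in question is the Distributed Transaction Coordinator, MSDTC; setting its startup type to Automatic in services.msc, or running net start msdtc from an elevated prompt, resolves this error.)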

Switching the value of FailDataFlow2 to False and re-running showed values of 3 and 7 respectively in my query (count = 3 rows, sum = 1 + 2 + 4 = 7), which indicates that all rows arrived.

You can explore further by changing the TransactionOption on the various containers/executables to assure yourself that they work as the other respondents have indicated.



Answer 2:

You have to set TransactionOption = Required on the sequence container and = Supported on all the data flow tasks to achieve what you need.

What I always do is set TransactionOption = Required at the package level and Supported on all the other tasks, so that if any task fails, the transaction rolls back.



Answer 3:

You are doing almost everything correctly.

First, ignore the transaction isolation level: it has a bug with ReadCommitted, and if you want that behavior you have to implement it manually; see my answer on this post for an example.
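The manual approach generally means bracketing the data flows with Execute SQL Tasks on a connection manager whose RetainSameConnection property is True, so all tasks share one session. A sketch of the statements those tasks could run; the task layout and names are assumptions, not taken from the linked answer:

-- Hypothetical "begin" Execute SQL Task, run before the sequence
-- container on a connection with RetainSameConnection = True:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;

-- Hypothetical "commit" Execute SQL Task, reached via an OnSuccess
-- precedence constraint after the sequence container:
COMMIT TRANSACTION;

-- Hypothetical "rollback" Execute SQL Task, reached via OnFailure:
ROLLBACK TRANSACTION;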

That said, I don't see why you should need to configure the isolation level at all; your problem can be solved with the default isolation level. The thing you are missing is setting the TransactionOption to Required on the sequence container. Setting it to Supported means that the task does not start a transaction, but joins any transaction started by its parent container, if one exists.

So, by setting the TransactionOption to Required on the sequence container and Supported on your data flows, it should work.



Tags: ssis