SSIS vs. DTS performance

2019-05-08 06:14发布

问题:

Seems crazy to be doing this at this late date, but...

I am rebuilding some ETL infrastructure with a Rocket Software UniVerse source and an SQL destination. The old destination platform was SQL 2000 on Windows Server 2003, the new platform is SQL 2012 on Windows Server 2012. In both cases, an ODBC driver is used to connect to the source. Everything seems to work fine on the new platform, but the execution time for a package is exponentially slower. For example, one table with roughly 1.3 Million rows and 28 Columns takes about an hour using SQL 2000/DTS and over 3.5 hours using SQL 2012/SSIS. Both SQL servers are virtualized on Xen Server, the 2012 server has more RAM and more vCPUs, neither machine has an advantage in disk infrastructure. No metrics (Memory, disk IO, etc.) are red-lining (or really even coming close) on the 2012 server during package execution.

I have read several forum posts describing the same scenario, but none really seemed to have a solution that works for me. Since all of these posts were quite dated (most of these conversions from DTS to SSIS happened in the SQL 2005 days), I was curious if there was any fresher info out there.

The packages are very simple table copies, no transforms. I am using a "SELECT column, column,.. FROM sourcetable" for my source connection and 'Table or View - Fast Load' for my destination. The slow down APPEARS to be on the source side of the equation, though I can't be certain.

Any help appreciated.

回答1:

One option to investigate is lowering the buffer size in your data flow. By default, it's set at 10k rows. If you have a slow data source, it can take quite a while to fill up the "bucket" of data just to start sending a batch of information down to the destination. While it might seem counterintuitive, lowering that number can increase performance as 5k, or 1k or 100 rows of data fill the bucket much sooner. That data then gets shuffled through the data flow and lands in the source while bucket 2, 3, etc are being filled.

If you have a SQL Server source, you can optimize your query by hinting that you'd like a fast N rows, which you'd align with your SSIS package's row size.

See Rob Farley's article for more details about that.