Optimizing SSIS package for millions of rows with

Published 2020-04-11 02:46

Question:

Hi, I am currently trying to optimize an SSIS package that performs an upsert/delete handling 93+ million rows from a table in a remote MariaDB source. The table also contains approximately 63 columns.

Currently I'm using Sort and Merge Join in my package, but according to some guides I've read, it is recommended to do the sorting on the server rather than with the Sort transformation in the SSIS data flow, as the latter puts a load on the SSIS server's memory.

And since I'm currently running this solution in Azure Data Factory, the package fails (it most often times out, even though I've increased the timeout properties both on the package side and in Azure Data Factory).

What is the recommended way to tackle this?

If I've understood it right, and as I mentioned before, I can skip the load on the SSIS server by sorting on the DB server side. But as I'm new to the whole SQL and SSIS stuff, I'm not quite sure what such a sort would look like in the SQL command.

I've also thought about batching, but even here I'm uncertain how that would work in SSIS.

What is recommended here?

My SSIS-Package looks like this right now:

I followed this type of example: Synchronize Table Data Using a Merge Join in SSIS

(FYI: the red error icons are there because I lost connection during the screenshot; this is a fully working solution otherwise.)

Answer 1:

I have two recommendations:

Server side sorting

In the OLE DB Source, change the access mode to SQL Command and use an ORDER BY clause:

SELECT * FROM table ORDER BY col1, col2

After that, open the OLE DB Source advanced editor (right-click the OLE DB Source, Show Advanced Editor), go to the Input and Output Properties tab, set the IsSorted property of the source output to True, and set the SortKeyPosition for the output columns used in the ORDER BY clause.

  • SSIS sorted data flows
  • Where is the IsSorted property?

Read data in chunks

I don't have good knowledge of MariaDB SQL syntax, but I will provide some examples in SQLite and Oracle:

  • Reading Huge volume of data from Sqlite to SQL Server fails at pre-execute
  • Getting top n to n rows from db2
  • SSIS failing to save packages and reboots Visual Studio
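For MariaDB specifically, chunked reads can be sketched with the LIMIT clause (the table and column names below are placeholders; `id` is assumed to be a unique, indexed key). An offset-based chunk, where the offset would typically come from an SSIS variable substituted into the SQL Command, looks like:

```sql
-- Read chunk N of 500,000 rows, ordered by a stable key.
-- OFFSET = N * 500000, supplied from an SSIS variable/expression.
SELECT *
FROM source_table
ORDER BY id
LIMIT 500000 OFFSET 0;
```

On 93+ million rows, large offsets get progressively slower because the server still scans past the skipped rows, so a keyset-based variant (remembering the last key of the previous chunk) usually performs better:

```sql
-- @LastId = highest id loaded by the previous chunk (0 on the first pass).
SELECT *
FROM source_table
WHERE id > @LastId
ORDER BY id
LIMIT 500000;
```

In SSIS this is typically driven by a For Loop container that updates the variable after each chunk.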

Update 1 - Package problems

There are some problems in the package:

  • You are reading and writing from the same table
  • You are performing Update and Delete operations on a large amount of data
  • You are using Merge Join

Some recommendations:

  • Try using a staging table instead of reading and writing from the same table, since you are currently reading, writing, deleting, and updating the same destination table.
  • Use partitioning on the destination table, which allows you to delete and update records in a specific partition instead of scanning the entire table
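Assuming the destination is SQL Server, the staging-table approach can be sketched as follows (table and column names are placeholders): a plain data flow loads the MariaDB rows into dbo.Staging, and then a single set-based statement replaces the Merge Join plus row-by-row Update/Delete components:

```sql
-- dbo.Staging holds the freshly extracted rows; dbo.Destination is the
-- table the package currently reads from and writes to (placeholder names).
MERGE dbo.Destination AS dst
USING dbo.Staging AS src
    ON dst.id = src.id
WHEN MATCHED THEN
    UPDATE SET dst.col1 = src.col1,
               dst.col2 = src.col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, col1, col2)
    VALUES (src.id, src.col1, src.col2)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

For tables of this size, running the MERGE per partition or per key range (rather than in one giant transaction) keeps the transaction log and locking manageable.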