I am learning SSIS for the first time. I am trying to load multiple tables from my staging database to multiple tables in the data warehouse. My staging database is nothing but a database with all the tables from all my source databases copied to it. I found that each data flow task only allows to connect one source table to one destination table. Do I have to use as many data flow tasks as the number of tables I have in my source database? Is there a simple and more efficient way to do this?
From my understanding, I would need an empty database with empty Dim and Fact tables created in SQL Server which would act as destination data warehouse and receive the data from the staging database. Therefore I have already created this empty data warehouse. Please help.
There is no convention that you need to have only 1 source and 1 destination in a DFT (Data Flow Task).
You can have multiple source and destination components inside a DFT. However, you may run out of memory if you have too many source to destination components in a single DFT. In the pre-execute phase SSIS has to decide the memory to allocate for individual S2D components, and if the memory consumption is too high then server may not able to grant enough memory for all the components. This may result in long waiting.
However there is a property called EngineThreads
for DFT which defines how many threads the data flow engine can create and run in parallel. According to MSDN:
To understand how this property works, consider the sample package with three
Data Flow tasks. Each of Data Flow task contains ten source-to-destination
execution trees. If you set EngineThreads to 10 on each Data Flow task,
all 30 execution trees can potentially run simultaneously.However, the general
rule is not to run more threads in parallel than the number of available
processors. Running more threads than the number of available processors
can hinder performance because of the frequent context-switching between threads.
So it's better to create individual Data Flow Task and perform the operation.
Note: If you are getting data from multiple sources, then you can do it in a single data flow task.