Azure Data Factories vs SSIS

Posted 2020-07-09 10:00

Question:

I am thinking of moving our SSIS ETLs to Azure Data Factory. My arguments in favour of such a leap are:

  • Our sources and targets are already in the cloud. ADF is cloud native, so it seems a good fit.

  • ADF is a service, and therefore we could consume and pay for it on demand. SSIS implies licensing costs and doesn't lend itself naturally to on-demand consumption (we thought of using DevOps to spin up ETL servers on an ad hoc basis).

  • Generating ETL code programmatically with SSIS requires very specific skills such as BIML or the DTS API. By moving to ADF, I am hoping that the combination of JSON with the T-SQL and C# in U-SQL will make the necessary skills more generic.

I am hoping members of the community can share their experiences and thus help me come to a decision.

Answer 1:

The answers to this old post are quite outdated. My comments below are related to ADF version 2.

First of all, ADF has the capability to run SSIS packages, so moving your legacy ETL processes there as they are and migrating to ADF incrementally is not only possible but recommended. You don't want to change everything with every new piece of technology that comes out. You can then implement only new or modified ETL processes as ADF activities.
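
To give a rough idea of what the lift-and-shift step looks like, here is a minimal Python sketch that builds the JSON for a pipeline containing a single Execute SSIS Package activity. The property names follow my understanding of the ADF v2 activity schema, and the runtime name `AzureSsisIR`, the package path and the pipeline name are hypothetical placeholders, so check the output against the current ADF documentation before relying on it.

```python
import json


def ssis_package_pipeline(pipeline_name, package_path, ir_name):
    """Build a pipeline definition that runs one SSIS package on an Azure-SSIS runtime."""
    activity = {
        "name": "RunLegacyPackage",  # hypothetical activity name
        "type": "ExecuteSSISPackage",
        "typeProperties": {
            # Reference to the Azure-SSIS integration runtime that hosts the package
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
            # Path of the package inside the SSIS catalog (placeholder value)
            "packageLocation": {"packagePath": package_path},
            "loggingLevel": "Basic",
        },
    }
    return {"name": pipeline_name, "properties": {"activities": [activity]}}


pipeline = ssis_package_pipeline(
    "LegacyLoad", "Folder/Project/Package.dtsx", "AzureSsisIR"
)
print(json.dumps(pipeline, indent=2))
```

The existing package stays untouched; only the thin JSON wrapper around it is new, which is what makes the incremental route cheap to try.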

Secondly, although it is maybe not completely there yet, ADF data flows let you do most of the transformations you can do with SSIS. There are still some missing bits and pieces, but most of the commonly used functionality is there.

ADF authoring does not require Visual Studio. It does need specific skills but I found the learning curve not to be steep. Documentation and best practices are still a bit lacking in certain areas, but someone already experienced in database / data warehouse architecture and ETL will find it relatively easy. The best thing about it is that most things can be done visually without messing with the code (which is just simple JSON).
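
Because a pipeline is only JSON, it can also be generated from any general-purpose language. The following is a minimal sketch, not a definitive implementation: it emits one Copy activity per table. The dataset names `SourceTable` and `SinkTable`, the `tableName` dataset parameter and the table list are assumptions made up for the illustration, and the `AzureSqlSource`/`AzureSqlSink` types assume Azure SQL on both ends.

```python
import json


def copy_pipeline(pipeline_name, tables):
    """Build a pipeline definition with one Copy activity per table name."""
    activities = []
    for table in tables:
        activities.append({
            "name": f"Copy_{table}",
            "type": "Copy",
            # Hypothetical parameterised datasets, one for each side of the copy
            "inputs": [{
                "referenceName": "SourceTable",
                "type": "DatasetReference",
                "parameters": {"tableName": table},
            }],
            "outputs": [{
                "referenceName": "SinkTable",
                "type": "DatasetReference",
                "parameters": {"tableName": table},
            }],
            # Assumes Azure SQL as both source and sink
            "typeProperties": {
                "source": {"type": "AzureSqlSource"},
                "sink": {"type": "AzureSqlSink"},
            },
        })
    return {"name": pipeline_name, "properties": {"activities": activities}}


generated = copy_pipeline("CopyStagingTables", ["Customer", "Orders"])
print(json.dumps(generated, indent=2))
```

The resulting file can be committed to the Git repository behind the factory like any hand-authored pipeline.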

Furthermore, ADF integrates with Azure DevOps and uses Git for versioning, so you get change management for free.

For more advanced needs, you can also run Databricks activities written in Java, Scala, or Python, and integrate with Hadoop (Hive and Pig) and Spark.

Finally, ADF incorporates monitoring and diagnostic tools which in SSIS you had to build yourself. You can see much more easily which activity failed and what the error was.



Answer 2:

If your ETLs are simple and easy to convert, replace them with Data Factory. If they require complex logic, use SSIS.
In other words, if the transform logic can be implemented through configuration, Data Factory is the better choice. If it requires writing code and programming skills, SSIS is the right tool.

A few links that may help other people (you have most likely made your decision already):

"Azure Data Factory and SSIS compared"

Think of ADF as a complementary service to SSIS, with its main use case confined to inexpensively dealing with big data in the cloud.

Download Azure_Data_Factory_vs_SSIS article from sqlbits



Answer 3:

ETL stands for Extract, Transform, and Load, whereas ADF by itself does not transform anything (with ADF you can transform by using a SQL statement or stored procedure, but in an ETL tool such as SSIS the basic extraction logic is available out of the box).

If you want to choose one of them, it depends entirely on your requirements.

  • If the transformation logic is complex, use an ETL tool (SSIS).

  • If you are working with huge volumes of data, go for ADF.

  • ADF charges according to your usage, whereas SSIS comes with a license.

  • If your data is on-premises, I would suggest going with an ETL tool.

  • The performance of an ETL tool depends entirely on the configuration of
    your on-premises machine, whereas with ADF you don't have to worry
    about performance.



Answer 4:

Use SSIS for the rich transformations and ADF for big data workloads and scale. There should be no problem executing your SSIS packages in the cloud. It is a lift-and-shift scenario: instead of using your own compute, you are renting it.

Even if you are new to triggers, scheduling should not be an issue: ADF gives you an interface for scheduling that is similar to the one you get in SSMS.
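
On the scheduling point, a trigger is again just a small JSON document. Here is a minimal Python sketch of a daily schedule trigger, assuming the ADF v2 `ScheduleTrigger` schema as I understand it; the trigger name, the pipeline name and the start time are hypothetical values.

```python
import json


def daily_trigger(trigger_name, pipeline_name, start_time_utc):
    """Build a schedule trigger definition that runs one pipeline once a day."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,  # every 1 day
                    "startTime": start_time_utc,
                    "timeZone": "UTC",
                }
            },
            # Pipelines started each time the trigger fires
            "pipelines": [{
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": pipeline_name,
                }
            }],
        },
    }


trigger = daily_trigger("NightlyRun", "LegacyLoad", "2020-07-10T02:00:00Z")
print(json.dumps(trigger, indent=2))
```

This covers roughly what a daily SQL Server Agent job schedule would do for an on-premises package.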

That said, if I had heavy on-premises investments, I would rather wait and see.