可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
If I can do the required ETL requirements using stored procedures, any advantages of using SSIS packages instead? My ETL stuff is nothing major.
I feel like using an old technology. I like SQL.
Old technology does not equal obsolete as stored procedures won't go away any time soon.
回答1:
If your ETL is mostly E and L, with very little T, and if you can write your SPs so they don't rely on cursors, then going the SP-only route is probably fine.
For more complex processes, particularly those that involve heavy transforms, slowly changing dimensions, data mining lookups, etc, SSIS has three advantages.
First, it manages memory very efficiently, which can result in big performance improvements compared to T-SQL alone.
Second, the graphical interface lets you build large, complex and reliable transforms much more easily than hand-crafted T-SQL.
And third, SSIS lets you more easily interact with additional external sources, which can be very handy for things like data cleansing.
回答2:
I've lived in the land of stored procedure ETL for a multi-terabyte SQL Server data warehouse. This decision was made back in 2001 when .NET was 1.0, so VB6 was the programming language alternative, and SSIS wasn't around yet - it was DTS. I can tell you that there were advantages and disadvantages, like anything.
Some considerations:
- If everyone on your team understands SQL, it's easy to dig into the stored procs. SQL is a widely known skill which may be a benefit if you have a lot of ETL writers/readers. You have to be more than a casual user of SSIS in order to understand what it's doing. The high level graphical flow is nice for documentation, but if someone needs to get into the guts, they'd better know SSIS well.
- SQL is a pain to modularize. If you use UDFs, you are going to incur a huge performance hit. You'll write similar code in multiple places and you'll hate yourself for doing it, but often in ETL scenarios performance is king. SSIS will help you modularize and factor out your tasks.
- Don't expect to be able to easily use source control with SSIS. SQL - no problem. SSIS uses awful XML files which can be checked in, but good luck diffing with previous versions to see what changed and when.
- You need to think about your SPs in a modular way, even though it's hard to make them as modular as you'd like. Use temp tables to chunk up your processing. Put indexes on those temp tables before you use them. Don't try to do too much at once. Comment everything.
- If you're using cursors, you're doing it wrong. Don't be afraid to chain in some external console app you wrote in the language of your choice to do some things SQL just wasn't cut out for.
BTW - after I left that company, they finally upgraded the database from SQL 2000 to 2008 and slowly moved from stored procs to SSIS. At my new company, we own SSIS but after using it we all agreed that our custom written .NET ETL is a better fit for our purposes. Everyone takes their own route. The decision has to balance maintenance and performance and the skill-set of your team and the skill-set of the job pool in your area.
回答3:
I am in the middle of getting rid of our SSIS packages and using stored procedures. For us, stored procs are tremendously better:
1) They are far easier to maintain, we don't need bids, don't need to create projects and import packages into bids, so far fewer steps to make simple stored proc changes.
2) All of our current packages basically truncate data in a table, then repopulate from several other tables on the same server with direct mappings. Very easy Insert/select SQL to write.
3) They run much faster. We have no cursors, no looping structures, just straight SQL.
4) We don't have to spend all our time right-clicking and working in little bids windows trying to follow the flow of logic. We all know basic TSQL and that is sufficient for our tasks.
回答4:
I would say it depends some on what you are doing. However, from my experience the room for improvement with SSIS packages is tremendous. We saw 10 fold improvements in our data warehouse environment when we took some of the heavy hitting stored procedures and put them in SSIS packages. The memory utilization of SSIS (in this situation anyways) made all of the difference.
I want to reiterate that it is important to know what you are doing. For example, a SQL statement will usally outperform a SSIS data-flow when the data transform is table-to-table on the same server.
The best bet it to pick a SP or two and create them in SSIS and test them both.
Seems like the answer for all SQL questions start with, It depends...
回答5:
I have seen some tremedous performance gains from using SSIS, it is especially good if you have stored procedures that are using linked servers as this uses more processing power and linked servers tend to pull over the whole table into memory before limiting the rows needed by a join. We had a stored procedure that was taking 7 + hours to run, I broke this down into data from each server, then set up a local datasource for each in SSIS allowing processing to take place locally for each datasource as apposed to via linked server, the job now takes 6 minutes to run, I would say that is a massive gain.
Caralyn
回答6:
We are usign combined method to get best of two worlds:
We use SSIS to get data from external sources and load it in parallel into Staging database
Then we use SSIS packages to orchestrate pipelines and trigger appropriate SPs inside control flow.
Any transformation logic is incaplulated into SPs as data flows are hard to manage/modify and doesn't give any significant advantage:
1) It's easier to modify and troubleshoot SP than a package
2) There's no way to easily reuse components in SSIS except calling external packages
3) SVN diff of SP works, diff of SSIS package is awful :)
Also, we use SSIS to run SPs in parallel to boost overall performance.
回答7:
I don't see any obvious technical limitations. The stored procedure might be more difficult to follow than an SSIS package for complex ETL operations - but that isn't going to be true for every scenario. I have also found that packages (SSIS and DTS) are more readily recognized as "jobs" - the stored procedures that are executed by scheduled jobs are often overlooked by developers because they can't see the scheduled jobs.
That said, I have seen ETL performed by stored procedures and DTS/SSIS packages alike and as long as the stored procedure isn't a large mess of tangled code it seems appropriate. I haven't seen one method perform better or more reliably than another (but then I haven't seen stored procedures doing complex ETL).
回答8:
I tried some of features in SSIS and i was not happy with all of them.
I stated off with the data flow thingy and i was not really happy with the performance that i saw.
What i ended up doing was develop an SSIS packages which had a control flow of sql task each of which executed a stored proc.
This made sure that SQL server did most of the E,T, and the L. I think when you use the dataflow component the data actually moves from sql server to machine running the package which makes it not so efficient.
Having said that, i think i would have tried to optimize the Dataflow thingy( its been a while since i worked on it) if i had to interact with 3rd party applications/ databases / DW systems.
回答9:
SSIS is missing some basic functionality, it doesn't have a Informatica type package that allows development with a SQL statement to run against raw text files and SQL server sorely lacks Error Logging DML like Oracle. I really thought when Microsoft announced the addition of the Merge statement that of course they would implement the error bucket which is one of its most important features, well guess again. Error processing on the line level is important and if you are using a SQL statement to add batches of data if one record fails that whole batch is rolled back.
回答10:
Performance will be faster than normal sp. Do not need to create complex temp table, Cursor, indexing for retrieve data.
Data cleaning is advantage of SSIS.
Incremental handling is only possible in ssis.
We can create package configuration file and deploy it to any server. User can provide the server details and log in information.
Graphical user interface.
Logging, error handling is best in ssis.
回答11:
To small projects, if you have a solid sql skills, and a business requirement understanding, go ahead!
Otherwise, if you will face complexes data extraction, heavy transformation tasks. SSIS or another ETL tool will suffice.
cheers
回答12:
For data transfers between SQL servers use SSIS above SPs
You can easily face a factor 10 improvement as mentioned above
We went from 6-7 hours transfers to a more manageable time frame by embedding the SP in an SSIS package
On a side note: SSIS is basically a bunch of XML files which can be manipulate/used in different ways (eg for documentation)
回答13:
I've been working with SQL Server since version 6.5 - that's a long time! And from my experience most ETL is simple enough that T-SQL works perfectly fine and not only works but works very well - fast, reliable, simple structured programming. I believe anything that can be done in SSIS can be done in T-SQL by someone who knows what they're doing.
Most people who are heavy pro-SSIS, again in my experience, are inexperienced developers who have grown up with graphical tools and don't really know how to program.