I would interested to hear from other developer tip on SSIS best practices, dos and don'ts, pitfalls and tricks, etc.
I'm trying to introduce a few here at my company.
I would interested to hear from other developer tip on SSIS best practices, dos and don'ts, pitfalls and tricks, etc.
I'm trying to introduce a few here at my company.
This blog by Jamie Thompson is really good that has both best practices and naming conventions: BestPracticesandNamingConventions
There is a 4000-character limit to the DT_WSTR
(Unicode string) data type used by SSIS. Since expressions are converted to DT_WSTR
, they also cannot exceed 4000 characters.
You'll get an error if you exceed the limit at design time. However, you will not receive any obvious warnings or errors if you do so at run time. The operation will fail silently, and the expression may be evaluated incorrectly.
Note: this does not mean that strings or SSIS variables cannot exceed 4000 characters. The limitation only applies to expressions and other places that DT_WSTR is used. For a good explanation of this, take a look at:
http://blogs.conchango.com/jamiethomson/archive/2009/05/27/4000-character-limit-in-ssis.aspx
Here's an example that reproduces the issue:
Even though the variable is not empty, the constraint will incorrectly evaluate to False because the expression exceeds 4000 characters. However, this will not cause any errors and the package will not fail.
If you notice strange behavior in your SSIS logic, take a look at your expressions and ensure that they don't exceed 4000 characters when evaluated.
This is another great blog about config files for porting SSIS packages - a huge issue we ran into when we first started using SSIS in the tiered environment.
http://jessicammoss.blogspot.com/2008/05/ssis-configuration-to-configuration-to.html
Additionally, use the properties Version, Version Description as much as you can. Although there is a package history (Right click and choose View History) this adds an additional layer of documentation (annotations are good but are not that practical due to time constraints). Version / Version Description helps since the version # in the property can be coordinated well with ETL Design Documentation Revision History log.
Put your SSIS packages in source control.
We have found that using a datasource of select field1, field2 from mytable is faster than using the option to select all records from the table. Made a huge difference in one of the package we did where we preprocessed data into a staging table for an export.
Build a standard import package and a standard export package that already has the variables you commonly need and the connections you commonly need and error handling you commonly use. It is much faster to build a package by opening a template package and saving under the new name, then adjusting for particular circumstances than it is to build everything from scratch each time.
A couple of things not specific to SSIS, if you send an export to another company, make sure to copy the file to an archive location as well as the ftp site. This will help tremendously when there is a question about what was sent when. Never import data from another source without looking at whether the data needs to be cleansed. An address missing the city and state is useless, an email that says "Talk to his secretary" is also useless, don't add this stuff to your database. Don't try to add records that will fail insert. If first name is a required field in your database and the import file doesn't havea value for one of the records, either add a default value ("Unknown") or don't try to insert the record.
Log errors and steps processed. This will help tremendously in figuring out where something went wrong when that package that has run flawlessly for six months suddenly fails.
Use variables and configurations to change things at run time. This lets you have test locations and prod locations, so you aren't putting the test file you don't yet want the client to see on his ftp site.
Use the trash destination (you can download it here - http://www.sqlis.com/post/Trash-Destination-Adapter.aspx) as your destination until you have tested enough to know that what you are sending to the destination is what you intended to send. In conjunction with this, use the Data Viewer to see what you data looks like at various parts of the data flow.
For pretty large packages, or whenever possible, use non-container container to help a future reader of the packages understand the package easily. A non-container is just like any other container except that it has no programming logic behind it. It just helps to group related items together for easier understanding of the package. The concept behind it is that a human mind cannot take in more than 5-6 items together.
My own answer to this, which I'll add to over time.
You can also look at the BI Reference Implementation Microsoft did back in 2005. It's called Project REAL. It used real data from Barnes & Noble to simulate the full life-cycle of a data warehouse. There's a guide on ETL best practices there.
see this post, think this guy has covered most of it.
http://moeens.com/?s=best+practices&x=0&y=0#/2010/12/ssis-best-practices/
see this post:
http://randypaulo.com/2011/06/27/ssis-tips-tricks-best-practice/
Here is a list of easy to implement performance best practices and some development best practices