I have an SSIS package that inserts/updates rows in a database. First I use a Lookup to check whether a row has already been inserted into the DB; if yes, I update that row, otherwise I insert it as a new row.
My problem is that when inserting, a row is inserted successfully but is also redirected to the error output. How can both happen at the same time? And it only happens sometimes, not always - very inconsistent. How can I track down what caused the error? I used "Redirect row" here to capture the failed rows.
This happens only when the package is deployed on the server. Running it on my local machine using BIDS works fine.
My finding might help those who visit here.
@billinkc made a broad comment; I had gone through all of that. Later, after digging into the system, the problem turned out to be something different.
My SSIS package has a Script Task that performs some operations. That task uses a folder called TEMP on the disk. The program that triggered this SSIS package was simultaneously using the same TEMP folder, and the file read/write exceptions there were not handled. This caused the Script Task to fail, resulting in a package failure error. Since the INSERT ran before the Script Task, the INSERT was successful; later, when the script failed, the rows were sent to the error output too!
I tried catching these file errors/exceptions and it worked!
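A minimal sketch of the idea, assuming the standard C# Script Task template (the folder path, file name and warning text below are placeholders, not the ones from my actual package):

```csharp
// Inside the Script Task's ScriptMain class generated by SSIS (which supplies
// Dts and the ScriptResults enum). Placeholder path and file name for illustration.
public void Main()
{
    string tempFolder = @"D:\TEMP";
    string workFile = System.IO.Path.Combine(tempFolder, "work.txt");

    try
    {
        // File read/write that can collide with the external program
        // using the same TEMP folder at the same time.
        string contents = System.IO.File.ReadAllText(workFile);
        System.IO.File.WriteAllText(workFile, contents.ToUpperInvariant());

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (System.IO.IOException ex)
    {
        // Handle the file error instead of letting an unhandled exception
        // fail the whole package (which is what dragged the already-inserted
        // rows into the error output).
        Dts.Events.FireWarning(0, "Script Task", "TEMP file busy: " + ex.Message, string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Success; // or Failure, depending on how you want the package to react
    }
}
```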
Your OLE DB Destination is likely set to the default values.
A quick recap of what those values mean: the data access mode should be "Table or view - fast load" or "Table or view name variable - fast load", as these perform bulk inserts. The non-fast load choices result in singleton inserts, which for any volume of data will have you questioning the sanity of those who told you SSIS can load data really fast. With fast load, the "Rows per batch" value is passed to the INSERT BULK statement as the ROWS_PER_BATCH value, and the "Maximum insert commit size" determines how many rows are committed (and, on failure, rolled back) as a unit.
So what
You have bad data somewhere in your insert. Something is causing the insert to fail. It might be the first row, last row, both or somewhere in between. The net effect is that the insert itself is rolled back. You designed your package to allow for the bad data to get routed to a flat file so a data steward can examine it, clean it up and get it re-inserted into the pipeline.
The key, then, is to find a value that provides the optimal balance between insert performance (bigger batches are better) and the amount of data you lose when a batch goes bad. For the sake of argument, let's use a commit size of 5003, because everyone likes prime numbers, and assume our data source supplies 10009 rows of data. Three rows in there will violate the integrity of the target table and will need to be examined by a data steward. This results in 3 total batches being sent to the destination, and with "Redirect row", every batch that contains a bad row is rolled back as a whole and all of its rows are sent to the error output. Depending on where the three bad rows fall, anywhere from a single batch to all 10009 rows can land in your error file even though only three rows are actually bad.
I always felt Murphy was an optimist and the disk holding the error file would get corrupt but I digress.
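To make that arithmetic concrete, here's a small stand-alone simulation of the batch-level rollback (the bad row positions are arbitrary; nothing here is SSIS code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simulates fast-load behaviour with "Redirect row": if any row in a commit
// batch fails, the whole batch is rolled back and every row in it is redirected.
class CommitSizeDemo
{
    static void Main()
    {
        const int totalRows = 10009;
        const int commitSize = 5003;
        var badRows = new HashSet<int> { 17, 5120, 9999 };   // arbitrary positions of the 3 bad rows

        int committed = 0, redirected = 0;
        for (int start = 1; start <= totalRows; start += commitSize)
        {
            int size = Math.Min(commitSize, totalRows - start + 1);
            bool batchHasBadRow = Enumerable.Range(start, size).Any(badRows.Contains);

            if (batchHasBadRow)
                redirected += size;   // entire batch goes to the error output
            else
                committed += size;    // entire batch commits
        }

        // With these positions, batches 1 and 2 fail and batch 3 (3 rows) commits:
        // 10006 rows are redirected even though only 3 rows are actually bad.
        Console.WriteLine($"Committed: {committed}, redirected: {redirected}");
    }
}
```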
What would be ideal is to whittle down the space bad data can exist in while maximizing the amount of good data inserted in one shot. In fact, a number of people have written about this, but I'm partial to the approach outlined in "Error Redirection with the OLE DB Destination".
We would perform an insert at our initial commit size of 5003, and successful rows will go in as they will. Bad rows would go to a second OLE DB Destination, this time with a smaller commit size. There are differences of opinion on whether you should go straight to singleton inserts here or add an intermediate bulk insert attempt at half your primary commit size; this is where you can evaluate your process and find the optimal solution for your data. If a row still fails the insert at the single-row level, then you write it to your error file/log (a rough sketch of the cascade follows). This approach lets you put as much good data into the target table as you can, using the most efficient mechanism possible, when you know you're going to have bad data.
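As a conceptual sketch of that cascade (in SSIS you wire this up with additional OLE DB Destinations on the error outputs rather than writing code, so treat the following as a simulation with made-up bad rows, not package code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simulates the cascading-commit-size pattern: try a big batch, retry a failed
// batch at half the size, and only rows that fail on their own hit the error file.
class CascadingInsertSketch
{
    // Stand-in for an all-or-nothing bulk insert: throws if any row in the batch is bad.
    static void BulkInsert(List<int> batch, HashSet<int> badRows)
    {
        if (batch.Any(badRows.Contains))
            throw new InvalidOperationException("constraint violation in batch");
        // otherwise the whole batch commits
    }

    static void InsertWithCascade(List<int> rows, int commitSize, HashSet<int> badRows, List<int> errorFile)
    {
        for (int start = 0; start < rows.Count; start += commitSize)
        {
            var batch = rows.Skip(start).Take(commitSize).ToList();
            try
            {
                BulkInsert(batch, badRows);
            }
            catch
            {
                if (batch.Count == 1)
                    errorFile.Add(batch[0]);   // a genuinely bad row: only now does it reach the error file
                else
                    // retry the failed batch at half the commit size, eventually reaching singletons
                    InsertWithCascade(batch, Math.Max(1, commitSize / 2), badRows, errorFile);
            }
        }
    }

    static void Main()
    {
        var rows = Enumerable.Range(1, 10009).ToList();
        var badRows = new HashSet<int> { 17, 5120, 9999 };
        var errorFile = new List<int>();

        InsertWithCascade(rows, 5003, badRows, errorFile);
        Console.WriteLine("Rows in error file: " + string.Join(", ", errorFile));   // only the 3 bad rows
    }
}
```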
Bad data addendum
A final approach to handling bad data is to not even try to insert it. If you know foreign keys must be satisfied, add a Lookup component to your data flow to ensure that only good values are presented to the insert. Same for NULLs. You're already checking your business key, so duplicates shouldn't be an issue. If a column has a constraint that the Foo must begin with Pity, then check it in the data flow. Bad rows all get shunted off to a Derived Column component that adds business-friendly error messages, then they land at a Union All, and all the errors make it to the error file.
I've also written this logic so that each check gets its own error column and I only split out the bad data prior to insertion. This prevents the business user from fixing one error in the source data only to learn that there's another error, and then yet another; try again. Whether this level of data cleansing is required is a conversation with your data supplier, your data cleaner, and possibly your boss (because they might want to know how much time you're going to have to spend making this solution bulletproof against the horrible data they keep flinging at you).
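To illustrate that "one error column per check" idea outside the designer (the columns and rules here are invented for the example; in the package this logic would live in Derived Column / Conditional Split components), the point is to collect every failure for a row rather than stopping at the first one:

```csharp
using System;
using System.Collections.Generic;

// Illustration of per-check error messages: every rule is evaluated for every row,
// so the data steward sees all of a row's problems at once. The columns and rules
// below are invented for the example.
class RowValidationSketch
{
    class SourceRow
    {
        public string Foo;
        public int? CustomerId;
        public List<string> Errors = new List<string>();
    }

    static void Validate(SourceRow row, HashSet<int> knownCustomerIds)
    {
        if (row.CustomerId == null)
            row.Errors.Add("CustomerId is required.");
        else if (!knownCustomerIds.Contains(row.CustomerId.Value))
            row.Errors.Add("CustomerId has no matching customer record.");   // what the Lookup would catch

        if (string.IsNullOrEmpty(row.Foo) || !row.Foo.StartsWith("Pity"))
            row.Errors.Add("Foo must begin with 'Pity'.");
    }

    static void Main()
    {
        var knownCustomerIds = new HashSet<int> { 1, 2, 3 };
        var row = new SourceRow { Foo = "Nope", CustomerId = 42 };
        Validate(row, knownCustomerIds);

        // Rows with errors are split off to the error file; clean rows go on to the insert.
        Console.WriteLine(row.Errors.Count == 0
            ? "row goes to the OLE DB Destination"
            : "row goes to the error file: " + string.Join("; ", row.Errors));
    }
}
```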
I have noticed that if you check Table lock and also have an update path, you will get deadlocks between the two flows in your data flow. So we leave Table lock unchecked; the performance seems to be the same.