SSIS; row redirected to error even after inserting

2019-08-29 11:08发布

问题:

I have a SSIS package to insert/update rows to Database. First i use look up for checking if row already inserted to DB;If yes i update that row, else insert as new row.

My problem is when inserting , a row is inserted successfully but also redirected to error. How both can happen at same time? That too it happens some times not always - very inconsistent. How to track what caused the error ? I used "redirect row" here to get failed rows.

This happens only when it deployed on server.On running my local machine using BIDS works fine.

回答1:

Your OLE DB Destination is likely set to the default values

A quick recap of what all these values mean

  • Data access mode: You generally want Table or view - fast load or Table or view name variable - fast load as this will perform bulk inserts. The non-fast load choices result in singleton inserts which for any volume of data will have you questioning the sanity of those who told you SSIS can load data really fast.
  • Keep Identity: This is only needed if you want to explicitly provide an identity seed
  • Keep nulls: This specifies whether you should allow the defaults to fire
  • Table lock: Preemptively lock the table. Unless you're dealing with Hekaton tables (new SQL Server 2014 candy), touching a table will involve locks. Yes, even if you use the NOLOCK hint. Inserting data obviously results in locking so we can assure our ACID compliance. SQL Server will start with a small lock, either Row or Page level. If you cross a threshold of modifying data, SQL Server will escalate that lock to encapsulate the whole table. This is a performance optimization as it's easier to work if nobody else has their grubby little paws in the data. The penalty is that during this escalation, we might now have to wait for another process to finish so we can get exclusivity to the table. Had we gone big to begin with, we might have locked the table before the other process had begun. - Check constraints: Should we disable the checking of constraint values. Unless you have a post import step to ensure the constraint is valid, don't uncheck this. Swiftly loaded data that is invalid for the domain is no good.
  • Rows per batch: this is a pass through value to the INSERT BULK statement as the ROWS_PER_BATCH value.
  • Maximum insert commit size: The FastLoadMaxInsertCommitSize property specifies how many rows should be held in the transaction before committing. The 2005 default was 0 which meant everything gets committed or none of it does. The 2008+ default of 2 billion may be effectively the same, depending on your data volume.

So what

You have bad data somewhere in your insert. Something is causing the insert to fail. It might be the first row, last row, both or somewhere in between. The net effect is that the insert itself is rolled back. You designed your package to allow for the bad data to get routed to a flat file so a data steward can examine it, clean it up and get it re-inserted into the pipeline.

The key then is you needed to find some value that provides the optimal balance of insert performance size, more is better, relative to badness size. For the sake of argument, let's use a commit size of 5003, because everyone likes prime numbers, and assume our data source supplies 10009 rows of data. Three rows in there will violate the integrity of the target table and will need to be examined by a data steward.

This is going to result in 3 total batches being sent to the destination. The result is one of the following scenarios

  • Bad rows are the final 3 rows, resulting in only those 3 rows being sent to the text file and 10006 rows committed to the table in 2 batches.
  • Bad rows exist only in 1 of full sets. This would result in 5006 rows being written to the table and 5003 rows sent to our file
  • Bad rows split amongst each commit set. This results in 0 rows written to the table and all the data in our file.

I always felt Murphy was an optimist and the disk holding the error file would get corrupt but I digress.

What would be ideal is to whittle down the space bad data can exist in while maximizing the amount of good data inserted at a shot. In fact, there are a number of people who have written about it but I'm partial to the approach outlined in "Error Redirection with the OLE DB Destination".

We would perform an insert at our initial commit size of 5003 and successful rows will go as they will. Bad rows would go to a second OLE DB Destination, this time with a smaller commit size. There's differences of opinion whether you should immediately go to singleton inserts here or add an intermediate bulk insert attempt at half your primary commit size. This is where you can evaluate your process and find the optimal solution for your data.

If data still fails the insert at the single row level, then you write that to your error file/log. This approach allows you to put as much good data into the target table as you can in the most efficient mechanism as possible when you know you're going to have bad data.

Bad data addendum

Yet a final approach to inserting bad data is to not even try to insert it. If you know foreign keys must be satisfied, add a Lookup component to your data flow to ensure that only good values are presented to the insert. Same for NULLs. You're already checking your business key so duplicates shouldn't be an issue. If a column has a constraint that the Foo must begin with Pity, then check it in the data flow. Bad rows all get shunted off to a Derived Column Task that adds business friendly error messages and then they land at a Union All and then all the errors make it to the error file.

I've also written this logic where each check gets its own error column and I only split out the bad data prior to insertion. This prevents the business user from fixing one error in the source data only to learn that there's another error. Oh, and there's still another, try again. Whether this level of data cleansing is required is a conversation with your data supplier, data cleaner and possibly your boss (because they might want to know how much time you're going to have to spend making this solution bullet proof for the horrible data they keep flinging at you)

References

  • Keep nulls
  • Error Redirection with the OLE DB Destination
  • Batch Sizes, Fast Load, Commit Size and the OLE DB Destination
  • Default value for OLE DB Destination FastLoadMaxInsertCommitSize in SQL Server 2008


回答2:

I have noticed that if you check lock tables, and also have an update, you will get deadlocks between the 2 flows in your dataflow. So we do not check table lock. The performance seems to be the same.



回答3:

My finding might help those who visit here..

@billinkc made a broad comment; i had gone all through that.Well Later after digging down the system the problem was something different.

My SSIS package has script task within to do some operations.That one uses a folder called TEMP in the disk.The program which triggered this SSIS also simultaneously using the same TEMP folder. Now there the file read/write exceptions were not handled. This caused script task to fail resulting a package fail error.Since the INSERT functionality carried before the script task,INSERT was successful.Later when script failed it moved rows to error too!

I tried with catching these "file errors/exceptions" and it worked!!