ErrorColumn value does not exist as Lineage ID

2019-03-28 02:58发布

问题:

During the insert into a destination table, any error that occurs is redirected to Errors table where we can see the ErrorCode and ErrorColumn. The problem is that we got a value in ErrorColumn which does not exist anywhere within the package. Namely, there is not a single column that has LineageID that is equal to ErrorColumn.

Later, while enabling NULL entry in every single column, one by one, I found which column caused the problem. When I analyzed the column inside of a Data Flow task it did not have the LineageID that was reported in an ErrorColumn. For example, the ErrorColumn reported was 413, but LineageID is 84 during the first merge, and it changes during various sortings. Regardless of that, it never becomes 413. This ErrorColumn ID (413) does not exist at all yet it is reported during the error redirection insert into destination source (Error table).

I've checked many sites, but they all suggested enumerating through ComponenteMetaData.InputCollection or ComponentMetaData.OutputCollection in a script task, and then enumerating through columns in order to find LineageID, but it was without any success.

As I said, I've solved the problem, but as we are in the early stage of ETL process, this might happen in some other cases. How can this problem be tackled?

回答1:

I'm copying my answer so we can get an authoritative Q&A on the site

What is the simple way to find the column name from Lineageid in SSIS

I remember saying this can't be that hard, I can write some script in the error redirect to lookup the column name from the input collection.

string badColumn = this.ComponentMetaData.InputCollection[Row.ErrorColumn].Name;

What I learned was the failing column isn't in that collection. Well, it is but the ErrorColumn reported is not quite what I needed. I couldn't find that package but here's an example of why I couldn't get what I needed. Hopefully you will have better luck.

This is a simple data flow that will generate an error once it hits the derived column due to division by zero. The Derived column generates a new output column (LookAtMe) as the result of the division. The data viewer on the Error Output tells me the failing column is 73. Using the above script logic, if I attempted to access column 73 in the input collection, it's going to fail because that is not in the collection. LineageID 73 is LookAtMe and LookAtMe is not in my error branch, it's only in the non-error branch.

This is a copy of my XML and you can see, yes, the outputColumn id 73 is LookAtme.

<outputColumn id="73" name="LookAtMe" description="" lineageId="73" precision="0" scale="0" length="0" dataType="i4" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Computation" errorRowDisposition="RedirectRow" truncationRowDisposition="RedirectRow" externalMetadataColumnId="0" mappedColumnId="0"><properties>

I really wanted that data though and I'm clever so I can union all my results back together and then conditional split it back out to get that. The problem is, Union All is an asynchronous transformation. Async transformations result in the data being copied from one set of butters to another resulting in...new lineage ids being assigned so even with a union all bringing the two streams back together, you wouldn't be able to call up the data flow chain to find that original lineage id because it's in a different buffer.

Around this point, I conceded defeat and decided I could live without intelligent/helpful error reporting in my packages.

2012

The 2012 release of SSIS changed how they used LineageID to keep columns in sync. Instead of components mapping a number from a source to sink, they went with a textual representation of the element. The XML referenced above would now look like

 <outputColumn
 refId="Package\DFT Generate Errors\DER Divide by SomeNumber.Outputs[Derived Column Output].Columns[LookAtMe]"
 dataType="i4"
 errorOrTruncationOperation="Computation"
 errorRowDisposition="RedirectRow"
 lineageId="Package\DFT Generate Errors\DER Divide by SomeNumber.Outputs[Derived Column Output].Columns[LookAtMe]"
 name="LookAtMe"
 truncationRowDisposition="FailComponent">

If you look at the ErrorColumn now, they don't even reference the textual lineageid. Instead, they reference column 6. If I search through the source XML, I'm not going to find a reference to Column 6 anywhere. It must be some run-time magic.

The net result unfortunately is the same - you cannot access the error column because it's being created in this component because it only exists in the Output columns collection. It isn't available in the Error Columns collection.



回答2:

I realize this is a late answer, but I'm new to the SSIS game. I have created a solution that

  1. Works at package runtime (not pre-populating)
  2. Is automated through a Script Task and Component
  3. Doesn't involve installing new assemblies or custom components
  4. Is nicely BIML compatible

Check out the full solution here.

EDIT

Here is the short version.

  1. Create 2 Object variables, execsObj and lineageIds
  2. Create Script Task in Control flow, give it ReadWrite access to both variables
  3. Insert the following code into your Script Task
Dictionary<int, string> lineageIds = null;

public void Main()
{
    // Grab the executables so we have to something to iterate over, and initialize our lineageIDs list
    // Why the executables?  Well, SSIS won't let us store a reference to the Package itself...
    Dts.Variables["User::execsObj"].Value = ((Package)Dts.Variables["User::execsObj"].Parent).Executables;
    Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
    lineageIds = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;
    Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;

    ReadExecutables(execs);

    Dts.TaskResult = (int)ScriptResults.Success;
}

private void ReadExecutables(Executables executables)
{
    foreach (Executable pkgExecutable in executables)
    {
        if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
        {
            TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
            if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
            {
                ProcessDataFlowTask(pkgExecTaskHost);
            }
        }
        else if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
        {
            // Recurse into FELCs
            ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
        }
    }
}

private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
{
    MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;
    foreach (IDTSComponentMetaData100 currentComponent in currentDataFlow.ComponentMetaDataCollection)
    {
        // Get the inputs in the component.
        foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
            foreach (IDTSInputColumn100 currentInputColumn in currentInput.InputColumnCollection)
                lineageIds.Add(currentInputColumn.ID, currentInputColumn.Name);

        // Get the outputs in the component.
        foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
            foreach (IDTSOutputColumn100 currentoutputColumn in currentOutput.OutputColumnCollection)
                lineageIds.Add(currentoutputColumn.ID, currentoutputColumn.Name);
    }
}

4. Create Script Component in Dataflow with ReadOnly access to lineageIds and the following code.

public override void Input0_ProcessInputRow(Input0Buffer Row)
  {
      Dictionary<int, string> lineageIds = (Dictionary<int, string>)Variables.lineageIds;

      int? colNum = Row.ErrorColumn;
      if (colNum.HasValue && (lineageIds != null))
      {
          if (lineageIds.ContainsKey(colNum.Value))
              Row.ErrorColumnName = lineageIds[colNum.Value];

          else
              Row.ErrorColumnName = "Row error";
      }
      Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
  }


回答3:

Right click on CSV component (and many other components) and select "show advanced editor". Go to the last tab, "input and output properties"

On this screen look in the tree menu for you output columns (NOT(!) external columns). These have the lineage id in the "common properties"

There is a lot of jabber that this is hard to find, impossible to find without re-writting the code of visual studio itself and what not but i just checked this on both 2010 and 2012 on different systems and it isnt any harder then what i just described.