Question:
What is the simple way to find the column name from the LineageID in SSIS? Is there any system variable available?
Answer 1:
I remember saying this can't be that hard, I can write some script in the error redirect to lookup the column name from the input collection.
string badColumn = this.ComponentMetaData.InputCollection[Row.ErrorColumn].Name;
What I learned was that the failing column isn't in that collection. Well, it is, but the ErrorColumn reported is not quite what I needed. I couldn't find that package, but here's an example of why I couldn't get what I needed. Hopefully you will have better luck.
This is a simple data flow that will generate an error once it hits the derived column, due to division by zero. The Derived Column generates a new output column (LookAtMe) as the result of the division. The data viewer on the Error Output tells me the failing column is 73. Using the above script logic, if I attempt to access column 73 in the input collection, it fails because that column is not in the collection. LineageID 73 is LookAtMe, and LookAtMe is not in my error branch; it's only in the non-error branch.
This is a copy of my XML and you can see that, yes, the outputColumn with id 73 is LookAtMe.
<outputColumn id="73" name="LookAtMe" description="" lineageId="73" precision="0" scale="0" length="0" dataType="i4" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Computation" errorRowDisposition="RedirectRow" truncationRowDisposition="RedirectRow" externalMetadataColumnId="0" mappedColumnId="0"><properties>
I really wanted that data, though, and I'm clever, so I figured I could union all my results back together and then conditionally split them back out to get it. The problem is, Union All is an asynchronous transformation. Async transformations result in the data being copied from one set of buffers to another, which means new lineage IDs get assigned. So even with a Union All bringing the two streams back together, you wouldn't be able to reach up the data flow chain to find that original lineage ID, because it's in a different buffer.
Around this point, I conceded defeat and decided I could live without intelligent/helpful error reporting in my packages.
Answer 2:
I know this is a long dead thread but I tripped across a manual solution to this problem and thought I would share for anyone who happens upon this same problem. Granted this doesn't provide a programmatic solution to the problem but for simple debugging it should do the trick. The solution uses a Derived Column as an example but this seems to work for any Data Flow component.
Answer provided by Todd McDermid and taken from AskSQLServerCentral:
"[...] Unfortunately, the lineage ID of your columns is pretty well hidden inside SSIS. It's the "key" that SSIS uses to identify columns. So, in order to figure out which column it was, you need to open the Advanced Editor of the Derived Column component or Data Conversion. Do that by right clicking and selecting "Advanced Editor". Go to the "Input and Output Properties" tab. Open the first node - "Derived Column Input" or "Data Conversion Input". Open the "Input Columns" tab. Click through the columns, noting the "LineageID" property of each. You may have to do the same with the "Derived Column Output" node, and "Output Columns" inside there. The column that matches your recorded lineage ID is the offending column."
Answer 3:
There is no simple way to find out the column name from a lineage ID.
If you want to do this in BIDS, you have to inspect every component inside the data flow using the Advanced Editor (Input and Output Properties tab) and check the LineageID of each column on every input/output path.
But you can:
- inspect the XML - this is very difficult
- write a .NET application and use FindColumnByLineageId
However, the second option involves a lot of coding and an understanding of the pipeline, because you have to programmatically open the package, iterate over the tasks, iterate inside the containers, and iterate over the transformations inside the data flows to find the particular component on which to use the proposed method. A rough sketch of this approach follows.
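As an illustration of what such a .NET application might look like, here is a minimal console-app sketch. It assumes references to Microsoft.SqlServer.ManagedDTS and Microsoft.SqlServer.DTSPipelineWrap, and the package path and lineage ID below are placeholders; rather than relying on a particular helper method, it simply walks each data flow and compares every column's LineageID property:
using System;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class LineageIdLookup
{
    static void Main()
    {
        string packagePath = @"C:\SSIS\MyPackage.dtsx"; // placeholder path
        int targetLineageId = 73;                       // placeholder lineage ID

        Application app = new Application();
        Package package = app.LoadPackage(packagePath, null);
        SearchExecutables(package.Executables, targetLineageId);
    }

    static void SearchExecutables(Executables executables, int lineageId)
    {
        foreach (Executable executable in executables)
        {
            // Recurse into containers (Sequence, For Loop, Foreach Loop).
            IDTSSequence container = executable as IDTSSequence;
            if (container != null)
                SearchExecutables(container.Executables, lineageId);

            // Data Flow Tasks are TaskHosts whose InnerObject is the pipeline (MainPipe).
            TaskHost taskHost = executable as TaskHost;
            if (taskHost == null)
                continue;
            MainPipe pipeline = taskHost.InnerObject as MainPipe;
            if (pipeline == null)
                continue;

            foreach (IDTSComponentMetaData100 component in pipeline.ComponentMetaDataCollection)
            {
                foreach (IDTSInput100 input in component.InputCollection)
                    foreach (IDTSInputColumn100 column in input.InputColumnCollection)
                        if (column.LineageID == lineageId)
                            Console.WriteLine(component.Name + " / input column " + column.Name);

                foreach (IDTSOutput100 output in component.OutputCollection)
                    foreach (IDTSOutputColumn100 column in output.OutputColumnCollection)
                        if (column.LineageID == lineageId)
                            Console.WriteLine(component.Name + " / output column " + column.Name);
            }
        }
    }
}
Note that lineage IDs are only meaningful within a single data flow, so if the package contains several Data Flow Tasks you would also want to report which task the match came from.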
Answer 4:
Here is a solution that:
- Works at package runtime (not pre-populating)
- Is automated through a Script Task and Component
- Doesn't involve installing new assemblies or custom components
- Is nicely BIML compatible
Check out the full solution here.
EDIT
Here is the short version.
- Create 2 Object variables, execsObj and lineageIds
- Create a Script Task in the Control Flow and give it ReadWrite access to both variables
- Add an assembly reference in your Script Task to Microsoft.SqlServer.DTSPipelineWrap.dll (this may require the SQL Client SDK to be installed; it is required for the MainPipe object below)
- Insert the following code into your Script Task (you may also need using directives for System.Collections.Generic, Microsoft.SqlServer.Dts.Runtime and Microsoft.SqlServer.Dts.Pipeline.Wrapper at the top of the script)
Dictionary<int, string> lineageIds = null;
public void Main()
{
// Grab the executables so we have something to iterate over, and initialize our lineageIds dictionary
// Why the executables? Well, SSIS won't let us store a reference to the Package itself...
Dts.Variables["User::execsObj"].Value = ((Package)Dts.Variables["User::execsObj"].Parent).Executables;
Dts.Variables["User::lineageIds"].Value = new Dictionary<int, string>();
lineageIds = (Dictionary<int, string>)Dts.Variables["User::lineageIds"].Value;
Executables execs = (Executables)Dts.Variables["User::execsObj"].Value;
ReadExecutables(execs);
Dts.TaskResult = (int)ScriptResults.Success;
}
private void ReadExecutables(Executables executables)
{
foreach (Executable pkgExecutable in executables)
{
if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.TaskHost)))
{
TaskHost pkgExecTaskHost = (TaskHost)pkgExecutable;
if (pkgExecTaskHost.CreationName.StartsWith("SSIS.Pipeline"))
{
ProcessDataFlowTask(pkgExecTaskHost);
}
}
else if (object.ReferenceEquals(pkgExecutable.GetType(), typeof(Microsoft.SqlServer.Dts.Runtime.ForEachLoop)))
{
// Recurse into FELCs
ReadExecutables(((ForEachLoop)pkgExecutable).Executables);
}
}
}
private void ProcessDataFlowTask(TaskHost currentDataFlowTask)
{
MainPipe currentDataFlow = (MainPipe)currentDataFlowTask.InnerObject;
foreach (IDTSComponentMetaData100 currentComponent in currentDataFlow.ComponentMetaDataCollection)
{
// Get the inputs in the component.
foreach (IDTSInput100 currentInput in currentComponent.InputCollection)
foreach (IDTSInputColumn100 currentInputColumn in currentInput.InputColumnCollection)
lineageIds[currentInputColumn.ID] = currentInputColumn.Name; // indexer instead of Add() so a repeated ID (possible when the package has more than one data flow) doesn't throw
// Get the outputs in the component.
foreach (IDTSOutput100 currentOutput in currentComponent.OutputCollection)
foreach (IDTSOutputColumn100 currentOutputColumn in currentOutput.OutputColumnCollection)
lineageIds[currentOutputColumn.ID] = currentOutputColumn.Name;
}
}
- Create a Script Component in the Data Flow with ReadOnly access to lineageIds and the following code.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Dictionary<int, string> lineageIds = (Dictionary<int, string>)Variables.lineageIds;
int? colNum = Row.ErrorColumn;
if (colNum.HasValue && (lineageIds != null))
{
if (lineageIds.ContainsKey(colNum.Value))
Row.ErrorColumnName = lineageIds[colNum.Value];
else
Row.ErrorColumnName = "Row error";
}
Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
}
Answer 5:
For anyone using SQL Server versions before SS2016, here are a couple of reference links for a way to get the Column name:
http://www.andrewleesmith.co.uk/2017/02/24/finding-the-column-name-of-an-ssis-error-output-error-column-id/
which is based on: http://toddmcdermid.blogspot.com/2016/04/finding-column-name-for-errorcolumn.html
I appreciate we aren't supposed to just post links, but this solution is quite convoluted, so I've tried to summarise it by pulling information from both Todd's and Andrew's blog posts and recreating it here. (Thank you to both, if you ever read this!)
From Todd's page:
- Add a Script Component (transformation) to your data flow and open its editor. Go to the "Inputs and Outputs" page, and select the "Output 0" node. Change the "SynchronousInputID" property to "None". (This changes the script from synchronous to asynchronous.)
- On the same page, open the "Output 0" node and select the "Output Columns" folder. Press the "Add Column" button. Change the "Name" property of this new column to "LineageID".
- Press the "Add Column" button again, and change the "DataType" property to "Unicode string [DT_WSTR]", and change the "Name" property to "ColumnName".
- Go to the "Script" page, and press the "Edit Script" button. Copy and paste this code into the ScriptMain class (you can delete all other method stubs):
public override void CreateNewOutputRows()
{
    IDTSInput100 input = this.ComponentMetaData.InputCollection[0];
    if (input != null)
    {
        IDTSVirtualInput100 vInput = input.GetVirtualInput();
        if (vInput != null)
        {
            foreach (IDTSVirtualInputColumn100 vInputColumn in vInput.VirtualInputColumnCollection)
            {
                Output0Buffer.AddRow();
                Output0Buffer.LineageID = vInputColumn.LineageID;
                Output0Buffer.ColumnName = vInputColumn.Name;
            }
        }
    }
}
Feel free to attach a dummy output to that script, with a data viewer, and see what you get. From here, it's "standard engineering" for you ETL gurus. Simply merge join the error output of the failing component with this metadata, and you'll be able to transform the ErrorColumn number into a meaningful column name.
But for those of you that do want to understand what the above script is doing:
- It's getting the "first" (and only) input attached to the script component.
- It's getting the virtual input related to the input. The "input" is what the script can actually "see" on the input - and since we didn't mark any columns as being "ReadOnly" or "ReadWrite"... that means the input has NO columns. However, the "virtual input" has the complete list of every column that exists, whether or not we've said we're "using" it.
- We then loop over all of the "virtual columns" on this virtual input, and for each one...
- Get the LineageID and column name, and push them out as a new row on our asynchronous script.
The text from Andrew's page helps explain it in a bit more detail:
This map is then merge-joined with the ErrorColumn lineage ID(s) coming down the error path, so that the error information can be appended with the column name(s) from the map. I included a second script component that looks up the error description from the error code, so the rows logged to the error table contain both column names and error descriptions.
The remaining component that needs explaining is the conditional split – this exists just to provide metadata to the script component that creates the map. I created an expression (1 == 0) that always evaluates to false for the “No Rows – Metadata Only” path, so no rows ever travel down it.
Whilst this solution does require the insertion of some additional plumbing within the data flow, we get extremely valuable information logged when errors do occur. So especially when the data flow is running unattended in Production – when we don’t have the tools & techniques available at design time to figure out what’s going wrong – the logging that results gives us much more precise information about what went wrong and why, compared to simply giving us the failed data and leaving us to figure out why it was rejected.
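For completeness, the error-description lookup that the second script component performs boils down to a single call. Here is a minimal sketch, assuming a script component to which an ErrorDescription string (DT_WSTR) output column has been added:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Translate the numeric error code on the failed row into its readable description.
    Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
}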