An interesting one, we're evaluating ETL tools for pre-processing statement data (e.g. utility bills, bank statements) for printing.
Some of the data comes through in a single flat file, with different record types.
e.g. a record type with "01" as the first field will be address data. This will have name and address fields. A record type with "02" will be summary data, with balances and totals. Record type "03" will be a line item on the statement.
Each statement will have one 01 and 02 records, and multiple 03 records. I could pre-parse the file and split into 3 files for loading into a table, but this is less than ideal.
We take the file and do a few manipulations on it (e.g. add in a couple more fields to the address record, and maybe do some totalling / validation), and then send the file in pretty much the same format (But with the extra fields added) to our print composition program.
How would you do this in SSIS?
The big problem with variant records in SSIS is that you don't get any of the benefits of the connection manager helping with the layout, since the connection manager can only handle a single layout.
So typically, you end up with a CRLF terminated flat file with two columns: recordtype and recorddata. Then you put the conditional split in and parse each type of row on different paths. The parsing will have to split up the remaining record data and put it in columns and convert as normal, either with a derived column transform or a script transform and potentially conversion transforms.
If you had a lot of packages to do, I would seriously consider writing a custom component which produced 3 outputs already converted to your destination types.
answered my own question - see below script. AcctNum come in from a derived column from the flat file source and will be correctly populated for 02 record types, save it in local static varialbe and put it back on the row for other record types that do not contain the acct number.
/* Microsoft SQL Server Integration Services Script Component
* Write scripts using Microsoft Visual C# 2008.
* ScriptMain is the entry point class of the script.*/
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
static String AccountNumber = null;
public override void PreExecute()
{
base.PreExecute();
/*
Add your code here for preprocessing or remove if not needed
*/
}
public override void PostExecute()
{
base.PostExecute();
/*
Add your code here for postprocessing or remove if not needed
You can set read/write variables here, for example:
Variables.MyIntVar = 100
*/
}
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
if (Row.RecordType == "02")
AccountNumber = Row.AcctNum; // Store incomming Account Number into local script variable
else if (Row.RecordType == "06" || Row.RecordType == "07" || Row.RecordType == "08" ||
Row.RecordType == "09" || Row.RecordType == "10")
Row.AcctNum = AccountNumber; // Put Stored Account Number on this row.
}
}
This is possible, bu you will have to write custom logic. I did this once with DTS.
If the file is delimited, SSIS will import the fields correctly. You can write a script that examines the record type field, then branches into different inserts depending on the record type. If the file has records that are not delimited, but each type has its own fixed widths, this becomes a lot more complicated, since you'd have to parse and split each imported line, with the record types and their width hardcoded in the script.
There are a few ways to do it, but I think the easiest one to understand would be to add a conditional split after the source task, and then push it through a bunch of data conversion tasks to get the right format of data.
Make sure that your source is set up with the correct data types, so nothing falls through (e.g.-all strings). Then just check the "Record Type" field in that conditional split to send it to the right branch.