Multiple row delimiters

2019-06-05 11:18发布

问题:

How to define multiple row delimiters for a Flat File Connection in SSIS? for example for a text file containing this string:

Civility is required at all times; rudeness will not be tolerated.

I want to have this two rows after using ';' and '.' for row delimiter:

Civility is required at all times

rudeness will not be tolerated

回答1:

For source data, I created a 3 line file

Civility is required at all times; rudeness will not be tolerated.
The quick brown fox jumped over the lazy dogs.
I am but a single row with no delimiter beyond the carriage return

The general approach I have taken below is to use a flat file connection manager with a format of Ragged Right and my header row delimiter is {CR}{LF}. I defined one columns, InputRow as String 8000. YMMV

In my data flow, after the flat file source, I add a script component as a data transformation called Split Rows.

On the Input Columns tab, check the InputRow and leave it as ReadOnly so the script can access the value. It'd be nice if you could switch it to ReadWrite and modify the outgoing values but that's not applicable for this type of operation.

By default, a script task is a synchronous component, meaning there's a 1:1 relationship between rows in and rows out. This will not suit your needs so you will need to switch it over to Asynchronous mode. I renamed the Output 0 to OutputSplit and changed the value of SynchronousInput from "Input 0 (16)" to None. Your value for 16 may vary.

On your Output Columns for OutputSplit, Add a Column with a name of SplitRow DT_STR 8000.

Within your script transformation, you only need to be concerned with the ProcessInputRow method. The string class offers a split method that takes an array of character values that will work as the splitters. Currently, it is hard coded below in the array initializer but it could just as easily be defined as a variable and passed into the script. That is left as an exercise to the poster.

/// <summary>
/// we have to make this an async script as 1 input row can be many output rows
/// </summary>
/// <param name="Row"></param>
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] results = Row.InputRow.Split(new char[]{';', '.'});
    foreach (string line in results)
    {
        // Remove this line if it is desirable to have empty strings in the buffer
        if (!string.IsNullOrEmpty(line))
        {
            OutputSplitBuffer.AddRow();
            // You might want to call trim operations on the line
            OutputSplitBuffer.SplitRow = line;
        }
    }
}

With all of this done, I hit F5 and voila,

This is going to be a fairly memory intensive package depending on how much data you run through it. I am certain there are optimizations one could make but this should be sufficient to get you going.