Thanks in advance for your help. I'm in need of help on writing SSIS script component to delimit single row to multiple rows. There were many helpful blog and post I looked at below:
http://beyondrelational.com/ask/public/questions/1324/ssis-script-component-split-single-row-to-multiple-rows-parent-child-variation.aspx
http://bi-polar23.blogspot.com/2008/06/splitting-delimited-column-in-ssis.html
However, I need a little extra help on coding to complete the project. Basically here's what I want to do.
Input data
ID Item Name
1 Apple01,02,Banana01,02,03
2 Spoon1,2,Fork1,2,3,4
Output data
ParentID ChildID Item Name
1 1 Apple01
1 2 Apple02
1 3 Banana01
1 4 Banana02
1 5 Banana03
2 1 Spoon1
2 2 Spoon2
2 3 Fork1
2 4 Fork2
2 5 Fork3
2 6 Fork4
Below is my attempt to code, but feel free to revise whole if it's illogic. SSIS Asynchronous output is set.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim posID As Integer, childID As Integer
Dim delimiter As String = ","
Dim txtHolder As String, suffixHolder As String
Dim itemName As String = Row.ItemName
Dim keyField As Integer = Row.ID
If Not (String.IsNullOrEmpty(itemList)) Then
Dim inputListArray() As String = _
itemList.Split(New String() {delimiter}, _
StringSplitOptions.RemoveEmptyEntries)
For Each item As String In inputListArray
Output0Buffer.AddRow()
Output0Buffer.ParentID = keyField
If item.Length >= 3 Then
txtHolder = Trim(item)
Output0Buffer.ItemName = txtHolder
'when item length is less than 3, it's suffix
Else
suffixHolder = Trim(item)
txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
- Len(suffixHolder)) & suffixHolder.ToString()
Output0Buffer.ItemName = txtHolder
End If
Next
End If
End Sub
The current code produces the following output
ID Item Name
1 Apple01
1 02
1 Banana01
1 02
1 03
2 Spoon1
2 2
2 Fork1
2 2
2 3
2 4
If I come across as pedantic in this response, it is not my intention. Based on the comment "I'm new at coding and having a problem troubleshooting" I wanted to walk through my observations and how I came to them.
Problem analysis
The desire is to split a single row into multiple output rows based on a delimited field associated to the row.
The code as it stands now is generating the appropriate number of rows so you do have the asynchronous part (split) of the script working so that's a plus. What needs to happen is we need to 1) Populate the Child ID column 2) Apply the item prefix to all subsequent row when generating the child items.
I treat most every problem like that. What am I trying to accomplish? What is working? What isn't working? What needs to be done to make it work. Decomposing problems into smaller and smaller problems will eventually result in something you can do.
Code observations
Pasting in the supplied code resulted in an error that itemList was not declared. Based on usage, it seems that it was intended to be itemName.
After fixing that, you should notice the IDE indicating you have 2 unused variables (posID, childID) and that the variable txHolder is used before it's been assigned a value. A null reference exception could result at runtime.
My coworker often remarks warnings are errors that haven't grown up yet so my advice to you as a fledgling developer is to pay attention to warnings unless you explicitly expect the compiler to warn you about said scenario.
Getting started
With a choice between solving the Child ID situation versus the name prefix/suffix stuff, I'd start with an easy one, the child id
Generating a surrogate key
That's the fancy title phrase that if you searched on you'd have plenty of hits to ssistalk or sqlis or any of a number of fabulously smart bloggers. Devil of course is knowing what to search on. No where do you ever compute or assign the child id value to the stream which of course is why it isn't showing up there.
We simply need to generate a monotonically increasing number which resets each time the source id changes. I am making an assumption that the inbound ID is unique in the incoming data like a sales invoice number would be unique and we are splitting out the items purchased. However if those IDs were repeated in the dataset, perhaps instead of representing invoice numbers they are salesperson id. Sales Person 1 could have another row in the batch selling vegetables. That's a more complex scenario and we can revisit if that better describes your source data.
There are two parts to generating our surrogate key (again, break problems down into smaller pieces). The first thing to do is make a thing that counts up from 1 to N. You have defined a childId
variable to serve this. Initialize this variable (1) and then increment it inside your foreach loop.
Now that we counting, we need to push that value onto the output stream. Putting those two steps together would look like
childID = 1
For Each item As String In inputListArray
Output0Buffer.AddRow()
Output0Buffer.ParentId = keyField
Output0Buffer.ChildId = childID
' There might be VB shorthand for ++
childID = childID + 1
Run the package and success! Scratch the generate surrogate key off the list.
String mashing
I don't know of a fancy term for what needs to be done in the other half of the problem but I needed some title for this section. Given the source data, this one might be harder to get right. You've supplied value of Apple01, Banana01, Spoon1, Fork1. It looks like there's a pattern there (name concatenated with a code) but what it is it? Your code indicates that if it's less than 3, it's a suffix but how do you know what the base is? The first row uses a leading 0 and is two digits long while the second row does not use a leading zero. This is where you need to understand your data. What is the rule for identifying the "code" part of the first row? Some possible algorithms
- Force your upstream data providers to provide consistent length codes (I think this has worked once in my 13 years but it never hurts to push back against the source)
- Assuming code is always digits, evaluate each character in reverse order testing whether it can be cast to an integer (Handles variable length codes)
- Assume the second element in the split array will provide the length of the code. This is the approach you are taking with your code and it actually works.
I made no changes to make the generated item name work beyond fixing the local variables ItemName/itemList. Final code eliminates the warnings by removing PosID and initializing txtHolder to an empty string.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim childID As Integer
Dim delimiter As String = ","
Dim txtHolder As String = String.Empty, suffixHolder As String
Dim itemName As String = Row.ItemName
Dim keyField As Integer = Row.ID
If Not (String.IsNullOrEmpty(itemName)) Then
Dim inputListArray() As String = _
itemName.Split(New String() {delimiter}, _
StringSplitOptions.RemoveEmptyEntries)
' The inputListArray (our split out field)
' needs to generate values from 1 to N
childID = 1
For Each item As String In inputListArray
Output0Buffer.AddRow()
Output0Buffer.ParentId = keyField
Output0Buffer.ChildId = childID
' There might be VB shorthand for ++
childID = childID + 1
If item.Length >= 3 Then
txtHolder = Trim(item)
Output0Buffer.ItemName = txtHolder
Else
'when item length is less than 3, it's suffix
suffixHolder = Trim(item)
txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
- Len(suffixHolder)) & suffixHolder.ToString()
Output0Buffer.ItemName = txtHolder
End If
Next
End If
End Sub