SSIS add new column C based on if value in column

2019-07-22 15:15发布

问题:

I have a dataset, getting from a web service as following,

ColumnA ....  ColumnB...
userA         NULL
userB         userA
userC         NULL
userD         userC

What I want is

ColumnA ....  ColumnB...   ColumnC (if columnA's value exists in column B, position doesn't matter)
userA         NULL         YES 
userB         userA        NO
userC         NULL         YES
userD         userC        NO

Being new to SSIS, I am not sure if it's even possible, google is not helping much, can i do this without getting create a task using C# ?

Edit

Explaining further,

ColumnC = Value of ColumnA exists in ***ANY*** of ColumnB row. 

E.g. userA has value NULL in columnB but it's used in one or more rows of ColumnB hence ColumnC value is YES.

So in terms of logic it will be,

  • Get distinct values from ColumnB and put them in let say a listA
  • Foreach value in ColumnA if value exists in listA then columnC will be YES otherwise NO

回答1:

You will need to do this in the data flow. Add a Derived Column transformation and use logic to check for ISNULL in column B and to then check to see if column A and B match. Comparing NULL to anything may result in an error. The expression uses TERNARY operators, which adhere to the following form:

  • Boolean Condition (If) ? True Expression Result (Then) : False Expression Result (Else)

So, your expression will look something like one of the two following:

  • !ISNULL(ColumnB) && ColumnA == ColumnB ? "YES" : "NO"
  • !ISNULL(ColumnB) ? (ColumnA == ColumnB ? "YES" : "NO") : "NO"

https://docs.microsoft.com/en-us/sql/integration-services/expressions/conditional-ssis-expression

https://www.sqlchick.com/entries/2011/6/8/nested-conditional-operators-in-an-ssis-derived-column.html

Answer update for question edit

I understand better now what you are asking. Basically, you are performing a lookup on each row in Column B and checking to see if it exists on any rows in column A. This is more complex and challenging to solve because the typical data flow components are ideally suited for looking at data within a row, but not across rows. So, there are two solutions that I can think of off the top of my head and one solution not to use.

  1. Store Column A in a database table if possible and then perform a lookup on column B to see if the value exists (or vice verse - your sample data set is too small for me to be positive). This is the easiest solution.
  2. Pump the entire file through a script component, store the data in each column into separate array's and then use C#/LINQ to check for matches. This is much more complicated and I don't like it.
  3. solution not to use. Split column A and column B into separate data flows, sort them, and then perform a merge join to see if there are matches. The trick here is to know whether you need to maintain each row in the file (i.e., full outer join) or if it is OK to get rid of rows where column B is null (i.e., inner join). I like this even less because I highly doubt the rows will line up, which may result in incorrect results. I am at least raising this to point out the potential flaw in this solution.

I think you should look to see if the import file you are receiving can be simplified. Unless I am not correctly understanding your data set, it looks like there are separate table columns, where column A is dimensional data and column B is fact data, being included in one file where there is a potential one-to-many cardinality.



标签: ssis