SSIS Transform — Split one column into multiple co

2019-02-24 08:14发布

问题:

I'm trying to find out how to split a column I have in a table and split it into three columns after the result is exported to a CSV file.

For example, I have a field called fullpatientname. It is listed in the following text format:

Smith, John C

The expectation is to have it in three separate columns:

Smith
John
C

I'm quite sure I have to split this in a derived column, but I'm not sure how to proceed with that

回答1:

You are going to need to use a derived column for this process.

The SUBSTRING and FINDSTRING functions will be key to pull this off.

To get the first segment you would use something like this:

(DT_STR,25,1252) SUBSTRING([fullpatientname], 1, FINDSTRING(",",[fullpatientname],1)-1)

The above should display a substring starting with the beginning of the [fullpatientname] to the position prior to the comma (,).

The next segment would be from the position after the comma to the final space separator, and the final would be everything from the position following the final space separator to the end.



回答2:

It sounds like your business rule is

  1. The "last name" is all of the characters up to the first comma
  2. The "first name" will be all of the characters after the first comma and a space
  3. The "middle name" will be what (and is it always present)?
    1. the last character in the string (you will only ever have an initial letter)
    2. All of the characters after the second space

This logic will fail in lots of fun ways so be prepared for it. And also remember that once you combine information together, you cannot, with 100 accuracy, restore it to the component parts. Capture first, middle, last/surname and store them separately.

Approach A

A derived column component. Actually, a few of them added to your data flow will cover this. The first Derived Column will be tasked with finding the positions of the name breaks. This could be done all in a single Component but debugging becomes a challenge and then you will need to reference the same expression multiple times in a single row * 3 it quickly becomes a maintenance nightmare.

The second Derived Column will then use the positions defined in the first to call the LEFT and SUBSTRING functions to access points in the column

Approach B

I never reach for a script component first and the same should hold true for you. However, this is a mighty fine case for a script. The base .NET string library has a Split function that will break a string into pieces based on whatever delimiter you supply. The default is whitespace. The first call to split will use the ',' as the argument. The zeroeth ordinal string will be the last name. The first ordinal string will contain the first and middle name pieces. Call the string.Split method again, this time using the default value and the last element is the middle name and the remaining elements are called the first name. Or vice versa, the zeroeth element is the first name and everything else is last.

I've had to deal with cleaning names before and so I've seen different rules based on how they want to standardize the name.



回答3:

Try something like this, if your names are always in the same format (LastName-comma-space-FirstName-space-MI):

declare @FullName varchar(25) = 'Smith, John C'

select 
substring(@FullName, 1, charindex(',', @FullName)-1 ) as LastName,
substring(@FullName, charindex(',',@FullName) + 2, charindex(' ',@FullName,charindex(',',@FullName)+2) - (charindex(',',@FullName) + 2) ) as FirstName,
substring(@FullName, len(@FullName), 1) as MiddleInitial


回答4:

I am using SQL SERVER 2016 with SSIS in Visual Studio 2015. If you are using findstring you need to make sure the order is correct. I tried this first - FINDSTRING(",",[fullpatientname],1), but it wouldn't work. I had to look up the documentation and found the order to be incorrect. FINDSTRING([fullpatientname],",",1) fixed the problem for me. I am not sure if this is due to differences in versions.