SSIS clean up date from csv file [duplicate]

2019-06-10 03:10发布

问题:

This question already has an answer here:

  • Convert flatfile column to Datetime 2 answers

I want to copy data from a large csv files to DB table(with SSIS), but I have a problem with column SubmittedDate in csv file the client put mess in it (19.janv.50 or 08-AUG-69 ) so It cannot be copied into db table (SubmittedDate is date) How can I clean up the date ? I saw derived column in SSIS,

is this my best solution ? and how can I configure it ?

回答1:

Because the format can differ for each record I do not believe this is a good case for derived column. I would use TSQL either as an update where the original data is loaded to a field called original date and an update is run on a column called SubmittedDate. Or have this run during the insert from staging to 'live' table if you are using this approach.

I would start with analysis using a select statement and use a case to determine the parsing you want to use. Keep going until Submitted (parsed) date is no longer null / blank. Hopefully this can get you started:

Select 
   -- 19.janv.50
   Case When CharIndex('.', OriginalDate) = 3 
         And CharIndex('.', OriginalDate, 4) = 8 Then --Substring code goes here.
   -- 08-AUG-69
   Case When CharIndex('-', OriginalDate) = 3 
         And CharIndex('-', OriginalDate, 4) = 7 Then --Substring code goes here.


回答2:

I agree with Joe that a Derived Column would likely get very complex to do this conversion. An alternative to writing a huge SQL update statement to test some formats and have to do string parsing is to use a script component in the data flow task to transform and add a derived column. See my answer on this post Convert varchar to datetime using VB script of how to set one up (that answer is in vb, I usually do c# but the principle is the same)

Note if you are new to scripting in SSIS their are actually 2 places to script. Script Task and Script Component the component is the one you want and will be visible when you are inside the data flow.

Whether you use sql staging table (or secondary column) and tSQL to do the conversion or scripting your first task is to figure out what standards/formats are represented within your dataset. The second decision is what to do when a value doesn't match one of the known formats.

One key to writing your script would be DateTime.ParseExact() and DateTime.TryParseExact(). You can also use try catch blocks. Basically try converting with one of the formats you identify and stop when you get to the one you want. One way to do this would be to use a second function.

Sticking with VB

Private Function ConvertDateString (ByVal dateString As String)
    IF DateTime.TryParseExact(dateString,"dd-MMM-yy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"dd-MMM-yy", System.Globalization.CultureInfo.InvariantCulture)
    END IF

    IF DateTime.TryParseExact(dateString,"dd.MMM.yy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"dd.MMM.yy", System.Globalization.CultureInfo.InvariantCulture)
    END IF
End Function

    IF DateTime.TryParseExact(dateString,"MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)
    END IF

    IF DateTime.TryParseExact(dateString,"dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture)
    END IF

  ...etc

End Function

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    If Not Row.SampleDate_IsNull Then

        Row.DerivedDate = ConvertDateString(Row.DateStringColumnName)

    End If
End Sub

Note MMM will give you month abbreviations.