This question already has an answer here:
-
Convert flatfile column to Datetime
2 answers
I want to copy data from a large csv files to DB table(with SSIS), but I have a problem with column SubmittedDate in csv file the client put mess in it (19.janv.50 or 08-AUG-69 ) so It cannot be copied into db table (SubmittedDate is date)
How can I clean up the date ?
I saw derived column in SSIS,
is this my best solution ? and how can I configure it ?
Because the format can differ for each record I do not believe this is a good case for derived column. I would use TSQL either as an update where the original data is loaded to a field called original date and an update is run on a column called SubmittedDate. Or have this run during the insert from staging to 'live' table if you are using this approach.
I would start with analysis using a select statement and use a case to determine the parsing you want to use. Keep going until Submitted (parsed) date is no longer null / blank. Hopefully this can get you started:
Select
-- 19.janv.50
Case When CharIndex('.', OriginalDate) = 3
And CharIndex('.', OriginalDate, 4) = 8 Then --Substring code goes here.
-- 08-AUG-69
Case When CharIndex('-', OriginalDate) = 3
And CharIndex('-', OriginalDate, 4) = 7 Then --Substring code goes here.
I agree with Joe that a Derived Column would likely get very complex to do this conversion. An alternative to writing a huge SQL update statement to test some formats and have to do string parsing is to use a script component
in the data flow task
to transform and add a derived column. See my answer on this post Convert varchar to datetime using VB script of how to set one up (that answer is in vb, I usually do c# but the principle is the same)
Note if you are new to scripting in SSIS their are actually 2 places to script. Script Task
and Script Component
the component is the one you want and will be visible when you are inside the data flow
.
Whether you use sql staging table (or secondary column) and tSQL to do the conversion or scripting your first task is to figure out what standards/formats are represented within your dataset. The second decision is what to do when a value doesn't match one of the known formats.
One key to writing your script would be DateTime.ParseExact() and DateTime.TryParseExact(). You can also use try catch blocks. Basically try converting with one of the formats you identify and stop when you get to the one you want. One way to do this would be to use a second function.
Sticking with VB
Private Function ConvertDateString (ByVal dateString As String)
IF DateTime.TryParseExact(dateString,"dd-MMM-yy", System.Globalization.CultureInfo.InvariantCulture)
return DateTime.ParseExact(dateString,"dd-MMM-yy", System.Globalization.CultureInfo.InvariantCulture)
END IF
IF DateTime.TryParseExact(dateString,"dd.MMM.yy", System.Globalization.CultureInfo.InvariantCulture)
return DateTime.ParseExact(dateString,"dd.MMM.yy", System.Globalization.CultureInfo.InvariantCulture)
END IF
End Function
IF DateTime.TryParseExact(dateString,"MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)
return DateTime.ParseExact(dateString,"MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)
END IF
IF DateTime.TryParseExact(dateString,"dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture)
return DateTime.ParseExact(dateString,"dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture)
END IF
...etc
End Function
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
If Not Row.SampleDate_IsNull Then
Row.DerivedDate = ConvertDateString(Row.DateStringColumnName)
End If
End Sub
Note MMM
will give you month abbreviations.