Importing yyyyMMdd Dates From CSV in SSIS

2020-07-22 18:54发布

I have 12 columns using the yyyymmdd format. In the Data Flow Task, I have a Flat File Source, a Derived Column Task and an OLE DB Destination. I'm applying the following expression to these fields in the Derived Column Task:

(DT_DBDATE)(SUBSTRING((DT_STR,10,1252)([Date_Column]),1,4) + "-"
+ SUBSTRING((DT_STR,10,1252)([Date_Column]),5,2) + "-"
+ SUBSTRING((DT_STR,10,1252)([Date_Column]),7,2))

It keeps making me convert the field before I substring it, but I have the fields set up as DT_STR in the Connection Manager. The destination field is in DATE format in SQL Server. SSIS always shows the "could not be converted because of a potential loss of data" error. I've checked the raw data in the CSV and don't see any dates that look out of place. Everything fits the format.

I've also attempted to use a Data Conversion task as suggested here, but ended up getting the same kind of error. Any recommendations? The answers I've seen on here all point to the use of a Derived Column task, but several others I've mentioned this to have had the same problem I'm having.

EDIT: Here's a sample of the data. The field specifically mentioned in all of the errors is CRTFCTN_DT. There are 470 fields in the dataset, but this does show the first two date fields.

Sample

SOLUTION: In the Connection Manager, I changed the yyyymmdd fields to DT_STR(10) (8 characters for the date, 2 for the added dashes), and rebuilt my Data Flow Task. This solved the problem. If you DON'T rebuild the task, the Derived Column task won't pick up on the new data type. Once the task was recreated, I also no longer needed to cast the dates as strings before applying substring.

2条回答
Anthone
2楼-- · 2020-07-22 19:26

You setup column [Date_Column] as DT_STR in Connection Manager. What is the length you specified? by default it is 50. In Derived Column, you used its length as 10, which is the error message "could not be converted because of a potential loss of data".

查看更多
The star\"
3楼-- · 2020-07-22 19:35

Try casting as (DT_DATE) instead of DT_DBDATE

(DT_DATE)((SUBSTRING((DT_STR,50,1252)([Date_Column]),1,4) + "-"
+ SUBSTRING((DT_STR,50,1252)([Date_Column]),5,2) + "-"
+ SUBSTRING((DT_STR,50,1252)([Date_Column]),7,2)))

You can do this workaround

Assuming that the date column name is inColumn

In the DataflowTask, Add a script component , Mark inColumn as input column, add a new Output column outColumn with dataType DT_DBTIMESTAMP

Change script language to vb.net

enter image description here

Mark your date column as Input

enter image description here

Create an Output column for each date column

enter image description here

Inside the script, in Input0_ProcessInputRow sub use DateTime.ParseExact Function as follow:

In this example the date column is MailReceivingDate

Public Class ScriptMain
    Inherits UserComponent


    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        If Not Row.MailReceivingDate_IsNull AndAlso
              Not String.IsNullOrEmpty(Row.MailReceivingDate.Trim) Then

            Row.outColumn = DateTime.ParseExact(Row.MailReceivingDate.Trim, "yyyyMMdd", New System.Globalization.CultureInfo("en-GB"))

        Else

            Row.outColumn_IsNull = True

        End If

    End Sub

End Class
查看更多
登录 后发表回答