How to validate a CSV file before importing into t

2019-01-23 09:43发布

问题:

I have CSV file with three columns.

sno  sname  quantity
---  -----  --------
 1   aaa    23
 2   bbb    null
 3   ccc    34
 4   ddd    ddd
 5   eee    xxx
 6   fff    87

Table in the SQL Server database is as following/

CREATE TABLE csvtable
(       sno         int
    ,   sname       varchar(100)
    ,   quantity    numeric(5,2)
)

I created an SSIS package to import csv file data into the database table. I am getting an error during package execution because the quantity is a string. I created another table to store the invalid data.

CREATE TABLE wrongcsvtable
(       sno         nvarchar(10)
    ,   sname       nvarchar(100)
    ,   quantity    nvarchar(100)
)

In the csvtable, I would like to store the following data.

sno  sanme   quantity
---  ------  --------
 1   aaa     23
 3   ccc     34
 6   fff     87

In the wrongcsvtable, I would like to store the following data.

sno  sanme   quantity
---  ------  --------
 2   bbb     null
 4   ddd     ddd
 5   eee     xxx

Could someone point me in the right direction to achieve the above mentioned output?

回答1:

Here is one possible option. You can achieve this using the Data Conversion transformation within the Data Flow Task. Following example shows how this can be achieved. The example uses SSIS 2005 with SQL Server 2008 database.

Step-by-step process:

  1. Create a file named FlatFile.CSV and populate it with data as shown in screenshot #1.

  2. In the SQL database, create two tables named dbo.CSVCorrect and dbo.CSVWrong using the scripts provided under SQL Scripts section. The fields in the table dbo.CSVWrong should have the data types VARCHAR or NVARCHAR or CHAR so that it can accept the invalid records.

  3. On the SSIS package, create an OLE DB connection named SQLServer to connect to SQL Server database and create a Flat File Connection named CSV. Refer screenshot #2. Configure the flat file connection CSV as shown in screenshots #3 - #7. All the columns in the flat file connection should be configured as string data type so that the package doesn't fail while reading the file.

  4. On the Control Flow tab of the package, place a Data Flow Task as shown in screenshot #8.

  5. On the Data Flow tab of the package, place a Flat File Source and configure it as shown in screenshots #9 and #10.

  6. On the Data Flow tab of the package, place a Data Conversion transformation and configure it as shown in screenshot #11. Click on the Configure Error Output and change the Error and Truncation column values from Fail component to Redirect row. Refer screenshot #12.

  7. On the Data Flow tab of the package, place an OLE DB Destination and connect the green arrow from Data Conversion to this OLE DB Destination. Configure the OLE DB Destination as shown in screenshots #13 and #14.

  8. On the Data Flow tab of the package, place another OLE DB Destination and connect the red arrow from Data Conversion to this OLE DB Destination. Configure the OLE DB Destination as shown in screenshots #15 and #16.

  9. Screenshot #17 shows the Data Flow Task once it has been completely configured.

  10. Screenshot #18 shows data in the tables before the package execution.

  11. Screenshot #19 shows package execution within Data Flow Task.

  12. Screenshot #20 shows data in the tables after the package execution.

Hope that helps.

SQL Scripts:

CREATE TABLE [dbo].[CSVCorrect](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [SNo] [int] NULL,
    [SName] [varchar](50) NULL,
    [QuantityNumeric] [numeric](18, 0) NULL,
CONSTRAINT [PK_CSVCorrect] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

CREATE TABLE [dbo].[CSVWrong](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [SNo] [varchar](50) NULL,
    [Quantity] [varchar](50) NULL,
    [SName] [varchar](50) NULL,
    [ErrorCode] [int] NULL,
    [ErrorColumn] [int] NULL,
CONSTRAINT [PK_CSVWrong] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

Screenshot #1:

Screenshot #2:

Screenshot #3:

Screenshot #4:

Screenshot #5:

Screenshot #6:

Screenshot #7:

Screenshot #8:

Screenshot #9:

Screenshot #10:

Screenshot #11:

Screenshot #12:

Screenshot #13:

Screenshot #14:

Screenshot #15:

Screenshot #16:

Screenshot #17:

Screenshot #18:

Screenshot #19:

Screenshot #20:



回答2:

Put a conditional split in your data flow. Check if quantity is non-integral. The branch you've created will go to the wrongcsvtable and the default branch will go to the csvtable

EDIT Forgot there is no numeric test in the conditional split. What you should do is add a derived column transform that converts the quantity field into an integer. In the Configure Error Output dialog, set the error and truncation values to Ignore Failure. This will pass the item through with the value for the new field as NULL if the data is not numeric. After that, in the conditional split, check if the new field is null or not. Records with a null field go to the wrongcsvtable, other records go to the csvtable.



标签: ssis