I have a text file that contains a list of files to load into database.
The list contains two columns:
FilePath,Type
c:\f1.txt,A
c:\f2.txt,B
c:\f3.txt,B
I want to provide this file as the source to SSIS. I then want it to go through it line by line. For each line, I want it to read the file in the FilePath column and check the Type.
If type is A then I want it to ignore the first 4 lines of the file that is located at the FilePath column of the current line and then load rest of the data inside that file in a table. If type is B then I want it to open the file and copy first column of the file into table 1 and second column into table 2 for all of the lines.
I would really appreciate if someone can please provide me a high level list of steps I need to follow.
Any help is appreciated.
Here is one way of doing it within SSIS. Below steps are with respect to SSIS 2008 R2.
FileName
,FilesToRead
andType
. FilesToRead variable will hold the list of files and their types information. We will have a loop that will go through each of those records and store the information in FileName and Type variables every time it loops through.SQLServer
.FileName
andType
Name this asFiles
.Type_A
. In this flat file connection manager, enter the value 4 in the text boxHeader rows to skip
so that the first four rows are always skipped.Type_B
.Files
and then place aRecordset Destination
. Configure the variableFilesToRead
in the recordset destination. Your first data flow task would like as shown below.FilesToRead
. Since, the recordset contains two columns, each time a record is looped through, the variablesFileName
andType
will be assigned the value of the current record.Type A files
andType B files
. You can configure each of these data flow tasks according to your requirements to read the files from connection managers. However, we need to disable the tasks based on the file that is being read.,Type A files
data flow task should be enabled only when A type files are being processed.Type B files
data flow task should be enabled only when B type files are being processed.Type A files
data flow task and press F4 to bring the properties. Click on the Ellipsis button available on theExpression
property.Disable
Property and enter the expression!(@[User::Type] == "A")
Type B files
data flow task and press F4 to bring the properties. Click on the Ellipsis button available on theExpression
property.Disable
Property and enter the expression!(@[User::Type] == "B")
Type A files
data flow task.Type B files
data flow task.Configuring Data Flow task Type A files
CR LF
denotes that the lines are ending with Carriage return and Line Feed. This file is stored in the path C:\f1.txtdbo.Table_A
in the SQL Server database as shown here.Type A files
to process the corresponding files. Double-click on the data flow taskType A files
. Place a Flat file source and OLE DB Destination inside the task.DelayValidation
property toTrue
. Click on theEllipsis
button on theExpressions
property.ConnectionString
property and set it to the Expression@[User::FileName]
Hope that helps you to achieve your task.
My solution would look like N + 1 flat file Connection Managers to handle the source files. CM A would address the skip first 4 rows file format, B sounds like it's just a 2 column file, etc. The last CM would be used to parse the command file you've illustrated.
Now that you have all of those Connection Managers defined, you can go about the processing logic.
Create 3 variables. 2 of type string (CurrentPath, CurrentType). 1 is of type Object and I called it Recordset.
The first Data Flow reads all the rows from the flat file source using "CM Control." This is the data you supplied in your example.
We will then use that Recordset object as the source for a ForEach Loop Container in what is commonly referred to as shredding. Bingle the term "Shred recordset ssis" and you're bound to hit a number of articles describing how to do it. The net result is that for each row in that source CM Control file, you will assign those values into the CurrentPath, CurrentType variables.
Inside that Loop container, create a central point for control for control to radiate out. I find a script task works wonderfully for this. Drag it onto the canvas, give it a strong name to indicate it's not used for anything and then create a data flow to handle each processing permutation.
The magic comes from using Expressions. Dang near everything in SSIS can have expressions set on their properties which is what separates the professionals from the poseurs. Here, we will double click on the line connecting to a given data flow and change the constraint type from "Constraint" to "Expression and Constraint" The Expression you would then use is something like
@[User::CurrentType] == "A"
This will ensure that path is only taken when both the parent task Succeeded and the condition is true.The second bit of expression magic will be applied to the connection managers themselves. They will need to have their ConnectionString property driven by the value of the
@[User::CurrentFile]
property. This will allow a design-time value ofC:\filea.txt
but would allow a runtime value, from the control file, to be\\network\share\ClientFileA.txt
Unless all the files have the same structure, you'll most likely need to set DelayValidation to True in the properties. Otherwise, SSIS will fail PreValidation as all the "CM A" to "CM N" would be using that CurrentFile variable which may or may not be a valid connection string for that file layout.I would recommend that you create a SSIS package for each different type of file load you're going to do. You can execute those packages from another program, see here: How to execute an SSIS package from .NET?
Given this information, you can write a quick program to execute the relevant packages: