I have an SSIS import module that loops through files in a folder and for each file, loops through the worksheets within them. Within these files I only want data on specific Worksheets which have the same naming convention i.e. 2006 - claims report
, 2007 - claims report
and so on.
Is there anyway that I can only import the worksheets that have that naming convention and skip all others?
Would this be a bit of scripting within the Foreach container using regex on the Variable value or would it entail an expression used with the Foreach container?
Yes, one possible option is to make use of Script Task to process only the Worksheets of your preference.
Following example was created using SSIS 2008 R2 and Excel 2010. The working folder for this example is
C:\Temp\
. I think the logic should still hold good for previous versions.In the folder path
C:\Temp\
, create an Excel 2007 spreadsheet file namedCountry_States.xlsx
with three worksheets namelyUS_1
,US_2
andCanada_1
.US_1
of Country_States.xlsx contained the following dataUS_2
of Country_States.xlsx contained the following dataCanada_1
of Country_States.xlsx contained the following dataCreate a table in SQL Server named dbo.Destination using the below create script. Excel sheet data will be inserted into this table.
The table is currently empty.
Create a new SSIS package and on the package, create the following 6 variables. FolderPath will contain the folder where the Excel files are stored. FilePattern will contain the extension of the files that will be looped through and this example works only for
.xlsx
. FilePath will be assigned with a value by the Foreach Loop container but we need a valid path to begin with for design time and it is currently populated with the pathC:\temp\Country_States.xlsx
of the first Excel file. SheetName will contain the actual sheet name but we need to populate with initial valueUS_1$
to avoid design time error.ProcessTheSheet
will contain true/false and set with the default value of false.PatternToMatch
will contain the pattern that will use to match only the worksheets of our choice.In the package's connection manager, create an ADO.NET connection with the following configuration and name it as ExcelSchema.
Select the provider
Microsoft Office 12.0 Access Database Engine OLE DB Provider
under .Net Providers for OleDb. Provide the file pathC:\temp\Country_States.xlsx
Click on the
All
section on the left side and set the property Extended Properties toExcel 12.0
to denote the version of Excel. Here in this case 12.0 denotesExcel 2007 or above
. Click on the Test Connection to make sure that the connection succeeds.Create an Excel connection manager named Excel as shown below.
Create an OLE DB Connection SQL Server named
SQLServer
. So, we should have three connections on the package as shown below.We need to do the following connection string changes so that the Excel file is dynamically changed as the files are looped through.
On the connection ExcelSchema, configure the expression
ServerName
to use the variableFilePath
. Click on the ellipsis button to configure the expression.Similarly on the connection Excel, configure the expression
ServerName
to use the variableFilePath
. Click on the ellipsis button to configure the expression.On the Control Flow, place two Foreach Loop containers one within the other. The first
Foreach Loop container
named Loop files will loop through the files. The secondForeach Loop container
will through the sheets within the container. Within the inner For each loop container, place a Script Task that will validate only the sheets that should be processed and a Data Flow Task that will read the Excel files and load data into SQLConfigure the first Foreach loop container named Loop files as shown below:
Configure the first Foreach loop container named Loop sheets as shown below:
Script Task should be configured with following code that verifies the
SheetName
value against the pattern stored inPatternToMatch
variable. If the SheetName matches the pattern, then the variable ProcessTheSheet is set to True or else it will be set to False.C# code for SSIS 2008 and above
Include the statement
using System.Text.RegularExpressions;
for RegEx.Right the connector that joins the Script Task and the Data Flow Task and select Edit option. It should bring the Precedence Constraint Editor dialog. Set the Evaluation operation to Expression and set the Expression to
@[User::ProcessTheSheet]
This expression will allow the package to continue to Data Flow Task only if the Sheetname matches the pattern provided in the variable PatternToMatch. You can notice that the connector contains fx, which means there is an expression in place and also the color changes from Green to Blue.Inside the data flow task, place an Excel Source, Derived Column and OLE DB Destination as shown below:
Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.
Configure the derived column to create new columns for file name and sheet name. This is just to demonstrate this example but has no significance.
Configure the OLE DB destination to insert the data into the SQL table.
Below screenshot shows successful execution of the package.
When the
PatternToMatch
variable is set to the value CA*, the table is populated only with the values from the Sheet Canada_1.Deleted all the rows from the table. Changed the
PatternToMatch
variable value to US*, the table is populated only with the values from the sheets US_1 and US_2.Hope that helps.