SSIS pkg with flat-file connection with fewer colu

2019-03-03 02:13发布

问题:

Assume a flat file F1.txt, Column MyCol1 and a package Pkg1 to load said file to SQL server.

No problem, right? Right.

Now assume a flat file F2.txt, Columns MyCol1, MyCol2 and the same package Pkg1 to load said file to SQL server.

We'll make a few adjustments to Pkg1 and presto - it loads F2.txt like a dream.

Now we feed it F1.txt and that's where things deteriorate.

BTW, this does not confine itself to flat-files but is of a more general nature.

Any and all suggestions on how to run legacy-data within the same package are welcome.

TIA

Peter

回答1:

It reads like you have two problems here. The first is understanding how to use Connection Managers. For flat file inputs, you are generally going to be better served by creating a connection manager per file layout. File 1 looks like (Column1) and File 2 looks like (Column1, Column2)? That means 2 different Flat File Connection managers need to be defined.

If you have 2 version of File 2, one where Column1 has numbers and another with Column1 containing character data, those would require 2 unique connection managers (3 in total).

The good news relative to the above is that file names changes are trivial and do not require a unique Connection Manager to be created. F1.txt, F1_20120501.txt, F1.good.txt, etc would all be served by the Connection Manager you have defined for that layout. You would simply need to use an expression on the ConnectionString property of a given Connection Manager to update the current package at run-time.

So now that you have all these Flat File Connection Managers, you need to use them. That magic happens in the Data Flow Tasks. A dataflow is real persnickety about the metadata used in it. When you are designing a data flow, you are making a contract with SSIS and if you try to violate it by making a character field into a date field or not providing all the columns, the package will fail validation checks as you aren't holding up your end of the bargain. The resolution to this is that you're again going to need to define multiple data flows around the various Connection Managers your packages need.

With all that defined, you would simply need a coordinator to look at source files to determine which data flow should be executed. I provided an example on this question Create SSIS package to import from one of many data sources

There was also a similar question where I proposed a solution that may be of interest SSIS Task for inconsistent column count import? It really depends what your rules are for processing.

If you are trying to consolidate/reuse business logic in your SSIS packages, then I would look an approach of using the various dataflows to stage the discrete sources into a singular data storage thing (raw file, staging table with lots of null columns, etc).



标签: ssis