-->

SSIS: Flat File default length

2020-07-17 04:28发布

问题:

I have to import about 50 different types of files every day. Some of them with a few columns, some inculde up to 250 columns.

The Flat File connection always defaults all columns to 50 chars. Some columns can be way longer than 50 chars, and will of course end up in errors.

Currently i am doing a stupid search&replace with notepad++ - Opening all SISS packages, replacing:

DTS:MaximumWidth="50"

by

DTS:MaximumWidth="500"

This is an annoying workaround. Is there any possibility to set a default length for flatfile string columns to a certain value?

I am developing in Microsoft Visual Studio Professional 2015 and SQL Server Data Tools 14.0.61021.0

Thanks!

回答1:

I don't think that there is a way to achieve this from SQL Server Data Tools.

But you can do some workaround to achieve this:

  1. Easiest solution, In the Flat file connection manager - Advanced Tab, select all columns (using Ctrl key) and change the data length property for them all in one edit. (detailed in @MikeHoney answer)
  2. You can use BIML (Business Intelligence Markup Language) to create ssis package, if you're new to BIML you can access to BIML Script website for detailed tutorials.

  3. You can create a Small application that loop over .dtsx files in a folder and replace DTS:MaximumWidth="50" with DTS:MaximumWidth="500" using normal String.Replace function or using Regular expressions. (you can read my answer @ Automate Version number Retrieval from .Dtsx files to see an exmaple on reading .dtsx file using Regular expressions)

Function To Read and Replace content of dtsx file (Vb.Net)

Public Sub FixDTSX(byval strFile as string)

    dim strContent as string = string.empty

    Using sr as new Io.StreamReader(strFile)

        strContent = sr.ReadToEnd()

        sr.Close()

    End Using

    strContent = strContent.Replace("DTS:MaximumWidth=""50""","DTS:MaximumWidth=""500""")

    Using sw as new Io.StreamWriter(strFile,False)

        sw.Write(strContent)

        sw.Close()

    End Using

End Sub


回答2:

There is a way to achieve what you want using the standard Visual Studio SSDT UI, although it is quite obscure. AFAIK it works in every version of this editor since SQL Server 2005.

With the package open, from the Connection Managers pane, right-click your Flat File Connection and choose Edit. Then navigate to the Advanced page. Then multi-select the columns you want to change (e.g. shift-click a range or ctrl-click a specific set). Now the Properties appearing at the right will be applied to all the selected columns.

In the example shown below, I have set all the selected columns to a width of 255.



回答3:

Esteban,

I suggest you use the Object Model API which allows you to develop SSIS packages programmatically. Using that, you can make use of any .net code that allows you to gather data/metadata from text files. Also, the assumption is that, since you are using SSIS, you already might be familiar with writing code in C#/VB.Net

Now, if you are just starting with the Object Model API, there would be a huge learning curve (but it is worth learning it if SSIS is your day to day life). If you do not have the time to invest right now, I would recommend you to use a library I wrote (called Pegasus) which greatly simplifies how you can use the Object Model API; you can create your packages in an almost declarative fashion (using C#).

On Github, there is an example that shows how to create a package that loads any number of text files with differing schemas in a given folder. See here; specifically the method GenerateProjectToLoadTextFilesToSqlServerDatabase().

In the example, I use a third party .Net library called lumenworks.framework to probe delimited files and get their metadata. Using this library, I get the names of the columns; and I also infer data types and lengths based on sampling the first 'n' number of rows. (In my code, I am only inferring ints, dates and strings; if you have more data types, add relevant code accordingly). Or you can specify one specific data type and length (looks like you want to use string of 500 chars) for all your columns. [Or (in some cases), you might have this metadata available outside in a excel file/config file.] Then I use this metadata to configure my text file connection managers programmatically.

YOu can download the code from Github and run the DataFlowExample by specifying where your source files are and see how far it gets you.

Another recommendation would be Biml, but I am not sure if you can incorporate your own/third party full fledged C# code (not just snippets) into Biml workflow. If you can, then go with Biml.

Let me know if you have any questions.