Record only certain rows of a text file in SSIS

2019-09-02 10:37发布

问题:

I'm having a hard time trying to do a simple loading of data from a flat file to a database. The problem is that there are bad rows or at least rows that are not formatted as data in that text file.

Sample.txt:

                Stackoverflow School at Philippines

Record: 100101
Date: 6/20/2014

Name:                Age: About:
-------------------- ---  --------------------------
Coolai               19   Bad Row Question
Qwerty               17   Java
Qwerty               19   C#
*User1               21   Dynamic Data
User4                27   Assembly

                    Stackoverflow School at Nippon

Record: 100102
Date: 6/23/2014

Name:                Age: About:
-------------------- ---  --------------------------
Sayuri               19   MSSQL
Niwatori             21   Dynamic Data
Jagaimo              27   Assembly
*User7               21   Dynamic Data
User9                27   Assembly

I am using Fixed Width format on the Flat File Connection and tried to approach the problem using conditional split but once it hits the white space, it stops loading the data.

Is it possible to load the data depending on a certain row count? Because at the end, this is the only thing I want from the text file:

Coolai               19   Bad Row Question
Qwerty               17   Java
Qwerty               19   C#
User1                21   Dynamic Data
User4                27   Assembly
Sayuri               19   MSSQL
Niwatori             21   Dynamic Data
Jagaimo              27   Assembly
User7                21   Dynamic Data
User9                27   Assembly

Sample.txt file LINK.

回答1:

Since the source text file doesn't follow delimited format standard, you need to use script task, and write custom script to handle all those white spaces and bad data. You need to convert text file into some delimited format then it can be processed as per requirement.

The logic for script task is - read file line by line and put logic for when to skip when execution hits bad line/data. By doing this result will be in standard delimited format.

Try below code in script, you can use extracted data from List<string> goodData

        string line;
        bool isNextLineGoodData = false;
        List<string> goodData = new List<string>();
        // Read the file and display it line by line.
        System.IO.StreamReader file = new System.IO.StreamReader("c:\\csv.txt");
        while ((line = file.ReadLine()) != null)
        {

            if (isNextLineGoodData)
            {
                if (line=="")
                {
                    isNextLineGoodData = false;
                }
                else
                {
                    goodData.Add(line);
                }

            }
            if (line=="")
            {
                isNextLineGoodData = false;
            }
            if (line.StartsWith("---"))
            {
                isNextLineGoodData = true;

            }
        }

        file.Close();