How to loop only through files that don't exis

2019-04-26 16:27发布

问题:

I have a folder on the network with files and I'm moving the files from one folder to another folder. But I only want to move new files, so only copy over files that do not exist in the destination folder. How? I already have the for each loop container and a file system task. I'm using variables. Right now it copies all files from one folder to the next everytime the package is executed. Is there some sort of conditional tool that I can stick in there? I'm not really good at writing scripts so if that is the only solution I may need your help.

回答1:

Here is one possible option that you can achieve this using Foreach Loop Container, Script task and File system task. Following example shows how this can be done. The example was created using SSIS 2008 R2.

Step-by-step process:

  1. Create two folders named Source and Destination in the path C:\temp\ as shown in screenshot #1.

  2. Place a sample file named Sample_File_01.txt in the folder path C:\temp\Source\ and leave the other folder C:\temp\Destination\ empty. The SSIS package will copy files from the Source folder to Destination folder only if the file already doesn't exist. Refer screenshots #2 and #3.

  3. On the SSIS package, create 7 variables as shown in the screenshot #4. Set the variable DestinationFolder to the value C:\temp\Destination\. Set the variable SourceFolder to the value C:\temp\Source\. Set the variable FilePattern to the value *.*. You can change the values of these variables according to your requirements.

  4. Select the variable SourceFilePath and open the Properties window by pressing F4 button. Change the property EvaluateAsExpression to True and set the property Expression to the value @[User::SourceFolder] + @[User::FileName]. Refer screenshot #5.

  5. Select the variable DestinationFilePath and open the Properties window by pressing F4 button. Change the property EvaluateAsExpression to True and set the property Expression to the value @[User::DestinationFolder] + @[User::FileName]. Refer screenshot #6.

  6. On the SSIS package's Control Flow tab, place a Foreach Loop container and configure the properties of the container as shown in screenshots #7 and #8. Make sure that you select the radio button Name and extension on the Collection section.

  7. Within the Foreach Loop container, place a Script Task. Double-click on the Script task and click on the Edit Script button. Replace the Main() method inside the script task with the code given under the Script Task Code section. This code checks if the destination file already exists or not and then populates the boolean variable DoesFileExist accordingly.

  8. Within the Foreach Loop container, place a File System Task below the Script Task. Connect the Script task's success green arrow to the File System Task. Configure the File System Task as shown in screenshot #9.

  9. We need the File System Task to execute only if the file doesn't exist in the destination path. So, we need to change the connector between the Script Task and the File System Task. Right-click on the green connector and select Edit as shown in screenshot #10.

  10. Configure the Precedence Constraint as shown in screenshot #11. This checks if the variable DoesFileExist contains the value False, which means the file was not found in the destination.

  11. Once configured, the SSIS package should be like as shown in screenshot #12.

  12. Screenshot #13 shows the first package execution. During this execution, there were no files in the destination path C:\temp\Destination\. After the execution, the file Sample_File_01.txt has been copied from C:\temp\Source\ to C:\temp\Destination\. Refer screenshot #14.

  13. Screenshot #15 shows the second package execution. During this execution, no files were copied to the destination path C:\temp\Destination\. As you can notice, that the File System Task didn't execute because the Precedence constraint failed.

Hope that helps.

Script task code:

C# code that can be used only in SSIS 2008 and above.

public void Main()
{
    Variables varCollection = null;

    Dts.VariableDispenser.LockForRead("User::DestinationFilePath");
    Dts.VariableDispenser.LockForWrite("User::DoesFileExist");
    Dts.VariableDispenser.GetVariables(ref varCollection);

    varCollection["User::DoesFileExist"].Value = Convert.ToBoolean(System.IO.File.Exists(varCollection["User::DestinationFilePath"].Value.ToString()));

    Dts.TaskResult = (int)ScriptResults.Success;
}

Screenshot #1:

Screenshot #2:

Screenshot #3:

Screenshot #4:

Screenshot #5:

Screenshot #6:

Screenshot #7:

Screenshot #8:

Screenshot #9:

Screenshot #10:

Screenshot #11:

Screenshot #12:

Screenshot #13:

Screenshot #14:

Screenshot #15:



回答2:

Here's a potential solution with two assumptions:

  1. All files in the destination folder were moved by this package.
  2. The file names are unique.

You could add a file name logging step to the ETL.

Create a new log table in the database with a column for the file name. On each loop of the package write the file name to this table. Then you can use that as a simple lookup table that is used by some conditional logic in the ETL to check if the file name being processed matches an existing file.

Exactly how you implement the logging step & the conditional check (could be a conditional in data flow, done in script task, etc) will depend on the design of your package, but it should accomplish what you're looking for.