Fetching data from multiple files and loading it in

Posted 2019-03-03 17:49

Question:

I have a source folder that contains 4 CSV files, each with a different number of columns. I need to fetch only 3 columns from each CSV (these 3 columns have the same metadata in all 4 files) and load them, from every file available in the source folder, into a Raw File destination. The Raw destination output file name has to be the name of the input file being fetched plus a timestamp.

At the next level, I need to read this raw output back as a Raw File source and insert the records into an OLE DB destination. The destination table also has to be dynamic.

For example, I have 4 CSV files with timestamps: test1.csv (10 columns), test2.csv (8), test3.csv (6) and test4.csv (10).

All 4 files have the columns position_id, asofdate and sumassured in common, and I want to load only these 3 columns into the Raw destination. If I load test1.csv, my Raw destination output file name has to be RW_test1_20120119_222222.RW. Similarly, when I load the second file, the Raw destination output file should be named after that file.
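The naming rule described above can be sketched outside SSIS as a small Python function. The name raw_output_name is hypothetical, and where the timestamp comes from (the input file, the load time, or a fixed package start time) is left to the caller, since the question does not pin it down.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def raw_output_name(input_file: str, stamp: datetime) -> str:
    """Hypothetical helper: build RW_<base name>_<yyyymmdd>_<hhmmss>.RW."""
    # PureWindowsPath parses Windows-style paths on any OS; .stem drops folder and ".csv".
    base = PureWindowsPath(input_file).stem
    return f"RW_{base}_{stamp:%Y%m%d_%H%M%S}.RW"


print(raw_output_name(r"C:\source\test1.csv", datetime(2012, 1, 19, 22, 22, 22)))
```

With the inputs shown, this reproduces the example name from the question, RW_test1_20120119_222222.RW.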

Thanks

Satish

Answer 1:

As always, decompose your problem until you've got it into something you can manage.

Processing CSVs via queries

Following the two questions and answers below will result in a package with an OLE DB connection manager configured to operate on the CSVs in the folder @[User::InputFolder]. Three variables, CurrentFileName, InputFolder and Query, are defined, with an expression set on Query. The expression for your @[User::Query] would look like "SELECT position_id, asofdate, sumassured FROM " + @[User::CurrentFileName]

Reference answers

  • SSIS FlatFile Acces via Jet

  • SSIS Task for inconsistent column count import?

At this point, your package should have a Foreach Loop that enumerates the CSV files and runs the query-based data flow for each one. Verify that you can correctly enumerate all of the CSVs in the folder and that the OLE DB query piece works.
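To sanity-check the idea outside SSIS, here is a small Python analogue of what the Query expression and the OLE DB source do together: build the per-file SELECT string and keep only the three shared columns, however many other columns a file has. build_query and project_common_columns are hypothetical helpers; inside the package the Jet/OLE DB provider does the projection, not Python.

```python
import csv
import io

COMMON_COLUMNS = ["position_id", "asofdate", "sumassured"]


def build_query(current_file_name: str) -> str:
    # Mirrors: "SELECT position_id, asofdate, sumassured FROM " + @[User::CurrentFileName]
    return "SELECT " + ", ".join(COMMON_COLUMNS) + " FROM " + current_file_name


def project_common_columns(csv_text: str) -> list[dict[str, str]]:
    # Keep only the shared columns; any extra columns in the file are ignored.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: row[col] for col in COMMON_COLUMNS} for row in reader]


sample = "position_id,extra,asofdate,sumassured\n1,x,2012-01-19,100\n"
print(build_query("test1.csv"))
print(project_common_columns(sample))
```

Because the columns are selected by name, the column count and order of each file stop mattering, which is the point of going through a query rather than a fixed flat-file layout.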

RAW files

I'm not an expert on RAW file usage, so there may be better ways of interacting with them. This step uses the fourth variable, RawFileName. Set an expression on it such as @[User::InputFolder] + "RawFile.raw", which, with InputFolder set to C:\ssisdata\so\satishkumar\ (including the trailing backslash), writes the file to C:\ssisdata\so\satishkumar\RawFile.raw.

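My general approach is to have a data flow in which a Script Component source sends no rows into a Raw File Destination. This creates an empty raw file with the right column layout before the loop starts appending to it.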

Configure your destination as

  • Access mode: File name from variable
  • Variable name: User::RawFileName
  • Write option: Create Always
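The "Create Always" step can be illustrated with a Python analogue. Note the assumption: a real .raw file uses SSIS's own binary format, whereas this sketch uses a plain CSV text file purely to show the semantics of recreating an empty staging file that carries only the column layout. create_empty_stage is a hypothetical name.

```python
import csv
import tempfile
from pathlib import Path

COMMON_COLUMNS = ["position_id", "asofdate", "sumassured"]


def create_empty_stage(stage_path: Path) -> None:
    # Analogue of Write option "Create Always": overwrite any previous file
    # and write only the column layout (a header row), with no data rows.
    with stage_path.open("w", newline="") as f:
        csv.writer(f).writerow(COMMON_COLUMNS)


# Hypothetical stand-in for @[User::InputFolder]; a temp folder keeps the demo harmless.
input_folder = Path(tempfile.mkdtemp())
# Like @[User::InputFolder] + "RawFile.raw"; Path inserts the separator itself.
raw_file_name = input_folder / "RawFile.raw"

create_empty_stage(raw_file_name)
create_empty_stage(raw_file_name)  # running it again still leaves a single header row
print(raw_file_name.read_text().splitlines())
```

Running the step twice leaves the same single header row, which is why it is safe to put it before the loop on every package execution.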

Process CSVs

The concept here is to append all the data into the RAW file that was created in the initial step.

Your source should already be configured as

  • OLE DB connection manager: FlatFile
  • Data access mode: SQL command from variable
  • Variable name: User::Query

Configure your destination as

  • Access mode: File name from variable
  • Variable name: User::RawFileName
  • Write option: Append
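The create-then-append pattern can be sketched in Python under the same assumption as before: CSV text stands in for the binary raw format, and append_to_stage plus the two sample inputs are hypothetical. Each loop iteration plays the role of one Foreach Loop pass writing with "Append".

```python
import csv
import io
import tempfile
from pathlib import Path

COMMON_COLUMNS = ["position_id", "asofdate", "sumassured"]


def append_to_stage(stage_path: Path, csv_text: str) -> int:
    """Analogue of Write option "Append": add one input file's shared columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [[row[c] for c in COMMON_COLUMNS] for row in reader]
    with stage_path.open("a", newline="") as f:  # "a" keeps what is already there
        csv.writer(f).writerows(rows)
    return len(rows)


stage = Path(tempfile.mkdtemp()) / "RawFile.raw"
stage.write_text(",".join(COMMON_COLUMNS) + "\n")  # stands in for the "Create Always" step

# Two hypothetical inputs with different column counts and orders.
inputs = {
    "test1.csv": "position_id,a,b,asofdate,sumassured\n1,x,y,2012-01-19,100\n",
    "test3.csv": "sumassured,position_id,asofdate\n250,2,2012-01-19\n",
}
for name, text in inputs.items():  # one pass per file, like the Foreach Loop
    print(name, append_to_stage(stage, text))

print(stage.read_text().splitlines())
```

After the loop, the single staging file holds the rows from every input in one consistent column order.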

Extract from RAW

At this point, the foreach enumerator has completed and all the data has been loaded into the staging file. Now it is time to consume that and send data on to the destination.

Drag a Raw File Source onto your data flow. Unsurprisingly, you configure it as

  • Access mode: File name from variable
  • Variable name: User::RawFileName
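The question also asks for a dynamic destination table. In SSIS that is usually handled by the OLE DB Destination's "Table name or view name variable" access mode; the Python sketch below only illustrates the underlying idea, using the standard-library sqlite3 module as a stand-in for the OLE DB target. The table name RW_test1 and the helper load_stage are hypothetical, and the TEXT column types are a simplification.

```python
import csv
import io
import sqlite3


def load_stage(conn: sqlite3.Connection, table: str, stage_text: str) -> int:
    # A table name cannot be a bound parameter, so quote it as an SQL identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {quoted} "
        "(position_id TEXT, asofdate TEXT, sumassured TEXT)"
    )
    rows = [
        (r["position_id"], r["asofdate"], r["sumassured"])
        for r in csv.DictReader(io.StringIO(stage_text))
    ]
    # Row values are still passed as parameters, never spliced into the SQL.
    conn.executemany(
        f"INSERT INTO {quoted} (position_id, asofdate, sumassured) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
stage_text = "position_id,asofdate,sumassured\n1,2012-01-19,100\n2,2012-01-19,250\n"
print(load_stage(conn, "RW_test1", stage_text))
print(conn.execute('SELECT COUNT(*) FROM "RW_test1"').fetchone())
```

Keeping the table name in one variable and quoting it at the point of use is the same separation the SSIS variable-driven access mode gives you.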

Instead of the "Simulate destination" placeholder, wire it up to the correct data destination (for this question, the OLE DB destination).

Caveat

Be careful when using an expression with GETDATE() or GETUTCDATE() to define file names, because such an expression is re-evaluated every time the variable is read. In SSIS 2005 we used FileName_HHMMSS and had issues because the task that created a file and the next task that consumed it did not run within the same second, so they computed different names. I have had better success using a dynamic but fixed starting point, which is generally the system variable @[System::StartTime].
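The pitfall in the caveat can be reproduced in a few lines of Python. Calling datetime.now() twice plays the role of a GETDATE()-based expression read by two tasks, and a value captured once plays the role of @[System::StartTime]; stamped_name is a hypothetical helper and the 1.1 second sleep simulates the gap between the producing and consuming tasks.

```python
import time
from datetime import datetime


def stamped_name(base: str, stamp: datetime) -> str:
    # FileName_HHMMSS-style name, as in the caveat.
    return f"{base}_{stamp:%H%M%S}"


start_time = datetime.now()  # captured once, like @[System::StartTime]

name_at_create = stamped_name("FileName", datetime.now())  # "now" evaluated by the producer
fixed_at_create = stamped_name("FileName", start_time)

time.sleep(1.1)  # the consuming task starts after a second boundary

name_at_consume = stamped_name("FileName", datetime.now())  # "now" re-evaluated by the consumer
fixed_at_consume = stamped_name("FileName", start_time)

print(name_at_create == name_at_consume)    # False: the consumer looks for a different file
print(fixed_at_create == fixed_at_consume)  # True: both tasks agree on the name
```

The fixed start time is still different on every run, so names stay unique per execution while remaining stable within it.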



Answer 2:

You can use a Foreach Loop Container in the control flow to iterate over .txt and .csv files.
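As a rough Python analogue of that enumeration, the sketch below collects every .txt and .csv file in a folder, one glob pattern per file mask. files_to_process is a hypothetical helper; in SSIS the Foreach File enumerator's file mask and folder settings do this job, and each matched path would be assigned to a variable such as @[User::CurrentFileName].

```python
import tempfile
from pathlib import Path


def files_to_process(folder: Path, patterns: tuple[str, ...] = ("*.txt", "*.csv")) -> list[Path]:
    # One glob per file mask; a set avoids duplicates, sorting gives a stable order.
    found = {p for pattern in patterns for p in folder.glob(pattern)}
    return sorted(found)


demo = Path(tempfile.mkdtemp())
for name in ["test1.csv", "test2.csv", "notes.txt", "skip.xlsx"]:
    (demo / name).write_text("")

print([p.name for p in files_to_process(demo)])  # ['notes.txt', 'test1.csv', 'test2.csv']
```

Files with other extensions, such as skip.xlsx here, are simply never returned.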



Tags: ssis