SSIS OPENROWSET query flat file

2019-07-27 07:23发布

I currently have a variable name called InvoiceFileName that is creating .csv files through a foreach loop. A list of .csv is then outputted to a folder.
I will then need to query off of each .csv file to select the header and the first row of data for each .csv. I believe I need to use the OPENROWSET to query off of the .csv. I have 2 questions.

  1. What is the syntax to query off of the variable name InvoiceFileName.
  2. Is it possible to select the header field and first row of data OPENROWSET without inserting into a table.

Below is a simple OPENROWSET that only provides the header of the file.

SELECT 
top 1 *
FROM OPENROWSET(BULK N'\\myservername\f$\reports\Invoices\CokeFiles\54ASBSd.csv', SINGLE_CLOB) AS Report 

2条回答
欢心
2楼-- · 2019-07-27 07:45

Use a Foreach Loop container to query all files in a folder. You can use wildcards for the file name, or user the variables in your DTS to set the properties of the components.

Inside the loop container you place a Data Flow Task with your source file connection, your transformations, and your destination.

You can modify the file names and paths of all these objects by setting their properties to variables in your DTS.

With an Expresion Task inside the loop, you can change the path of the CSV file connection.

Foreach Loop Editor

查看更多
Juvenile、少年°
3楼-- · 2019-07-27 08:11

What kind of privs do you have on the database? If you have or can get slightly elevated privs, you can use BULK INSERT and xp_cmdShell to accomplish this, but like @scsimon said, you will have to use dynamic sql. Here's a quick example:

-----------------------------------------------------------------------------------------------------
-- Set up your variables
-----------------------------------------------------------------------------------------------------
DECLARE 
    @folderPath AS VARCHAR(100) = '\\some\folder\path\here\',
    @cmd AS VARCHAR(150), -- Will populate this with a command to get a list of files in a directory
    @InvoiceFileName AS VARCHAR(100), -- Will be used in cursor loop
    @targetTable AS VARCHAR(50) = 'SomeTable',
    @fieldTerminator AS CHAR(1) = ',',
    @rowTerminator AS CHAR(2) = '\n'
-----------------------------------------------------------------------------------------------------
-- Create a temp table to store the file names
-----------------------------------------------------------------------------------------------------
IF OBJECT_ID('tempdb..#FILE_LIST') IS NOT NULL
    DROP TABLE #FILE_LIST
--
CREATE TABLE #FILE_LIST(FILE_NAME VARCHAR(255))

-----------------------------------------------------------------------------------------------------
-- Get a list of the files and store them in the temp table:
-- NOTE: this DOES require elevated permissions
-----------------------------------------------------------------------------------------------------
SET @cmd = 'dir "' + @folderPath + '" /b'
--
INSERT INTO #FILE_LIST(FILE_NAME)
EXEC Master..xp_cmdShell @cmd

--------------------------------------------------------------------------------
-- Here we remove any null values
--------------------------------------------------------------------------------
DELETE #FILE_LIST WHERE FILE_NAME IS NULL

-----------------------------------------------------------------------------------------------------
-- Set up our cursor and loop through the files 
-----------------------------------------------------------------------------------------------------
DECLARE c1 CURSOR FOR SELECT FILE_NAME FROM #FILE_LIST
OPEN c1
FETCH NEXT FROM c1 INTO @InvoiceFileName
WHILE @@FETCH_STATUS <> -1
    BEGIN -- Begin WHILE loop
        BEGIN TRY
            -- Bulk insert won't take a variable name, so dynamically generate the 
            --  SQL statement and execute it instead:
            SET @sql = 'BULK INSERT ' + @targetTable + ' FROM ''' + @InvoiceFileName + ''' '
                + '     WITH ( 
                        FIELDTERMINATOR = ''' + @fieldTerminator + ''', 
                        ROWTERMINATOR = ''' + @rowTerminator + ''', 
                        FIRSTROW = 1,
                        LASTROW = 2
                    ) '
            EXEC (@sql)
        END TRY
        BEGIN CATCH
            -- Handle errors here
        END CATCH
        -- Continue your loop
        FETCH NEXT FROM c1 INTO @path,@filename
    END -- End WHILE loop

-- Do what you need to do here with the data in your target table

A few disclaimers:

  1. I have not tested this code. Only copied from a slightly more complex proc I've used in the past that works for exactly this kind of scenario.
  2. You will need elevated privs for BULK INSERT and xp_cmdShell.
  3. I know people frown on using xp_cmdShell (and for good reason) but this is a quick and dirty solution making a lot of assumptions about what your environment is like.
  4. This is assuming you're not grabbing the data as you get each file in your variable. If you are, you can skip the first part of this code.
  5. This code also assumes you are doing your own error handling in places other than the one try/catch block you see. I've omitted a lot of that for simplicity.

For doing this through SSIS, ideally you'd probably need to use a format file for the bulk operation, but you'd have to have consistently formatted files and remove the SINGLE_CLOB option as well. A really hacky and non-ideal way to do this would be to do something like this:

Let's say your file contains this data:

Col1,Col2,Col3,Col4
Here's,The,First,Line
Here's,The,Second,Line
Here's,The,Third,Line
Here's,The,Fourth,Line

Then you could basically just parse the data doing something like this:

SELECT SUBSTRING(OnlyColumn, 0, CHARINDEX(CHAR(10), OnlyColumn, CHARINDEX(CHAR(10), OnlyColumn, 0)+1) )
FROM OPENROWSET(BULK '\\location\of\myFile.csv', SINGLE_CLOB) AS Report (OnlyColumn)

And your result would be this:

Col1,Col2,Col3,Col4  Here's,The,First,Line 

This is obviously dependent on your line endings being consistent, but if you want the results in a single column and single row (as is the behavior of the bulk operation with the SINGLE_CLOB option), that should get you what you need.

You can take a look at the solution on this SO post for info on how to pass the SSIS variable value as a parameter to your query.

查看更多
登录 后发表回答