SQL Server read csv binary from table

2019-07-21 17:10发布

问题:

I currently store my csv formatted files on disk and then query them like this:

SELECT *
FROM OPENROWSET(BULK 'C:\myfile.csv',
    FORMATFILE = 'C\format.fmt',
    FIRSTROW = 2) AS rs

Where format.fmt are the defined format of the columns in the csv file. This works very well. But I'm interested in storing the file in a SQL Server table instead of storing them at disk. So when having a VARBINARY(MAX) datatype column. How do I query them?

If I have a table like:

CREATE TABLE FileTable
(
    [FileName] NVARCHAR(256)
    ,[File] VARBINARY(MAX)
)

With one row 'myfile.csv', '0x427574696B3B44616....'

How to read that file content into a temporary table for example?

回答1:

If you really need to work with varbinary data, you can just cast it back to nvarchar:

DECLARE @bin VARBINARY(MAX)
SET @bin = 0x5468697320697320612074657374

SELECT CAST(@bin as VARCHAR(MAX))
-- gives This is a test

Once you've got it into that format, you can use a split function to turn it into a table. Don't ask me why there isn't a built-in split function in SQL Server, given that it's such a screamingly obvious oversight, but there isn't. So create your own with the code below:

CREATE FUNCTION [dbo].[fn_splitDelimitedToTable] ( @delimiter varchar(3), @StringInput VARCHAR(8000) )
RETURNS @OutputTable TABLE ([String] VARCHAR(100), [Hierarchy] int )
AS
BEGIN

    DECLARE @String    VARCHAR(100)
    DECLARE @row int = 0

    WHILE LEN(@StringInput) > 0
    BEGIN
        SET @row = @row + 1
        SET @String      = LEFT(@StringInput, 
                                ISNULL(NULLIF(CHARINDEX(@delimiter, @StringInput) - 1, -1),
                                LEN(@StringInput)))
        SET @StringInput = SUBSTRING(@StringInput,
                                     ISNULL(NULLIF(CHARINDEX(@delimiter, @StringInput), 0),
                                     LEN(@StringInput)) + 1, LEN(@StringInput))

        INSERT INTO @OutputTable ( [String], [Hierarchy] )
        VALUES ( @String, @row )
    END

    RETURN
END

Put it all together:

select CAST('one,two,three' as VARBINARY)
-- gives 0x6F6E652C74776F2C7468726565

DECLARE @bin VARBINARY(MAX)
SET @bin = 0x6F6E652C74776F2C7468726565

select * from fn_splitDelimitedToTable(',', CAST(@bin as VARCHAR(MAX)))

gives this result:

string hierarchy
================
one    1
two    2
three  3

And of course, you can get the result into a temp table to work with if you so wish:

select * into #myTempTable
from fn_splitDelimitedToTable(',', CAST(@bin as VARCHAR(MAX)))


回答2:

If you've got CSV data, why not just import it into the database?

You can also use BULK INSERT to do this as in this question.

Assuming you've created a table with the correct format to import the data into (e.g. 'MyImportTable') something like the following could be used:

BULK INSERT MyImportTable
FROM 'C:\myfile.csv'
WITH (DATAFILETYPE='char',
      FIRSTROW = 2,
      FORMATFILE = 'C\format.fmt');

EDIT 1:

With the data imported into the database, you can now query the table directly, and avoid having the CSV altogether like so:

 SELECT *
   FROM MyImportTable

With the reference to the original CSV no longer required you can delete/archive the original CSV.

EDIT 2:

If you've enabled xp_cmdshell, and you have the appropriate permissions, you can delete the file from SQL with the following:

xp_cmdshell 'del c:\myfile.csv'

Lastly, if you want to enable xp_cmdshell use the following:

exec sp_configure
go
exec sp_configure 'xp_cmdshell', 1
go
reconfigure
go