I currently store my csv formatted files on disk and then query them like this:
SELECT *
FROM OPENROWSET(BULK 'C:\myfile.csv',
FORMATFILE = 'C\format.fmt',
FIRSTROW = 2) AS rs
Where format.fmt are the defined format of the columns in the csv file. This works very well.
But I'm interested in storing the file in a SQL Server table instead of storing them at disk.
So when having a VARBINARY(MAX) datatype column. How do I query them?
If I have a table like:
CREATE TABLE FileTable
(
[FileName] NVARCHAR(256)
,[File] VARBINARY(MAX)
)
With one row 'myfile.csv', '0x427574696B3B44616....'
How to read that file content into a temporary table for example?
If you really need to work with varbinary data, you can just cast it back to nvarchar:
DECLARE @bin VARBINARY(MAX)
SET @bin = 0x5468697320697320612074657374
SELECT CAST(@bin as VARCHAR(MAX))
-- gives This is a test
Once you've got it into that format, you can use a split function to turn it into a table. Don't ask me why there isn't a built-in split function in SQL Server, given that it's such a screamingly obvious oversight, but there isn't. So create your own with the code below:
CREATE FUNCTION [dbo].[fn_splitDelimitedToTable] ( @delimiter varchar(3), @StringInput VARCHAR(8000) )
RETURNS @OutputTable TABLE ([String] VARCHAR(100), [Hierarchy] int )
AS
BEGIN
DECLARE @String VARCHAR(100)
DECLARE @row int = 0
WHILE LEN(@StringInput) > 0
BEGIN
SET @row = @row + 1
SET @String = LEFT(@StringInput,
ISNULL(NULLIF(CHARINDEX(@delimiter, @StringInput) - 1, -1),
LEN(@StringInput)))
SET @StringInput = SUBSTRING(@StringInput,
ISNULL(NULLIF(CHARINDEX(@delimiter, @StringInput), 0),
LEN(@StringInput)) + 1, LEN(@StringInput))
INSERT INTO @OutputTable ( [String], [Hierarchy] )
VALUES ( @String, @row )
END
RETURN
END
Put it all together:
select CAST('one,two,three' as VARBINARY)
-- gives 0x6F6E652C74776F2C7468726565
DECLARE @bin VARBINARY(MAX)
SET @bin = 0x6F6E652C74776F2C7468726565
select * from fn_splitDelimitedToTable(',', CAST(@bin as VARCHAR(MAX)))
gives this result:
string hierarchy
================
one 1
two 2
three 3
And of course, you can get the result into a temp table to work with if you so wish:
select * into #myTempTable
from fn_splitDelimitedToTable(',', CAST(@bin as VARCHAR(MAX)))
If you've got CSV data, why not just import it into the database?
You can also use BULK INSERT to do this as in this question.
Assuming you've created a table with the correct format to import the data into (e.g. 'MyImportTable') something like the following could be used:
BULK INSERT MyImportTable
FROM 'C:\myfile.csv'
WITH (DATAFILETYPE='char',
FIRSTROW = 2,
FORMATFILE = 'C\format.fmt');
EDIT 1:
With the data imported into the database, you can now query the table directly, and avoid having the CSV altogether like so:
SELECT *
FROM MyImportTable
With the reference to the original CSV no longer required you can delete/archive the original CSV.
EDIT 2:
If you've enabled xp_cmdshell, and you have the appropriate permissions, you can delete the file from SQL with the following:
xp_cmdshell 'del c:\myfile.csv'
Lastly, if you want to enable xp_cmdshell use the following:
exec sp_configure
go
exec sp_configure 'xp_cmdshell', 1
go
reconfigure
go