Can Sql Server BULK INSERT read from a named pipe/

2019-02-09 18:00发布

问题:

Is it possible for BULK INSERT/bcp to read from a named pipe, fifo-style?

That is, rather than reading from a real text file, can BULK INSERT/bcp be made to read from a named pipe which is on the write end of another process?

For example:

  1. create named pipe
  2. unzip file to named pipe
  3. read from named pipe with bcp or BULK INSERT

or:

  1. create 4 named pipes
  2. split 1 file into 4 streams, writing each stream to a separate named pipe
  3. read from 4 named pipes into 4 tables w/ bcp or BULK INSERT

The closest I've found was this fellow (site now unreachable), who managed to write to a named pipe w/ bcp, with a his own utility and usage like so:

start /MIN ZipPipe authors_pipe authors.txt.gz 9
bcp  pubs..authors out  \\.\pipe\authors_pipe -T -n

But he couldn't get the reverse to work.

So before I head off on a fool's errand, I'm wondering whether it's fundamentally possible to read from a named pipe w/ BULK INSERT or bcp. And if it is possible, how would one set it up? Would NamedPipeServerStream or something else in the .NET System.IO.Pipes namespace be adequate?

eg, an example using Powershell:

[reflection.Assembly]::LoadWithPartialName("system.core")
$pipe = New-Object system.IO.Pipes.NamedPipeServerStream("Bob")

And then....what?

回答1:

I have succeeded in getting BULK INSERT (but not BCP) to work correctly with named pipes on Windows 7 ans SQL Server 2008R2. There are some tricks.

First, I had to create two named pipe instances on two different threads, both with the same pipe name. SQL Server would open the first instance, read a few bytes from it, and close it, causing WriteFile to raise a PipeException in the first thread. SQL Server would then immediately reopen the named pipe, and stream in all of the data from it. If I didn't have a second thread sitting in the background ready to serve the data, SQL server would return an error before my first thread had time to recover from the PipeException.

Second, I had to write all of the data in a single call to WriteFile. I started with a loop where I wrote multiple batches to the pipe, but BULK INSERT only used the first batch that I wrote. It seems to do a non-blocking read, and treat any read that returns zero bytes as an end-of-file.

Third, an XML format file, if used, must be written to a regular file. I have not succeeded in getting SQL Server to read the format file from a pipe. I don't know if it can read a non-XML format file from a pipe.



回答2:

I'd comment on @DanMenes (thanks for the inspiration), but for reference purposes, I'm adding it as a separate answer.

I've worked out a solution in .NET which opens up a pipe (actually 2, first one gets destroyed like @DanMenes said), prepares streaming the data to it and then starts the BULK INSERT with an auto-generated format file.

The premise is that I can do stuff like

  var inMemoryData = new[] {
    new[] { "val1", "val2" },
    new[] { "val3", "val4" },
  };

  using (var importer = new Importer(SqlConnection, "MyTable", "Col1", "Col2"))
  {
    importer.Import(inMemoryData);
  }

I'll summarize the implementation of Importer:

1. Create the pipe

var stream = new NamedPipeServerStream(name, PipeDirection.Out, 2, PipeTransmissionMode.Byte, PipeOptions.Asynchronous);
stream.BeginWaitForConnection(OnConnection, this);

2. Accept connections

public void OnConnection(IAsyncResult asyncResult)
{
  Stream.EndWaitForConnection(asyncResult);

  var buffer = Encoding.UTF8.GetBytes(data);
  Stream.Write(buffer, 0, buffer.Length);
  Stream.Close();
}

3. Start BULK INSERT

var insertCommand = DbConnection.CreateCommand();
insertCommand.CommandText = "BULK INSERT [MyTable] FROM '\\.\pipe\mypipe' WITH (FORMATFILE='c:\path\to\formatfile')";
insertCommand.ExecuteNonQuery();

See the GitHub project for more details.

Note: I have yet to add performance tests to the project, but preliminary tests did show performance gains between 2x and 5x with respect to transactional INSERTs.



回答3:

Unfortunately, both SSIS flat file adaptors, BULK INSERT and BCP take an exclusive write lock on the file (even though it does not actually write to it). This is why this doesn't work.

I am not sure pipes can be set up to allow two exclusive locks on the same pipe without some serious hacking. You could detour it I suppose or hack into fltmgr.sys :)

As the other posters suggested, using the .NET API to do bulk or the OLEDB or ODBC interface instead is likely simpler, even though it means you have to write your own file parser.



回答4:

Does BCP accept STDIN? If so, you might want to try just piping it straight through without creating a named pipe...for instance:

gunzip authors.txt.gz | bcp schema.tablename