Bulk insert, SQL Server 2000, unix linebreaks

2019-01-21 22:11发布

问题:

I am trying to insert a .csv file into a database with unix linebreaks. The command I am running is:

BULK INSERT table_name
FROM 'C:\file.csv' 
WITH 
( 
    FIELDTERMINATOR = ',', 
    ROWTERMINATOR = '\n' 
) 

If I convert the file into Windows format the load works, but I don't want to do this extra step if it can be avoided. Any ideas?

回答1:

I felt compelled to contribute as I was having the same issue, and I need to read 2 UNIX files from SAP at least a couple of times a day. Therefore, instead of using unix2dos, I needed something with less manual intervention and more automatic via programming.

As noted, the Char(10) works within the sql string. I didn't want to use an sql string, and so I used ''''+Char(10)+'''', but for some reason, this didn't compile.

What did work very slick was: with (ROWTERMINATOR = '0x0a')

Problem solved with Hex!

Hope this helps someone.



回答2:

Thanks to all who have answered but I found my preferred solution.

When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown below.

DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC (@bulk_cmd)

Why you can't say BULK INSERT ...(ROWTERMINATOR = CHAR(10)) is beyond me. It doesn't look like you can evaluate any expressions in the WITH section of the command.

What the above does is create a string of the command and execute that. Neatly sidestepping the need to create an additional file or go through extra steps.



回答3:

I confirm that the syntax

ROWTERMINATOR = '''+CHAR(10)+'''

works when used with an EXEC command.

If you have multiple ROWTERMINATOR characters (e.g. a pipe and a unix linefeed) then the syntax for this is:

ROWTERMINATOR = '''+CHAR(124)+''+CHAR(10)+'''


回答4:

It's a bit more complicated than that! When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown above. I have just spent the best part of an hour figuring out why \n doesn't really mean \n when used with BULK INSERT!



回答5:

One option would be to use bcp, and set up a control file with '\n' as the line break character.

Although you've indicated that you would prefer not to, another option would be to use unix2dos to pre-process the file into one with '\r\n' line breaks.

Finally, you can use the FORMATFILE option on BULK INSERT. This will use a bcp control file to specify the import format.



回答6:

Looks to me there are two general avenues that can be taken: some alternate way to read the CSV in the SQL script or convert the CSV beforehand with any of the numerous ways you can do that (bcp, unix2dos, if it is a one-time king of a thing, you can probably even use your code editor to fix the file for you).

But you will have to have an extra step!

If this SQL is launched from a program, you might want to convert the line endings in that program. In that case and you decide to code the conversion yourself, here is what you need to watch out for: 1. The line ending might be \n 2. or \r\n 3. or even \r (Mac!) 4. good grief, it could be that some lines have \r\n and others \n, any combination is possible unless you control where the CSV came from

OK, OK. Possibility 4 is farfetched. It happens in email, but that is another story.



回答7:

I would think "ROWTERMINATOR = '\n'" would work. I would suggest opening the file in a tool that shows "hidden characters" to make sure the line is being terminated like you think. I use notepad++ for things like this.



回答8:

It comes down to this. Unix uses LF (ctrl-J), MS-DOS/Windows uses CR/LF (ctrl-M/Ctrl-J).

When you use '\n' on Unix, it gets translated to a LF character. On MS-DOS/Windows it gets translated to CR/LF. When the your import runs on the Unix formatted file, it sees only a LF. Hence, its often easier to run the file through unix2dos first. But as you said in you original question, you don't want to do this (I'll assume there is a good reason why you can't).

Why can't you do:

(ROWTERMINATOR = CHAR(10))

Probably because when the SQL code is being parsed, it is not replacing the char(10) with the LF character (because it's already encased in single-quotes). Or perhaps its being interpreted as:

(ROWTERMINATOR =
     )

What happens when you echo out the contents of @bulk_cmd?