Exporting a table with blob and utf8 string fields from MySQL to MS SQL


Question:

I have a table with binary(32), blob, and utf-8 varchar fields. Between one MySQL server and another I export the data via CSV:

    select * INTO OUTFILE '$tmp_fname'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\r\\n'
        from mytable

and then

    load data local infile '" . $mysqli->real_escape_string($glb) . "' ignore into table mytable_temp
        CHARACTER SET 'utf8'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'

I tried the same with BULK INSERT in MSSQL, and for simple types it works (I get another table with int and char(44) columns in it). But in this case I get import errors. Some details: I need the export/import to be automated, which is why I use CSV; the two servers can communicate only via HTTP (PHP scripts). The tables have millions of rows.
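For reference, a minimal sketch of the MSSQL-side import that works for the simple-type table (the file path and table name are placeholders, not the real ones):

    BULK INSERT mytable_simple_temp
    FROM 'C:\tmp\mytable_simple.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        TABLOCK
    );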

So here are my questions.

  1. How should blob field data be formatted in the CSV so that MS SQL can import it?

  2. How can I export utf8 strings for MS SQL? I tried convert(myfield using utf16); is that what I need?

I also tried exporting the data in utf16 and specifying DATAFILETYPE = 'widechar' in BULK INSERT, but it throws an error on the first int value. Can it really not read widechar?
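The failing widechar variant looked roughly like this (placeholder names again):

    BULK INSERT mytable_temp
    FROM 'C:\tmp\mytable_utf16.csv'
    WITH (
        DATAFILETYPE = 'widechar',  -- expects a UTF-16 data file
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    );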

Answer 1:

It's strange that none of the professionals knows the answer.

  1. Blob and binary fields should be exported as HEX(field_name) and then imported into MSSQL as-is; see the sketch just below.
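A minimal sketch of the export side, assuming a table with an int id and one blob column (column and file names are illustrative):

    select id, HEX(blob_col)
        INTO OUTFILE '/tmp/mytable_bin.csv'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        from mytable

On the MSSQL side the hex text lands in a plain varchar staging column; if you need real bytes back, CONVERT(varbinary(32), hex_col, 2) does it (style 2 means a hex string without the 0x prefix).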

By the way, the most flexible way is to use a format file, since with the exact CSV in front of you, you can see where quotes appear and where they do not. See the format file description in the SQL Server docs.
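A sketch of a non-XML format file for such a two-column CSV, under my own assumptions about names and lengths (the terminator column is where quoting is encoded; a field enclosed in quotes would use a terminator like "\",\""):

    12.0
    2
    1   SQLCHAR   0   12   ","      1   id        ""
    2   SQLCHAR   0   64   "\r\n"   2   bin_hex   SQL_Latin1_General_CP1_CI_AS

BULK INSERT picks it up via WITH (FORMATFILE = '...').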

  2. To export utf8 and other non-ANSI strings from MySQL you should use HEX(convert(str_field_name using utf16le)), so you get all the bytes exactly as they are, then bulk import into an intermediate MSSQL table, and then merge or insert into the target table converting to nvarchar: cast(source.str_field_name AS nvarchar(any-length-you-need)). I spent about an hour before I realized that MSSQL needs exactly little-endian. A sketch of the round trip follows.
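The whole round trip for one string column, with hypothetical table and column names; on the MSSQL side, style 2 of CONVERT parses a hex string without the 0x prefix, and casting the resulting varbinary to nvarchar reinterprets the bytes as UTF-16LE:

    -- MySQL side: hex-encode the UTF-16LE bytes of the string column
    select id, HEX(convert(str_col using utf16le))
        INTO OUTFILE '/tmp/mytable_str.csv'
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        from mytable;

    -- MSSQL side: the staging column str_col_hex is plain varchar
    insert into mytable_target (id, str_col)
    select id,
           cast(convert(varbinary(max), str_col_hex, 2) as nvarchar(255))
    from mytable_temp;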

Don't try SELECT ... INTO OUTFILE with utf16le encoding; just leave it at the default, since once all strings are cast to hex the whole output is pure ANSI. BULK INSERT somehow refused to import widechar (utf16le) CSV, and utf16be as well. So the hex-bin solution may not be that fast, but it is universal.