UTF-8 problem in SQL Server

Posted 2019-07-20 05:04

Question:

I have some UTF-8 data that I would like to bulk insert into SQL Server 2005. I am using CODEPAGE 65001:

BULK INSERT #bla
FROM 'D:\bla.txt'  
WITH 
( 
    CODEPAGE=65001,
    FIELDTERMINATOR = '\t', 
    ROWTERMINATOR = '\n'
)

Unfortunately, strings like this:

Erdağı

end up being stored like this:

Erda??

Am I using the wrong code page? Is there anything else I can do?

Thanks.

Christian

Answer 1:

According to this link, "SQL Server does not support code page 65001 (UTF-8 encoding)." At first, I thought this related only to 2008, but according to a Microsoft technical writer's response to a question on this link, "SQL Server never has supported code page 65001 (UTF-8 encoding)."



Answer 2:

You can use C# to work around this problem by converting the file from UTF-8 to UTF-16 (UCS-2) before the bulk insert:

using System;
using System.IO;
using System.Text;

namespace UTF8toUCS2
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("example: UTF8toUCS2 [filepath]");
                return;
            }

            var filename = args[0];

            // Read the raw UTF-8 bytes of the input file.
            byte[] content = File.ReadAllBytes(filename);

            // Prepend the UTF-8 byte order mark (EF BB BF). Encoding.Convert does
            // not strip it, so it comes out as the UTF-16 LE mark (FF FE).
            // This assumes the input file does not already start with a BOM.
            byte[] newArray = new byte[content.Length + 3];

            newArray[0] = (byte)0xEF;
            newArray[1] = (byte)0xBB;
            newArray[2] = (byte)0xBF;

            Array.Copy(content, 0, newArray, 3, content.Length);

            // Convert from UTF-8 to UTF-16 little-endian (Encoding.Unicode).
            byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, newArray);

            // Note: this overwrites the original file in place.
            File.WriteAllBytes(filename, utf16Bytes);
        }
    }
}
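
Once the file has been converted, the load itself might look like the sketch below. It is not part of the original answer: it assumes the file now on disk is UTF-16 and that the target columns are nvarchar. The column layout of #bla is hypothetical, since the question does not show it. The path and terminators are carried over from the question unchanged.

```sql
-- Hypothetical table layout; the question does not show the real one.
-- The text column must be NVARCHAR, otherwise characters outside the
-- column's code page are still replaced with '?'.
CREATE TABLE #bla
(
    id   INT,
    name NVARCHAR(100)
);

-- DATAFILETYPE = 'widechar' tells BULK INSERT to read the file as Unicode
-- (UTF-16), so no CODEPAGE option is needed.
BULK INSERT #bla
FROM 'D:\bla.txt'
WITH
(
    DATAFILETYPE = 'widechar',
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
);
```

If the strings still come out as Erda??, the first things to check are the column type of the target table and whether the converter actually ran on the file.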