I have some UTF-8 data that I would like to bulk insert (SQL Server 2005). I am using CODEPAGE 65001:
BULK INSERT #bla
FROM 'D:\bla.txt'
WITH
(
    CODEPAGE = 65001,
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
)
Unfortunately, strings like this:
Erdağı
end up being stored like this:
Erda??
Am I using the wrong code page? Is there anything else I can do?
Thanks.
Christian
According to this link, "SQL Server does not support code page 65001 (UTF-8 encoding)." At first, I thought this applied only to SQL Server 2008, but according to a Microsoft technical writer's response to a question on this link, "SQL Server never has supported code page 65001 (UTF-8 encoding)."
You can use C# to work around this problem by converting the file from UTF-8 to UTF-16 (UCS-2) before importing it:
using System;
using System.IO;
using System.Text;

namespace UTF8toUCS2
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Console.WriteLine("example: UTF8toUCS2 [filepath]");
                return;
            }

            var filename = args[0];
            byte[] content = File.ReadAllBytes(filename);

            // Prepend the UTF-8 byte order mark (EF BB BF). Encoding.Convert turns it
            // into U+FEFF, so the output starts with the UTF-16 LE BOM (FF FE).
            // This assumes the input file does not already begin with a BOM.
            byte[] newArray = new byte[content.Length + 3];
            newArray[0] = 0xEF;
            newArray[1] = 0xBB;
            newArray[2] = 0xBF;
            Array.Copy(content, 0, newArray, 3, content.Length);

            // Encoding.Unicode is UTF-16 little-endian.
            byte[] ucs2Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, newArray);

            // Note: this overwrites the input file in place.
            File.WriteAllBytes(filename, ucs2Bytes);
        }
    }
}
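Once the file has been converted, the import needs to read it as UTF-16 rather than through a code page. A minimal sketch, assuming the text columns of #bla are declared as NVARCHAR (the table definition is not shown in the question) and that the terminators from the question still match the file:

```sql
-- Assumes D:\bla.txt has already been converted to UTF-16 LE by the program above
-- and that the text columns of #bla are NVARCHAR (an assumption; the table
-- definition is not part of the question).
BULK INSERT #bla
FROM 'D:\bla.txt'
WITH
(
    DATAFILETYPE = 'widechar',  -- read the file as Unicode (UTF-16) characters
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
);
```

If the target columns are VARCHAR instead, characters that the column's collation code page cannot represent (possibly including ğ and ı) can still end up stored as ?, so it is worth checking the column types as well.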