How do you import UTF-8 flat files into SQL Server

2020-04-03 01:52发布

I have a bunch of UTF-8 encoded flat files that need to be imported into a SQL Server 2008 R2 database. Bulk inserts are not able to identify the diameters nor seems to accept UTF-8.

I understand that there is a number of articles on how SQL Server 2008 deals with UTF-8 encoding, but I'm sort of looking for any updated answers as most of those articles are old.

Is there anything I can to do in order to get these flat files into the database either by converting them before an insert or a process to run during the insert?

I want to stay away from manually converting each one. Furthermore, SSIS packages that I've attempted to create can read and separate the data. It just can't move the data it seems. :(

The flat files are generated by Java. Converting the java environment from UTF-8 to any other encoding has been unsuccessful.

NOTE

I have no intention of storing UTF-8 data. My delimiter is coming in funky because it's UTF-8. SQL Server cannot read the characters when separating the columns and rows. That's it.

4条回答
贪生不怕死
2楼-- · 2020-04-03 02:23
  1. convert your data file to UTF-16 Little Endian (exactly Little Endian)
  2. use bcp with -w option.
查看更多
啃猪蹄的小仙女
3楼-- · 2020-04-03 02:33

Just for reference, if someone google it, and falls here like me.


I've tried the accepted answer a dozen times, with no success. In my case, my data file was a .csv flat file, which had a lot of accents characters/letters, like ç é ã á.

I also noted that no matter what encoding I choose, the import was made using the 1251 (ANSI - Latin 1) encoding.

So, the solution was convert before import, my .csv file from UTF-8 to the very same 1251 (ANSI - Latin 1) encoding. I did the conversion using Notepad++.

After converting it, did the regular import (through SSMS Tasks -> "Import Data" wizard), selecting the 1251 (ANSI - Latin 1) encoding, and everything was imported correctly.


Environment:

SQL Server Web 2016

SQL Server Management Studio v17.9.1

Notepad++ v7.7.1


Also, this answers too the original OP's question:

Is there anything I can to do in order to get these flat files into the database either by converting them before an insert or a process to run during the insert?

查看更多
forever°为你锁心
4楼-- · 2020-04-03 02:38

Not true, you simply need to choose code page 65001

enter image description here

查看更多
我想做一个坏孩纸
5楼-- · 2020-04-03 02:47

Microsoft has always been crap regarding encoding, especially in SQL Server. Here is your solution.

查看更多
登录 后发表回答