I am importing a source CSV file, I don't know the source encoding and I can only see either �
(ANSI encoding) or �
(UTF8-without-BOM encoding) when I open a the file with Notepad++
(related question).
This file has been imported to the database mssql-2008
using bulk insert
:
DECLARE @bulkinsert NVARCHAR(2000)
SET @bulkinsert =
N'BULK INSERT #TempData FROM ''' +
@FilePath +
N''' WITH (FIRSTROW = 2,FIELDTERMINATOR = ''","'',ROWTERMINATOR =''\n'')'
EXEC sp_executesql @bulkinsert
This is then copied to the regular table1
from #tempData
in a column1 (varchar()
). Now when I look into this table1
I see some ?
in place of those characters.
I have tried to cast
to nvarchar()
but it does not help.
when I digged into what those characters really are with support of the link we download at same time, I saw that the characters were é
,ä
,å
and so on.
I would use replace
to fix the data but I need to make some ugly codes and look into individual pattern of words and replace, so seems difficult.
database/table collation: SQL_Latin1_General_CP1_CI_AS
column1(Varchar(80))
Can I change these characters to English-like characters or the original characters instead of ?
marks.
I have looked at Collation and Unicode Support which did not help me. I understood what it means about encoding but did not supply me with what to do. I have looked into most of the posts here in stackoverflow yes there are some posts about it but did not match my search.
I am unable to figure out where the problem lies.
It was answered in the comment. Did you try it?
http://msdn.microsoft.com/en-us/library/ms189941.aspx
Option DATAFILETYPE ='widenative'
Based on comment from Esailiga did the text get truncated before or after the bulk import. I agree it sounds like the CSV file itself is single byte. Unicode requires option DATAFILETYPE ='widenative'. If the CSV file is single byte the is not magic translation back.
What is too bad is é is extended ASCII and supported with SQL char so more evidence the problem is at the CSV.
SELECT CAST('é' AS char(1))
notice this works as extended ASCII (<255)
Sounds like you need to go back to the source.
The ? in SQL is unknown. Same as � in notepad.
In my case I can fix the encoding problem with the CODEPAGE option:
BULK
INSERT #CSV
FROM 'D:\XY\xy.csv'
WITH
(
CODEPAGE = 'ACP',
DATAFILETYPE ='char',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
Possible values:
CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
You can find more information about the option here:
BULK INSERT
I still cannot believe that after all these years Microsoft has not fixed this obvious bug. There should be no problem with èéêë etc because they are all ascii(<255). This quest is posed over and over again on many sites and the question has yet to be answered
My data is in a table in excel. having generated the insert into statements the table is parsed a 2nd time looking for asccii > 'z' and generating and update table set column statement to overwrite the imported data. Cumbersome but workable
I've done it! After all these years and we were all looking in the wrong place. No work needed no rewriting scripts...
The problem lies with SSMS... if you "New Query" by right-clicking on "Queries" you get to rename the file but not create it that is done for you...
But... if you "Ctrl+N" you get a new query window to edit but no file is created... So you save it yourself and choose encoding on the save button... towards the bottom of the list you'll find UTF-8(without signature) codepage 65001
And that is it...
script after script open a new query window with "ctrl+N" copy and paste from an existing query and save as directed above. And as if by magic it works
If like me you have tables in Excel... parse the table writing the output to the 1st column of a new workbook with 1 sheet in it and then saveas and choose utf-8 encoding
Speed things up with a template file containing a comment "-- utf-8" something like that. save it as utf-8 and use a file listing of *.sql pasted into excel to concatenate a list of
=concatenate("ren templatefile.txt ", char(34), a1, char(34))
in b1 and drop it down
After all these years of manual solutions I am literally sweating with excitement at the discovery. Thank you for getting me so upset