I have an Oracle database that stores some data values in Simplified Chinese. I have created an ASP.net MVC C# webpage that is supposed to display this information. I am using a OdbcConnection
in order to retrieve the data, however when I run my da.Fill(t)
command the values return as "?"
OdbcCommand cmd = new OdbcCommand();
cmd.CommandText = select;
OdbcConnection SqlConn = new OdbcConnection("Driver={Oracle in instantclient_11_2};Dbq=Database;Uid=Username;pwd=password;");
DataTable t = new DataTable();
cmd.Connection = SqlConn;
SqlConn.Open();
OdbcDataAdapter da = new OdbcDataAdapter(cmd);
SqlConn.Close();
da.Fill(t);
return t;
t
has the data but everything that is supposed to be the Chinese characters is just a series of "?????"
Problems with character set are quite common, let me try to give some general notes.
In principle you have to consider four different character set settings.
1 and 2:
NLS_CHARACTERSET
andNLS_NCHAR_CHARACTERSET
Example:
AL32UTF8
They are defined only on your database, you can interrogate them with
These settings define which characters (in which format) can be stored in your database - no more, no less. It requires some effort (see Character Set Migration and/or Oracle Database Migration Assistant for Unicode) if you have to change it on existing database.
3:
NLS_LANG
Example:
AMERICAN_AMERICA.AL32UTF8
This value is defined only on your client. NLS_LANG has nothing to do with the ability to store characters in a database. It is used to let Oracle know what character set you are using on the client side. When you set NLS_LANG value (for example to AL32UTF8) then you just tell the Oracle database "my client uses character set AL32UTF8" - it does not necessarily mean that your client is really using AL32UTF8! (see below #4)
NLS_LANG can be defined by environment variable
NLS_LANG
or by Windows Registry atHKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG
(for 32 bit), resp.HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG
(for 64 bit). Depending on your application there might be other ways to specify NLS_LANG, but let's stick to the basics. If NLS_LANG value is not provided then Oracle defaults it toAMERICAN_AMERICA.US7ASCII
Format of NLS_LANG is
NLS_LANG=language_territory.charset
. The {charset} part of NLS_LANG is not shown in any system table or view. All components of the NLS_LANG definition are optional, so following definitions are all valid:NLS_LANG=.WE8ISO8859P1
,NLS_LANG=_GERMANY
,NLS_LANG=AMERICAN
,NLS_LANG=ITALIAN_.WE8MSWIN1252
,NLS_LANG=_BELGIUM.US7ASCII
.As stated above the {charset} part of
NLS_LANG
is not available in database at any system table/view or any function. Strictly speaking this is true, however you can run this query:It should return character set from your current
NLS_LANG
setting - however based on my experience the value is often NULL orUnknown
, i.e. not reliable.Find more very useful information here: NLS_LANG FAQ
Note, some technologies do not utilize
NLS_LANG
, settings there do not have any effect, for example:ODP.NET Managed Driver is not
NLS_LANG
sensitive. It is only .NET locale sensitive. (see Data Provider for .NET Developer's Guide)OraOLEDB (from Oracle) always use UTF-16 (see OraOLEDB Provider Specific Features)
Java based JDBC (for example SQL Developer) has its own methods to deal with character sets (see Database JDBC Developer's Guide - Globalization Support for further details)
4: The "real" character set of your terminal, your application or the encoding of
.sql
filesExample:
UTF-8
If you work on a Windows terminal (i.e. with SQL*plus) you can interrogate the code page with command
chcp
, on Unix/Linux the equivalent islocale charmap
orecho $LANG
. You can get a list of all Windows code pages identifiers from here: Code Page Identifiers. Note, for UTF-8 (chcp 65001
) there are some issues, see this discussion.If you work with
.sql
files and an editor like TOAD or SQL-Developer you have to check the save options. Usually you can choose values likeUTF-8
,ANSI
,ISO-8859-1
, etc.ANSI
means the Windows ANSI codepage, typicallyCP1252
, you can check in your Registry atHKLM\SYSTEM\ControlSet001\Control\Nls\CodePage\ACP
or here: National Language Support (NLS) API Reference[Microsoft removed this reference, take it form web-archive National Language Support (NLS) API Reference]
How to set all these values?
The most important point is to match
NLS_LANG
and your "real" character set of your terminal, resp. application or the encoding of your.sql
filesSome common pairs are:
CP850 ->
WE8PC850
CP1252 or ANSI (in case of "Western" PC) ->
WE8MSWIN1252
ISO-8859-1 ->
WE8ISO8859P1
ISO-8859-15 ->
WE8ISO8859P15
UTF-8 ->
AL32UTF8
Or run this query to get some more:
Some technologies make you life easier, e.g. ODP.NET (unmanged driver) or ODBC driver from Oracle automatically inherits the character set from
NLS_LANG
value, so condition from above is always true.Is it required to set client NLS_LANG value equal to database
NLS_CHARACTERSET
value?No, not necessarily! For example, if you have the database character set
NLS_CHARACTERSET=AL32UTF8
and the client character setNLS_LANG=.ZHS32GB18030
then it will work without any problem (provided your client really uses GB18030), although these character sets are completely different. GB18030 is a character set commonly used for Chinese, likeUTF-8
it supports all Unicode characters.If you have, for example
NLS_CHARACTERSET=AL32UTF8
andNLS_LANG=.WE8ISO8859P1
it will also work (again, provided your client really uses ISO-8859-P1). However, the database may store characters which your client is not able to display, instead the client will display a placeholder (e.g.¿
).Anyway, it is beneficial to have matching NLS_LANG and NLS_CHARACTERSET values, if suitable. If they are equal you can be sure that any character which may be stored in database can also be displayed and any character you enter in your terminal or write in your .sql file can also be stored in database and is not substituted by placeholder.
Supplement
So many times you can read advise like "The NLS_LANG character set must be the same as your database character set" (also here on SO). This is simply not true and a popular myth!
Here is the proof:
Both, client and database character sets are
AL32UTF8
, however the characters do not match. The reason is, mycmd.exe
and thus also SQL*Plus use Windows CP1252. Therefore I must set NLS_LANG accordingly:Also consider this example:
You would need to set two different values for
NLS_LANG
for a single statement - which is not possible.