I am having an issue inserting unicode into an Oracle schema, I think the database is an Oracle 11g instance but am not certain at this point. I'm using python 2.6.1 on OS X 10.6.8 (this is the system verison of python) and am using the cx-Oracle driver module version 5.1 downloaded from sourceforge.net, built and installed to a virtualenv 1.6.1 instance with site packages visible. My script is as follows
import cx_Oracle
connection = cx_Oracle.connect(
"<name>/<password>@<host>/<service-name>"
)
cursor = connection.cursor()
result = cursor.execute(u"create table UNICODE_TEST (id NUMBER(6), text NCLOB not NULL)")
raw_text = open("test.txt",'r').read()
if isinstance(raw_text,str):
raw_text = raw_text.decode("utf_8")
statement = u"insert into UNICODE_TEST (id, text) values (1,'%s')" % raw_text
result = cursor.execute(statement)
I create a connection, create the cursor, execute a statment to create a test table with an id and text field of types NUMBER and NCLOB. I open a file containing what I know to be text encoded in UTF-8, decode the string to unicode. Create an insert statment in a unicode string and execute that statement, and the result is this error.
Traceback (most recent call last):
File "unicode-test.py", line 19, in <module>
result = cursor.execute(statement)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 170: ordinal not in range(128)
Something is trying to encode my statement as ASCII before inserting it into the Oracle schema. So I started hunting around looking to better understand how cx-Oracle handles unicode and found this in the HISTORY.txt of the cx-Oracle source I downloaded from sourceforge.net
Changes from 5.0.4 to 5.1
1) Remove support for UNICODE mode and permit Unicode to be passed through in everywhere a string may be passed in. This means that strings will be passed through to Oracle using the value of the NLS_LANG environment variable in Python 3.x as well. Doing this eliminated a bunch of problems that were discovered by using UNICODE mode and also removed an unnecessary restriction in Python 2.x that Unicode could not be used in connect strings or SQL statements, for example. ...
My assumption is that the NLS_LANG environment variable is set to 'ascii' or some equivalent, so I try setting NLS_LANG to 'AL32UTF8' which I believe is the correct value for unicode, and set the new value before creating my connection.
os.environ["NLS_LANG"] = "AL32UTF8"
connection = cx_Oracle.connect(
"<user>/<password>@<host>/<service-name>"
)
cursor = connection.cursor()
...
But I get this error.
Traceback (most recent call last):
File "unicode-test.py", line 11, in <module>
"<user>/<password>@<host>/<service-name>"
cx_Oracle.DatabaseError: ORA-12705: Cannot access NLS data files or invalid environment specified
So it looks like I cannot tamper with the NLS_LANG value.
Here are my questions as of now. Am I missing something simple like an incorrect column type? Is the problem with the cx-Oracle driver? Do I need to set the "WITH_UNICODE" environment variable when building the cx-Oracle module and how would I do that? Is the issue with the Oracle instance? I have little experience with Oracle and have never worked with Oracle and python together. I've spend two days working on this issue and would like a better understanding of what the issue is before I go to the DBA group with.
Thanks,