I maintain a PHP driven application with Oracle backend (OCI8 functions). The app is developed with Oracle 10g XE and deployed on whatever version the customer owns.
The application handles single-byte text (ISO-8859-15) and I've never had any problem while developing against the Western European edition of Oracle XE. However, I've recently installed the Universal edition and I'm having issues when inserting large strings with non-ASCII chars. This version sets NLS_CHARACTERSET = AL32UTF8
; since I my app uses WE8ISO8859P15
Oracle silently converts my input data from ISO-8859-15 to UTF-8 (which is fine). But it seems that certain size checks go wrong: a string with 1500 €
characters (1500 bytes in ISO-8889-15, 4500 bytes in UTF-8) appear to overflow a VARCHAR2(4000 CHAR)
column.
I've created this test table:
CREATE TABLE FOO (
FOO_ID NUMBER NOT NULL ENABLE,
DATA_BYTE VARCHAR2(4000 BYTE),
DATA_CHAR VARCHAR2(4000 CHAR),
CONSTRAINT FOO_PK PRIMARY KEY (FOO_ID)
);
The problem can be reproduced with this code:
<?php
$connection = oci_connect(DB_USER, DB_PASS, DB_CONN_STRING, 'WE8ISO8859P15');
if( !$connection ){
$e = oci_error();
die(htmlspecialchars($e['message']));
}
$id = 1;
$data = str_repeat('€', 1500);
$sql = 'INSERT INTO FOO (FOO_ID, DATA_CHAR) ' .
'VALUES (:id, :data)';
$res = oci_parse($connection, $sql);
if(!$res){
$e = oci_error();
die(htmlspecialchars($e['message']));
}
if(!oci_bind_by_name($res, ':id', $id)){
$e = oci_error();
die(htmlspecialchars($e['message']));
}
if(!oci_bind_by_name($res, ':data', $data)){
$e = oci_error();
die(htmlspecialchars($e['message']));
}
if(!oci_execute($res, OCI_COMMIT_ON_SUCCESS)){
$e = oci_error();
die(htmlspecialchars($e['message']));
}
... which triggers:
Warning: oci_execute(): ORA-01461: sólo puede enlazar un valor LONG para insertarlo en una columna LONG
It is the same error I get when I try to insert a 4001 char string. It doesn't happen if I insert xxx...
instead of €€€
and it doesn't happen if I save my script as UTF-8 and connect as such:
<?php
$connection = oci_connect(DB_USER, DB_PASS, DB_CONN_STRING, 'AL32UTF8');
[Update: My test was flawed. Using UTF-8 doesn't avoid ORA-01461]
How can I override this problem? The NLS_CHARACTERSET database parameter is not something I control and switching my app to UTF-8 is likely to cause other problems (almost all our customers have single byte databases).
This is probably not something that you can work around unless you want to use a CLOB instead of a VARCHAR2.
In Oracle, when you declare a column, the default is to use byte-length semantics. So a VARCHAR2(100), for example, allocates 100 bytes of storage. If you're using a single-byte character set like ISO 8859-1, every character requires 1 byte of storage, so this also allocates space for 100 characters. But if you are using a multi-byte character set like UFT-8, each character can require between 1 and 4 bytes of storage. Depending on the data, therefore, a VARCHAR2(100) may only be able to store 25 characters of data (English characters generally require 1 byte, European characters generally require 2 bytes, and Asian characters generally require 3 bytes).
You can tell Oracle to use character length semantics which is normally what I'd suggest when moving from an ISO-8859-1 database to a UTF-8 database. If you declare a column VARCHAR2(100 CHAR), Oracle will allocate space for 100 characters regardless of whether that ends up being 100 bytes or 400 bytes. You can also set the NLS_LENGTH_SEMANTICS parameter to CHAR to change the default (for new DDL) so that a VARCHAR2(100) allocates 100 characters of storage rather than 100 bytes.
Unfortunately for you, though, the limit on the size of an Oracle VARCHAR2 (in the context of the SQL engine rather than the PL/SQL engine) is 4000 bytes. So even if you declare a column VARCHAR2(4000 CHAR), you're still going to be limited to actually inserting 4000 bytes of data which may be as few as 1000 characters. For example, in a database using the AL32UTF8 character set, I can declare a column VARCHAR2(4000 CHAR) but inserting a character that requires 2 bytes of storage shows that I can't really insert 4000 characters of data
If you need to store 4000 characters of UTF-8 data, you'd need a data type that could handle 16000 bytes which would necessitate moving to a CLOB.