A recent project called for importing data into an Oracle database. The program that does this is a C# .NET 3.5 app, and I'm using the Oracle.DataAccess connection library to handle the actual inserting.
I ran into a problem where I'd receive this error message when inserting a particular field:
ORA-12899: value too large for column X
I truncated the value with Field.Substring(0, MaxLength) but still got the error (though not for every record).
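Finally I saw what should have been obvious: my string was in ANSI, but the field is UTF-8, and its length is defined in bytes, not characters. Any non-ASCII character takes more than one byte in UTF-8, so a string of MaxLength characters can still exceed MaxLength bytes.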
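To make the character-versus-byte mismatch concrete, here is a small sketch that checks a value against a byte limit the way a UTF-8 column with a byte-length limit does. The class and method names are illustrative only, and the byte-length semantics of the column are taken from the description above.

```csharp
using System.Text;

public static class ColumnLengthCheck
{
    // String.Length and Substring count UTF-16 chars, but a UTF-8 column
    // limit counts bytes: "\u00e9" (é) is 1 char but 2 bytes in UTF-8.
    public static bool FitsInUtf8Column(string value, int maxBytes)
    {
        return Encoding.UTF8.GetByteCount(value) <= maxBytes;
    }
}
```

For example, "h\u00e9llo".Substring(0, 5) is the whole 5-character string but encodes to 6 bytes, so it fails a 5-byte limit, while a pure-ASCII value of the same length passes. That is consistent with the error appearing only for some records.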
This brings me to my question: what is the best way to trim my string to fit the MaxLength?
My Substring code works by character length. Is there a simple C# function that can trim a string intelligently by its UTF-8 byte length (i.e. without hacking off half a character)?
Following Oren Trutner's comment, here are two more solutions to the problem:
In the first, we count the number of bytes to remove by looking at the characters at the end of the string one at a time and subtracting each one's byte size, so we don't re-evaluate the entire string on every iteration.
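The original code for this first solution isn't reproduced here, so the following is only a sketch of the idea, assuming Encoding.UTF8 is the encoding that matches the column. The class and method names are hypothetical. It measures the whole string once, then subtracts the UTF-8 size of each character it removes from the end, and it removes a surrogate pair as a unit so it never leaves half a pair behind.

```csharp
using System.Text;

public static class Utf8TrimFromEnd
{
    // Removes characters from the end until the UTF-8 encoding fits in
    // maxBytes. The full string is measured once; after that only the
    // removed characters are measured.
    public static string TrimToBytes(string value, int maxBytes)
    {
        if (value == null)
            return null;

        int byteCount = Encoding.UTF8.GetByteCount(value);
        int length = value.Length;

        while (byteCount > maxBytes && length > 0)
        {
            // Remove a surrogate pair (one 4-byte character) as a unit.
            int remove = 1;
            if (length >= 2
                && char.IsLowSurrogate(value[length - 1])
                && char.IsHighSurrogate(value[length - 2]))
            {
                remove = 2;
            }

            // Subtract the UTF-8 size of just the removed character(s).
            byteCount -= Encoding.UTF8.GetByteCount(value.Substring(length - remove, remove));
            length -= remove;
        }

        return value.Substring(0, length);
    }
}
```

With a 2-byte limit, "h\u00e9llo" trims down to "h"; with a 3-byte limit it keeps "h\u00e9", because the 2-byte é still fits.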
The second solution is even more efficient (and more maintainable): decode a string from only the desired number of bytes of the byte array, then cut off the last character, because it might be corrupted.
The only downside of the second solution is that we might cut off a perfectly valid last character; but since we are truncating the string anyway, that may still be acceptable for the requirements.
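A minimal sketch of this second solution follows; again the original code isn't shown, and the class and method names are hypothetical. It assumes, as current .NET does, that a multi-byte character split by the cut decodes to a single U+FFFD replacement character, which is the "corrupted" last character being dropped. It also drops a trailing surrogate pair as a unit, so the result never ends in half a pair.

```csharp
using System.Text;

public static class Utf8CutViaBytes
{
    // Encodes once, decodes only the first maxBytes bytes (maxBytes is
    // assumed to be non-negative), then drops the last character because
    // the cut may have split it.
    public static string CutToBytes(string value, int maxBytes)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        if (bytes.Length <= maxBytes)
            return value; // already fits; nothing is cut

        // A character split by the cut decodes to the replacement char U+FFFD.
        string decoded = Encoding.UTF8.GetString(bytes, 0, maxBytes);
        if (decoded.Length == 0)
            return decoded;

        // Drop the last character even if it decoded cleanly, so a valid
        // character may be lost. A trailing low surrogate is dropped together
        // with its high surrogate.
        int drop = decoded.Length >= 2 && char.IsLowSurrogate(decoded[decoded.Length - 1]) ? 2 : 1;
        return decoded.Substring(0, decoded.Length - drop);
    }
}
```

CutToBytes("hello", 3) returns "he" rather than "hel", which is exactly the trade-off described above: the cut landed on a character boundary, but the last character is dropped anyway.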
Thanks to Shhade, who came up with the second solution.
A UTF-8 byte whose top two bits are 10 (10xxxxxx) is a continuation byte, in the 'middle' of a character. Any other byte begins a character: 0xxxxxxx is a complete single-byte (ASCII) character, and 11xxxxxx is the lead byte of a multi-byte sequence. The ability to detect the beginning of a character from any byte was an explicit design goal of UTF-8.
See the Description section of the Wikipedia article on UTF-8 for more detail.
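Using that rule, one way to truncate without splitting a character is to encode the string, look at the first byte that would be cut off, and back up while it is a continuation byte. The sketch below is one possible implementation under those assumptions (UTF-8 via Encoding.UTF8, a non-negative byte limit); the class and method names are hypothetical.

```csharp
using System.Text;

public static class Utf8BoundaryTruncate
{
    // Truncates to at most maxBytes of UTF-8 without splitting a character.
    // Continuation bytes look like 10xxxxxx, i.e. (b & 0xC0) == 0x80.
    public static string TruncateToBytes(string value, int maxBytes)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        if (bytes.Length <= maxBytes)
            return value;

        // bytes[cut] is the first byte that would be cut off. While it is a
        // continuation byte, the cut is inside a character, so move back to
        // that character's lead byte.
        int cut = maxBytes;
        while (cut > 0 && (bytes[cut] & 0xC0) == 0x80)
            cut--;

        // bytes[0..cut) now holds only complete characters.
        return Encoding.UTF8.GetString(bytes, 0, cut);
    }
}
```

Unlike dropping the last character unconditionally, this keeps a character that the cut did not split: TruncateToBytes("hello", 3) returns "hel", while TruncateToBytes("\u20ACuro", 2) returns an empty string because the 3-byte € does not fit.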