Situation:
varchar(20)
seems to truncate silently in Teradata and not to expand or complain when encountering strings larger than 20 characters long... This is a bit of a surprise as I expected either automatic expansion of the column to fit larger strings, say 30 characters, OR for an error to be thrown if a larger string were encountered. Silent truncation seems to get me the worst of all worlds...
Complication:
For my application (prototype analytics design) I don't know in advance how large will be the data I will be ingesting over the course of a few weeks. That seems to rule out using varchar(N), except for max
Questions:
So now I have a few choices, and am looking for some guidance:
Q1. User error? Am I misunderstanding a key concept about varchar(N)
?
If this is in fact how Teradata handles varchar
fields, then
Q2. why would anyone specify anything less than varchar(max)
especially when it is not clear in advance how many characters might need to be stored in the field.
Q3. Is there a different data type that permits flexible sizing of the string -- i.e. a true variable length character string?
If I recall, other SQL dialects implement varchar(n)
as a recommended initial size for the string but allow it to expand as needed to fit the maximum length of the data strings thrown in. Is there a similar data type in Teradata?
(Note: since I'm prototyping the tables, I am less concerned about performance efficiency at this point; more concerned about quick but safe designs that allow the prototype to progress.)
I am not familiar with any dialect of SQL that implements a varchar(n) that behaves as you suggest -- a recommended initial size and then letting it grow. This would apply to Oracle, SQL Server, MySQL, and Postgres. In all these databases, varchar(n) behaves pretty much as you see it behave in Teradata in SELECT statements with explicit casts. I don't believe any cause a truncation error when a longer string is placed into a shorter string.
As Branko notes in his comment, the behavior is different in data modification steps, where an implicit cast does cause an error.
I am not familiar with all the details of Teradata. In SQL Server, there is historically a world of difference between varchar(max) and varchar(8000). The former would be allocated on a separate data page, with the latter allocated on the same page as the data. (The rules have been modified in more recent versions so varchars can spill off the data page.)
In other words, there may be other considerations when using varchar(max), involving how the data is stored on pages, how indexes are built on them, and perhaps other considerations.
My suggestion is that you pick a reasonably large size, say 1000 or so, and let the application continue from there. If you want real flexibility, then use varchar(max). You should also investigate through Teradata documentation and/or technical contacts what the issues are with declaring very large strings.
Teradata works in two modes : Teradata (BT; .. ET;) and ANSI(commit;). They have list of differences and one of them you've met during development -- Teradata mode allows truncation of display data. On contrary - ANSI forbids such truncation, so, you'll see an error.
To get the idea, just use simple example:
create table check_exec_mode (str varchar(5))
;
select * from check_exec_mode
;
insert into check_exec_mode values ('123456')
;
If you configure connections of your teradata client (e.g., Teradata Studio Express) in TMODE(transaction mode)=TERA, then you'll get as a result one truncated row in the table ('12345').
Changing transaction mode to ANSI and executing insert statement , will lead you to the error "Right truncation of string data".