What is the advantage of using varbinary over varc

2019-01-23 03:27发布

问题:

A while ago I asked a question about hierarchy/version number sorting in SQL Server. ( How Can I Sort A 'Version Number' Column Generically Using a SQL Server Query).

Among the answers that were submitted was this link to a TSQL Coding challenge with much the same puzzle.

In the SQL2000 solution the author demonstrated a two variations, one using and returning a varchar and the other varbinary. The author explains THAT he is doing this without explaining WHY.

So, my question is really, what main differences/advantages (if any) of the difference in approach? I.e. why use a varbinary instead of a varchar?

I've omitted posting the code, as its most elegantly summed up in the above article.

回答1:

I believe the expectation is that the varbinary data will generally consume fewer bytes (5), than the varchar one (10 or 11, I think) per portion of the original string, and so, for very large numbers of components, or comparisons to occur, it should be more efficient.

But I'd recommend that if you were looking to use either solution, that you implement both (they're quite short), and try some profiling against your real data (and query patterns), to see if there are practical differences (I wouldn't expect so).

(Crafty Steal): And as Martin points out, the binary comparisons will be more efficient, since it won't involve all of the code that's there to deal with collations. :-)



回答2:

If we use different collation for different varchar columns to store strings and use more than one such columns in a sql query, then the sql query may throw the error "Invalid mix of collations". ( For ex, if we want to compare two strings of incompatible collation or try to select data of different collation into a combined column).

But, that can be fixed if we specify "COLLATE" in the query. For ex :

 WHERE 'A' COLLATE latin1_general_ci = 'A' COLLATE latin1_general_ci 

But, this defeats any INDEX you may have.

To guard against the "Invalid mix of collations" errors, we can use varbinary.

varbinary uses less space than varchar if multi-byte collation is used for the varchar column. (binary strings don’t have character sets and collations. Binary strings are merely a sequence of byte values).

*** Btw, A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set https://dev.mysql.com/doc/refman/5.7/en/charset-general.html

But, if you choose a single byte character set (for ex, latin1) instead of multi-byte character set (for ex, utf8 or ucs2), then the space requirement for both varbinary and varchar are same.

VARBINARY is better than VARCHAR if there is no validity checking. For example, if the default character set is UTF8 then this is illegal:

CREATE TABLE t9 (s1 VARCHAR(5));
INSERT INTO t9 VALUES (0xF4808283);

But, this is legal because character set doesn't matter:

CREATE TABLE t10 (s1 VARBINARY(5));
INSERT INTO t10 VALUES (0xF4808283);

So, VARCHAR compares characters using a "collation" and VARBINARY compare bytes. Most collations are "case insensitive", so upper case and lower case are considered equal. Since, varbinary dont use any collation, the search operations are always case sensitive in case of varbinary.