Is it better to set an index (primary or secondary) on a varchar(x)
or varbinary(x)
column?
I wasn't sure it even mattered. However what I see online (Googled - varchar vs varbinary) is that varchar
is almost dead or being pushed to the way side. So is this better to index or something? Could it be the type of index?
Excellent scenario: Indexing email addresses ([edit] encrypted byte array {varbinary} or string equivalent {varchar})
Thanks
Answer?
It seems that indexes on varbinary
is the worst thing ever. Am I reading this right?
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/35b61bb0-1fa8-4a2f-a9fb-729a1874dcf8/clustered-index-on-a-varbinary-column
It is better to create an index on varchar than varbinary. Varbinary is suitable for blobs but you can store strings in varbinary also. Such blobs are complementary to your actual data. Your own research lead to that conclusion also.
An email address can be entered by user in variety of formats - abc@xyz.com or Abc@Xyz.com etc. It is easier to store/extract such information in/from varchar field. Joe Enos is absolutely right that binary comparisons will be case-sensitive (comparing binary info) whereas varchar will be case-insensitive assuming that's how you have set up your DB and column collation. With varbinary, you'll also have to be careful about padding.
Varchar is alive and healthy. When you index varchar(100), try to use a non-clustered index. My general preference is to use a surrogate key in most situations as clustered index.
The correct datatype to use is dictated by the contents of the column.
while encrypting an email address I have a byte array
This is binary data and should be stored as varbinary
(except if the encryption routine outputs a fixed length array then use binary
).
casting the varbinary
to varchar
would be of no benefit at all when it comes to storage (everything is stored as binary anyway) and would be likely to cause incorrect results.
The following all return "yes" in my default collation
SELECT
CASE
WHEN cast(0xE6 AS VARCHAR(20)) = cast(0x6165 AS VARCHAR(20))
THEN 'yes' else 'no'
END,
CASE
WHEN cast(0xcc0000 AS VARCHAR(20)) = cast(0xcc2020 AS VARCHAR(20))
THEN 'yes' else 'no'
END,
CASE
WHEN cast(0x202020 AS VARCHAR(20)) = cast(0x AS VARCHAR(20))
THEN 'yes' else 'no'
END
Adding a COLLATE LATIN1_GENERAL_BIN
alters the result of the first two but not the third.
I would expect looking up a varbinary(n)
column to be faster than looking up a varchar(n)
column generally as the comparison routines are simpler though perhaps not much in it for a varchar column with a binary collation.