I have written the following SQL CLR
function in order to hash string values larger then 8000 bytes (the limit of input value of the T-SQL
built-it HASHBYTES
function):
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlBinary HashBytes(SqlString algorithm, SqlString value)
{
HashAlgorithm algorithmType = HashAlgorithm.Create(algorithm.Value);
if (algorithmType == null || value.IsNull)
{
return new SqlBinary();
}
else
{
byte[] bytes = Encoding.UTF8.GetBytes(value.Value);
return new SqlBinary(algorithmType.ComputeHash(bytes));
}
}
It is working fine for Latin strings. For example, the following hashes are the same:
SELECT dbo.fn_Utils_GetHashBytes ('MD5', 'test'); -- 0x098F6BCD4621D373CADE4E832627B4F6
SELECT HASHBYTES ('MD5', 'test'); -- 0x098F6BCD4621D373CADE4E832627B4F6
The issue is it is not working with Cyrillic strings. For example:
SELECT dbo.fn_Utils_GetHashBytes ('MD5 ', N'даровете на влъхвите') -- NULL
SELECT HashBytes ('MD5 ',N'даровете на влъхвите') -- 0x838B1B625A6074B2BE55CDB7FCEA2832
SELECT dbo.fn_Utils_GetHashBytes ('SHA256', N'даровете на влъхвите') -- 0xA1D65374A0B954F8291E00BC3DD9DF655D8A4A6BF127CFB15BBE794D2A098844
SELECT HashBytes ('SHA2_256',N'даровете на влъхвите') -- 0x375F6993E0ECE1864336E565C8E14848F2A4BAFCF60BC0C8F5636101DD15B25A
I am getting NULL
for MD5
, although the code returns value if it is executed as console application. Could anyone tell what I am doing wrong?
Also, I've got the function from here and one of the comments says that:
Careful with CLR SP parameters being silently truncated to 8000 bytes - I had to tag the parameter with [SqlFacet(MaxSize = -1)] otherwise bytes after the 8000th would simply be ignored!
but I have tested this and it is working fine. For example, if I generate a hash of 8000 bytes string and a second hash of the same string plus one symbol, I get the hashes are different.
DECLARE @A VARCHAR(MAX) = '8000 bytes string...'
DECLARE @B VARCHAR(MAX) = @A + '1'
SELECT LEN(@A), LEN(@B)
SELECT IIF(dbo.fn_Utils_GetHashBytes ('MD5', @A + '1') = dbo.fn_Utils_GetHashBytes ('MD5', @B), 1, 0) -- 0
Should I worry about this?
For a detailed walk-through that explains why you are seeing the differences, please see my answer to the following Question:
TSQL md5 hash different to C# .NET md5
And for anyone who does not wish to compile and deploy this themselves, this function is available in the Free version of the SQL# library of SQLCLR functions, stored procedures, etc (which I am the creator of, but Util_Hash and Util_HashBinary, among many others, are free). There is one difference between what is shown in the Question the two Util_Hash* functions in SQL#: the function shown in the Question takes a
NVARCHAR
/SqlString
input parameter whereas the SQL# functions takeVARBINARY
/SqlBinary
input. The differences are:VARBINARY
input also works for binary source data (files, images, encrypted values, etc)VARBINARY
input does require an extra step of doing aCONVERT(VARBINARY(MAX), source_string)
in the call to the function, doing so preserves whatever Code Page is being used forVARCHAR
data. While not used that often, this can be handy when working with non-Unicode data.Regarding the warning from the other post of:
and yet you not experiencing the same thing: this due to changes in how SSDT generates the T-SQL wrapper objects for SQLCLR objects. In earlier versions (especially those that came with Visual Studio prior to VS 2013), the default behavior was to use
NVARCHAR(MAX)
forSqlChars
, andNVARCHAR(4000)
forSqlString
. But then at some point (I don't want to say as of VS 2013 since Visual Studio and SSDT are independent products even though VS comes with SSDT) the default was changed to useNVARCHAR(MAX)
for bothSqlChars
andSqlString
. The person who posted the warning (on 2013-02-06) must have been using an earlier version of SSDT. Still, it doesn't hurt (and is even a good practice) to be explicit and use[SqlFacet(MaxSize = -1)]
.Regarding the
if (algorithmType == null || value.IsNull)
logic: since either one beingNULL
should return aNULL
, you might be better off removing that logic and using theWITH RETURNS NULL ON NULL INPUT
option of theCREATE FUNCTION
statement. This option, however, is unfortunately not supported via any SSDT construct (i.e. noSqlFacet
for it). So in order to get this option enabled, you can create a Post-Deployment SQL script (which will automatically deploy after the main script), that issues anALTER FUNCTION
with the desired definition. And it wouldn't hurt to vote for my Connect suggestion to natively support this option: Implement OnNullCall property in SqlFunctionAttribute for RETURNS NULL ON NULL INPUT SQLCLR. On a practical level, the performance gain would be mainly seen in situation where you are passing in large values for the@value
parameter but yet somehow@algorithm
isNULL
and so you don't end up using the value of@value
. The reason to use theRETURNS NULL ON NULL INPUT
option is that when you call a SQLCLR function passing in eitherSqlString
orSqlBinary
, the entire value is copied over to the App Domain's memory. That is time, memory, and CPU you don't need to waste if you know ahead of time that you won't be using it :-). You might also see a gain, even if passing in smaller values, on functions that are called very frequently.Side note regarding the warning and your test: SQLCLR does not support
VARCHAR
, onlyNVARCHAR
. Hence, there never was a limit of 8000 since the limit would have been at 4000 characters had SSDT not automatically been usingNVARCHAR(MAX)
. So if there would have been a difference, it would have been seen first testing with only 4000 and 4001 characters.SQL Server has no concept of UTF-8. Use UCS-2 (UTF-16) or ASCII. The encoding used must match what you'd pass to
HASHBYTES
. You can easily see thatHASHBYTES
will hash differentlyVARCHAR
vs.NVARCHAR
:Best would be to change the SQLCLR function to accept the bytes, not a string, and deal with the cast to
VARBINARY
in the caller.FYI SQL Server 2016 has lifted the 8000 bytes restriction on
HASHBYTES
: