SQL Server 2008 and HashBytes

2019-01-17 22:29发布

问题:

I have quite a large nvarchar which I wish to pass to the HashBytes function. I get the error:

"String or binary would be truncated. Cannot insert the value NULL into column 'colname', tbale 'table'; column does not allow nulls. UPDATE fails. The statement has been terminated."

Being ever resourceful, I discovered this was due to the HashBytes function having a maximum limit of 8000 bytes. Further searching showed me a 'solution' where my large varchar would be divided and hashed seperately and then later combined with this user defined function:

function [dbo].[udfLargeHashTable] (@algorithm nvarchar(4), @InputDataString varchar(MAX))
RETURNS varbinary(MAX)
AS
BEGIN
DECLARE
    @Index int,
    @InputDataLength int,
    @ReturnSum varbinary(max),
    @InputData varbinary(max)

SET @ReturnSum = 0
SET @Index = 1
SET @InputData = convert(binary,@InputDataString)
SET @InputDataLength = DATALENGTH(@InputData)

WHILE @Index <= @InputDataLength
BEGIN
    SET @ReturnSum = @ReturnSum + HASHBYTES(@algorithm, SUBSTRING(@InputData, @Index, 8000))
    SET @Index = @Index + 8000
END
RETURN @ReturnSum
END

which I call with:

set @ReportDefinitionHash=convert(int,dbo.[udfLargeHashTable]('SHA1',@ReportDefinitionForLookup))

Where @ReportDefinitionHash is int, and @ReportDefinitionForLookup is the varchar

Passing a simple char like 'test' produces a different int with my UDF than a normal call to HashBytes would produce.

Any advice on this issue?

回答1:

Just use this function (taken from Hashing large data strings with a User Defined Function):

create function dbo.fn_hashbytesMAX
    ( @string  nvarchar(max)
    , @Algo    varchar(10)
    )
    returns varbinary(20)
as
/************************************************************
*
*    Author:        Brandon Galderisi
*    Last modified: 15-SEP-2009 (by Denis)
*    Purpose:       uses the system function hashbytes as well
*                   as sys.fn_varbintohexstr to split an 
*                   nvarchar(max) string and hash in 8000 byte 
*                   chunks hashing each 8000 byte chunk,,
*                   getting the 40 byte output, streaming each 
*                   40 byte output into a string then hashing 
*                   that string.
*
*************************************************************/
begin
     declare    @concat       nvarchar(max)
               ,@NumHash      int
               ,@HASH         varbinary(20)
     set @NumHash = ceiling((datalength(@string)/2)/(4000.0))
    /* HashBytes only supports 8000 bytes so split the string if it is larger */
    if @NumHash>1
    begin
                                                        -- # * 4000 character strings
          ;with a as (select 1 as n union all select 1) -- 2 
               ,b as (select 1 as n from a ,a a1)       -- 4
               ,c as (select 1 as n from b ,b b1)       -- 16
               ,d as (select 1 as n from c ,c c1)       -- 256
               ,e as (select 1 as n from d ,d d1)       -- 65,536
               ,f as (select 1 as n from e ,e e1)       -- 4,294,967,296 = 17+ TRILLION characters
               ,factored as (select row_number() over (order by n) rn from f)
               ,factors as (select rn,(rn*4000)+1 factor from factored)

          select @concat = cast((
          select right(sys.fn_varbintohexstr
                         (
                         hashbytes(@Algo, substring(@string, factor - 4000, 4000))
                         )
                      , 40) + ''
          from Factors
          where rn <= @NumHash
          for xml path('')
          ) as nvarchar(max))


          set @HASH = dbo.fn_hashbytesMAX(@concat ,@Algo)
    end
     else
     begin
          set @HASH = convert(varbinary(20), hashbytes(@Algo, @string))
     end

return @HASH
end

And the results are as following:

select 
 hashbytes('sha1', N'test') --native function with nvarchar input
,hashbytes('sha1', 'test') --native function with varchar input 
,dbo.fn_hashbytesMAX('test', 'sha1') --Galderisi's function which casts to nvarchar input
,dbo.fnGetHash('sha1', 'test') --your function

Output:

0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1  
0xA94A8FE5CCB19BA61C4C0873D391E987982FBBD3  
0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1   
0x00000000AE6DBA4E0F767D06A97038B0C24ED720662ED9F1


回答2:

If you can't create a function and have to use something that already exists in the DB:

sys.fn_repl_hash_binary

can be made to work using the syntax:

sys.fn_repl_hash_binary(cast('some really long string' as varbinary(max)))

Taken from: http://www.sqlnotes.info/2012/01/16/generate-md5-value-from-big-data/



回答3:

I've taken the accepted answer, and modified it a bit with the following improvements:

  1. no longer recursive function
  2. now schema bound
  3. no longer relying on undocumented stored procedures
  4. two versions: one for nvarchar, one for varchar
  5. returns same data size as HASHBYTES, leaving it up to the end user to convert to smaller based on algorithm used. This allows the functions to support future algorithms with larger data returns.

With these changes, the functions can now be used in persisted computed columns as they are now marked deterministic when created.

CREATE FUNCTION dbo.fnHashBytesNVARCHARMAX
(
    @Algorithm VARCHAR(10),
    @Text NVARCHAR(MAX)
)
RETURNS VARBINARY(8000)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @NumHash INT;
    DECLARE @HASH VARBINARY(8000);
    SET @NumHash = CEILING(DATALENGTH(@Text) / (8000.0));
    /* HashBytes only supports 8000 bytes so split the string if it is larger */
    WHILE @NumHash > 1
    BEGIN
        -- # * 4000 character strings
        WITH a AS
        (SELECT 1 AS n UNION ALL SELECT 1), -- 2 
        b AS
        (SELECT 1 AS n FROM a, a a1),       -- 4
        c AS
        (SELECT 1 AS n FROM b, b b1),       -- 16
        d AS
        (SELECT 1 AS n FROM c, c c1),       -- 256
        e AS
        (SELECT 1 AS n FROM d, d d1),       -- 65,536
        f AS
        (SELECT 1 AS n FROM e, e e1),       -- 4,294,967,296 = 17+ TRILLION characters
        factored AS
        (SELECT ROW_NUMBER() OVER (ORDER BY n) rn FROM f),
        factors AS
        (SELECT rn, (rn * 4000) + 1 factor FROM factored)
        SELECT  @Text = CAST
            (
                (
                    SELECT  CONVERT(VARCHAR(MAX), HASHBYTES(@Algorithm, SUBSTRING(@Text, factor - 4000, 4000)), 1)
                    FROM    factors
                    WHERE   rn <= @NumHash
                    FOR XML PATH('')
                ) AS NVARCHAR(MAX)
            );

        SET @NumHash = CEILING(DATALENGTH(@Text) / (8000.0));
    END;
    SET @HASH = CONVERT(VARBINARY(8000), HASHBYTES(@Algorithm, @Text));
    RETURN @HASH;
END;

CREATE FUNCTION dbo.fnHashBytesVARCHARMAX
(
    @Algorithm VARCHAR(10),
    @Text VARCHAR(MAX)
)
RETURNS VARBINARY(8000)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @NumHash INT;
    DECLARE @HASH VARBINARY(8000);
    SET @NumHash = CEILING(DATALENGTH(@Text) / (8000.0));
    /* HashBytes only supports 8000 bytes so split the string if it is larger */
    WHILE @NumHash > 1
    BEGIN
        -- # * 4000 character strings
        WITH a AS
        (SELECT 1 AS n UNION ALL SELECT 1), -- 2 
        b AS
        (SELECT 1 AS n FROM a, a a1),       -- 4
        c AS
        (SELECT 1 AS n FROM b, b b1),       -- 16
        d AS
        (SELECT 1 AS n FROM c, c c1),       -- 256
        e AS
        (SELECT 1 AS n FROM d, d d1),       -- 65,536
        f AS
        (SELECT 1 AS n FROM e, e e1),       -- 4,294,967,296 = 17+ TRILLION characters
        factored AS
        (SELECT ROW_NUMBER() OVER (ORDER BY n) rn FROM f),
        factors AS
        (SELECT rn, (rn * 8000) + 1 factor FROM factored)
        SELECT  @Text = CAST
        (
            (
                SELECT  CONVERT(VARCHAR(MAX), HASHBYTES(@Algorithm, SUBSTRING(@Text, factor - 8000, 8000)), 1)
                FROM    factors
                WHERE   rn <= @NumHash
                FOR XML PATH('')
            ) AS NVARCHAR(MAX)
        );

        SET @NumHash = CEILING(DATALENGTH(@Text) / (8000.0));
    END;
    SET @HASH = CONVERT(VARBINARY(8000), HASHBYTES(@Algorithm, @Text));
    RETURN @HASH;
END;


回答4:

You could write a SQL CLR function:

[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBinary BigHashBytes(SqlString algorithm, SqlString data)
{
    var algo = HashAlgorithm.Create(algorithm.Value);

    var bytes = Encoding.UTF8.GetBytes(data.Value);

    return new SqlBinary(algo.ComputeHash(bytes));
}

And then it can be called in SQL like this:

--these return the same value
select HASHBYTES('md5', 'test stuff')
select dbo.BigHashBytes('md5', 'test stuff')

The BigHashBytes is only necessary if the length would be over 8k.



回答5:

tested and working select master.sys.fn_repl_hash_binary(someVarbinaryMaxValue) moreover not complicated :)



回答6:

This can be used as function body, too:

DECLARE @A NVARCHAR(MAX) = N'test'

DECLARE @res VARBINARY(MAX) = 0x
DECLARE @position INT = 1
        ,@len INT = DATALENGTH(@A)

WHILE 1 = 1
BEGIN
    SET @res = @res + HASHBYTES('SHA2_256', SUBSTRING(@A, @position, 4000))
    SET @position = @position+4000
    IF @Position > @len 
        BREAK
END

SELECT HASHBYTES('SHA2_256',@res)

The idea si to HASH each 4000 part of the NVARCHAR(MAX) string and concatanate the results. Then to HASH the latter result.



回答7:

It seems the easiest solution is to write a recursive hashing algorithm that parses the input text value into sub varchar(8000) segments. I arbitrarily chose to slice the input string into 7500 character segments The hashing algorithm returns a varbinary(20) which can easily be converted into a varchar(20)

ALTER FUNCTION [dbo].[BigHash]
(   
    @TextValue nvarchar(max)
)

RETURNS varbinary(20)

AS
BEGIN

    if @TextValue = null
        return hashbytes('SHA1', 'null')


    Declare @FirstPart as varchar(7500)
    Declare @Remainder as varchar(max)

    Declare @RemainderHash as varbinary(20)
    Declare @BinaryValue as varbinary(20)

    Declare @TextLength as integer


    Set @TextLength = len(@TextValue)

    if @TextLength > 7500
        Begin
            Set @FirstPart = substring(@TextValue, 1, 7500)         

            Set @Remainder = substring(@TextValue, 7501, @TextLength - 7500)        

            Set @RemainderHash = dbo.BigHash(@Remainder)

            Set @BinaryValue = hashbytes('SHA1', @FirstPart + convert( varchar(20), @RemainderHash, 2 ))

            return @BinaryValue

        End
    else
        Begin
            Set @FirstPart = substring(@TextValue, 1, @TextLength)                      
            Set @BinaryValue = hashbytes('SHA1', @FirstPart)

            return @BinaryValue
        End


    return null

END