Using SQL Server 2014 I have a table that has a nvarchar(max)
column called [ASCII File]
which can contain an ASCII text file of many K. I then want to do a MD5 hashbytes on that file and the resultant hash should always be 20 bytes.
Well when I do a select of hashbytes('MD5', [ASCII File])
I get query completed with errors
Msg 8152, Level 16, State 10, Line 4
String or binary data would be truncated.
I get the same message when I try
left(hashbytes('MD5', [ASCII File]), 50)
I get the same message when I try
convert(varchar(50), hashbytes('MD5', [ASCII File]))
It seems like since the column I am doing the hashbytes on is nvarchar(max)
, the result of the hashbytes function also is nvarchar(max)
.
Can you tell me how I can get the result to be the expected 20 long and not something so long it has to be truncated?
It seems like since the field I am doing the hashbytes on is nvarchar(max) the result of the hashbytes is nvarchar(max).
No, that is not possible, especially since the return value of HASHBYTES is a VARBINARY
. Also, since your tests were just SELECT statements and not an INSERT statement, there is no way for the return value to get a truncation error. The truncation error is coming from the input value. As stated in that linked MSDN page for HASHBYTES
(for SQL Server 2012 and 2014):
Allowed input values are limited to 8000 bytes. The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.
That really says it all: the input is limited to 8000 bytes, and the output is a fixed number of bytes, based on the specified algorithm.
The updated documentation, for SQL Server 2016 (which has removed the 8000 byte limitation), states:
For SQL Server 2014 and earlier, allowed input values are limited to 8000 bytes.
You can run a simple test:
DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT HASHBYTES('MD5', @Test);
Returns:
50000
Msg 8152, Level 16, State 10, Line 3
String or binary data would be truncated.
If you want to pass in more than 8000 bytes to a hash function in a version of SQL Server prior to 2016, then you need to use SQLCLR. You can either write your own function, or you can download and install the Free version of the SQL# SQLCLR library (which I created), and use the Util_Hash and Util_HashBinary functions:
DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT SQL#.Util_Hash('MD5', CONVERT(VARBINARY(MAX), @Test));
SELECT SQL#.Util_HashBinary('MD5', CONVERT(VARBINARY(MAX), @Test));
Returns:
50000
40752EB301B41EEAEB309348CE9711D6
0x40752EB301B41EEAEB309348CE9711D6
UPDATE
In the case of using a VARCHAR(MAX)
column or variable but with 8000 or fewer characters (or an NVARCHAR(MAX)
column or variable with 4000 or fewer characters), there will be no issue and everything will work as expected:
DECLARE @Test VARCHAR(MAX) = REPLICATE('t', 5000);
SELECT LEN(@Test) AS [Characters],
HASHBYTES('MD5', @Test) AS [MD5];
Returns:
5000 0x6ABFBA10B49157F2EF8C85862B6E6313
In SQL Server 2016 we don't have any more the problem of length of input parameter for HASHBYTES function.
DECLARE @Test NVARCHAR(MAX);
SET @Test = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000000);
SELECT LEN(@Test);
SELECT HASHBYTES('SHA2_512', @Test);
HASHBYTES (Transact-SQL)
The input length limit of 8,000 bytes for the HASHBYTES (Transact-SQL) function is removed in sql 2016
Based on algorithm below are the output data size
128 bits (16 bytes) for MD2, MD4, and MD5;
160 bits (20 bytes) for SHA and SHA1;
256 bits (32 bytes) for SHA2_256
512 bits (64 bytes) for SHA2_512.
If you are trying to convert a large varbinary or image file already in sql then there are some built in functions that can do this (possibly from 2014 onwards), this simple function will work for both varbinary(max) and older Image fields..
/****** Object: UserDefinedFunction [dbo].[MD5Bin] Script Date: 16/07/2018 11:04:26 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- ==================================================
-- Author: Darren Steven
-- Create date: 16/07/2018
-- Description: Hashes a binary or image field with MD5
-- ==================================================
CREATE FUNCTION [dbo].[MD5Bin](@value varbinary(max))
RETURNS varchar(32)
AS
BEGIN
RETURN SUBSTRING(master.sys.fn_sqlvarbasetostr(master.sys.fn_repl_hash_binary(@value)),3,32);
END
GO
then simply call the function with your select:
SELECT dbo.MD5Bin(imageFieldName) from dbo.yourTable