Generate Unique hash for a field in SQL Server

2019-04-14 19:22发布

问题:

I'm in the process of writing a Membership Provider for use with our existing membership base. I use EF4.1 for all of my database access and one of the issued that I'm running into is when the DB was originally setup the relationships were done programmatically instead of in the db. One if the relationships needs to be made on a column that isn't required for all of our users, but in order to make the relationships does need to be unique (from my understanding).

My solution that I believe will work is to do an MD5 hash on the userid field (which is unique ...which would/should guarantee a unique value in that field). The part that I'm having issues with on sql server is the query that would do this WITHOUT replacing the existing values stored in the employeeNum field (the one in question).

So in a nutshell my question is. What is the best way to get a unique value in the employeeNum field (possibly based on an md5 hash of the userid field) on all the rows in which a value isn't already present. Also, to a minor/major extent...does this sound like a good plan?

回答1:

If your question is just how to generate a hash value for userid, you can do it this way using a computed column (or generate this value as part of the insert process). It isn't clear to me whether you know about the HASHBYTES function or what other criteria you're looking at when you say "best."

DECLARE @foo TABLE
(
  userid INT, 
  hash1 AS HASHBYTES('MD5',  CONVERT(VARCHAR(12), userid)),
  hash2 AS HASHBYTES('SHA1', CONVERT(VARCHAR(12), userid))
);

INSERT @foo(userid) SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 500;

SELECT userid, hash1, hash2 FROM @foo;

Results:

userid  hash1                               hash2
------  ----------------------------------  ------------------------------------------
1       0xC4CA4238A0B923820DCC509A6F75849B  0x356A192B7913B04C54574D18C28D46E6395428AB
2       0xC81E728D9D4C2F636F067F89CC14862C  0xDA4B9237BACCCDF19C0760CAB7AEC4A8359010B0
500     0xCEE631121C2EC9232F3A2F028AD5C89B  0xF83A383C0FA81F295D057F8F5ED0BA4610947817

In SQL Server 2012, I highly recommend at least SHA2_256 instead of either of the above. (You forgot to mention what version you're using - always useful information.)

All that said, I still want to call attention to the point I made in the comments: the "best" solution here is to fix the model. If employeeNum is optional, EF shouldn't be made to think it is required or unique, and it shouldn't be used in relationships if it is not, in fact, some kind of identifier. Why would a user care about collisions between employeeNum and userid if you're using the right attribute for the relationship in the first place?

EDIT as requested by OP

So what is wrong with saying UPDATE table SET EmployeeNum = 1000000 + UserID WHERE EmployeeNum IS NULL? If EmployeeNum will stay below 1000000 then you've guaranteed no collisions and you've avoided hashing altogether.

You could generate similar padding if employeeNum might contain a string, but again is it EF that promotes these horrible column names? Why would a column with a Num suffix contain anything but a number?



回答2:

You could also use a uniqueidentifier setting the default value to (newid())

Create a new column EmployeeNum as uniqueidentifer, then:

UPDATE Employees SET EmployeeNum = newid()

Then set as primary key.



回答3:

UPDATE EMPLOYEE
    SET EMPLOYEENUM = HASHBYTES('SHA1', CAST(USERID AS VARCHAR(20)))
WHERE EMPLOYEENUM IS NULL