I'm in the process of writing a Membership Provider for use with our existing membership base. I use EF 4.1 for all of my database access, and one of the issues I'm running into is that when the DB was originally set up, the relationships were enforced programmatically instead of in the database. One of the relationships needs to be made on a column that isn't required for all of our users, but in order to create the relationship the column does need to be unique (from my understanding).
My solution, which I believe will work, is to store an MD5 hash of the userid field (which is unique, and which would/should guarantee a unique value in that field). The part I'm having trouble with on SQL Server is the query that would do this WITHOUT replacing the existing values stored in the employeeNum field (the one in question).
So in a nutshell, my question is: what is the best way to get a unique value in the employeeNum field (possibly based on an MD5 hash of the userid field) on all the rows in which a value isn't already present? Also, to a minor/major extent, does this sound like a good plan?
If your question is just how to generate a hash value for userid, you can do it this way using a computed column (or generate this value as part of the insert process). It isn't clear to me whether you know about the HASHBYTES function or what other criteria you're looking at when you say "best."
DECLARE @foo TABLE
(
userid INT,
hash1 AS HASHBYTES('MD5', CONVERT(VARCHAR(12), userid)),
hash2 AS HASHBYTES('SHA1', CONVERT(VARCHAR(12), userid))
);
INSERT @foo(userid) SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 500;
SELECT userid, hash1, hash2 FROM @foo;
Results:
userid hash1 hash2
------ ---------------------------------- ------------------------------------------
1 0xC4CA4238A0B923820DCC509A6F75849B 0x356A192B7913B04C54574D18C28D46E6395428AB
2 0xC81E728D9D4C2F636F067F89CC14862C 0xDA4B9237BACCCDF19C0760CAB7AEC4A8359010B0
500 0xCEE631121C2EC9232F3A2F028AD5C89B 0xF83A383C0FA81F295D057F8F5ED0BA4610947817
In SQL Server 2012, I highly recommend at least SHA2_256 instead of either of the above. (You forgot to mention what version you're using - always useful information.)
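As a minimal sketch of that recommendation, the same computed-column approach can be written with SHA2_256. The table variable @users and the column name hash256 are made up for illustration; only the HASHBYTES call itself comes from the answer above.

```sql
-- Sketch only: @users and hash256 are illustrative names, not part of the original schema.
DECLARE @users TABLE
(
  userid  INT,
  -- 'SHA2_256' is accepted by HASHBYTES from SQL Server 2012 onward and yields 32 bytes
  hash256 AS HASHBYTES('SHA2_256', CONVERT(VARCHAR(12), userid))
);

INSERT @users(userid) SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 500;

-- DATALENGTH shows the size of each hash value in bytes
SELECT userid, hash256, DATALENGTH(hash256) AS hash_bytes FROM @users;
```

Note that a 32-byte VARBINARY value only fits in employeeNum if that column's type can hold it, which the question does not state.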
All that said, I still want to call attention to the point I made in the comments: the "best" solution here is to fix the model. If employeeNum is optional, EF shouldn't be made to think it is required or unique, and it shouldn't be used in relationships if it is not, in fact, some kind of identifier. Why would a user care about collisions between employeeNum and userid if you're using the right attribute for the relationship in the first place?
EDIT as requested by OP
So what is wrong with saying UPDATE table SET EmployeeNum = 1000000 + UserID WHERE EmployeeNum IS NULL? If EmployeeNum will stay below 1000000 then you've guaranteed no collisions and you've avoided hashing altogether.
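A hedged sketch of how that update might be guarded in practice follows. The table name dbo.Users is an assumption, as is the idea that both columns are INT and that UserID is always positive; adjust to the real schema.

```sql
-- Sketch only: dbo.Users is an assumed table name; EmployeeNum and UserID are assumed to be INT.
-- The offset only avoids collisions if no existing EmployeeNum already reaches it.
IF EXISTS (SELECT 1 FROM dbo.Users WHERE EmployeeNum >= 1000000)
BEGIN
  RAISERROR('Existing EmployeeNum values reach the offset; pick a larger one.', 16, 1);
END
ELSE
BEGIN
  -- Only rows without a value are touched, so existing numbers are preserved.
  UPDATE dbo.Users
  SET EmployeeNum = 1000000 + UserID
  WHERE EmployeeNum IS NULL;
END

-- Sanity check: this should return no rows if every EmployeeNum is unique.
SELECT EmployeeNum, COUNT(*) AS occurrences
FROM dbo.Users
GROUP BY EmployeeNum
HAVING COUNT(*) > 1;
```

The duplicate check at the end is worth running before adding any unique constraint, since the constraint creation fails if duplicates exist.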
You could generate similar padding if employeeNum might contain a string, but again, is it EF that promotes these horrible column names? Why would a column with a Num suffix contain anything but a number?
You could also use a uniqueidentifier column, setting its default value to (newid()).
Create a new column EmployeeNum as uniqueidentifier, then:
UPDATE Employees SET EmployeeNum = newid()
Then set it as the primary key.
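Spelled out, those steps might look roughly like the sketch below. The column name EmployeeGuid and the constraint names are hypothetical (a new name is used because employeeNum already exists in the question), and a unique constraint is shown instead of a primary key since the table presumably already has one.

```sql
-- Sketch only: dbo.Employees, EmployeeGuid and the constraint names are assumed for illustration.
ALTER TABLE dbo.Employees
  ADD EmployeeGuid UNIQUEIDENTIFIER NULL
      CONSTRAINT DF_Employees_EmployeeGuid DEFAULT (NEWID());
-- GO is the SSMS/sqlcmd batch separator; the new column must exist before the next batch compiles.
GO

-- NEWID() is evaluated per row here, so every existing row receives its own value.
UPDATE dbo.Employees
SET EmployeeGuid = NEWID()
WHERE EmployeeGuid IS NULL;

-- Enforce uniqueness once every row has been populated.
ALTER TABLE dbo.Employees
  ADD CONSTRAINT UQ_Employees_EmployeeGuid UNIQUE (EmployeeGuid);
```

New rows inserted afterwards pick up a value from the default, so the column stays populated without changes to the insert code.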
UPDATE EMPLOYEE
SET EMPLOYEENUM = HASHBYTES('SHA1', CAST(USERID AS VARCHAR(20)))
WHERE EMPLOYEENUM IS NULL