T-SQL/CLR function for deterministic encryption

2019-08-07 17:34发布

问题:

I have a table with User Agents Strings table with the following structure:

UserAgentStringID INT
UserAgentStringValue VARBINARY(8000)

The [UserAgentStringValue] field is encrypted with symmetric key. The previous version of the table structure was:

UserAgentStringID INT
UserAgentStringValue NVARCHAR(4000)
UserAgentStringHASH BINARY(32)

and I have index on the [UserAgentStringHASH] column in order to optimized searchers.

With the new format, such index is not efficient as the ENCRYPTION function uses InitializationVector in order to generate random values each time the encryption function is called with the same input:

Initialization vectors are used to initialize the block algorithm. It is not intended to be a secret, but must be unique for every call to the encryption function in order to avoid revealing patterns.

So, I can create index on my encrypted field, but if I try to search by encrypted value, I will not be able to find anything.

I do not want to use HASH because using hash function is not secure technique. If someone have my table data and table with all or huge amount of user agents, he/she will be able to perform an join by hash and reveal my data.

In SQL Server 2016 SP standard edition we have Always Encrypted which allows using Deterministic Encryption for column value - this means equal comparisons are working and indexes can be created.

I am looking for a way to optimize the search by other technique or a way to implement deterministic encryption using CLR for example?

Knowing there is no work around is OK for me, too. I guess I will pay the data protection with performance.

回答1:

I am posting a workaround of this - it's not the ideal solution, but it is compromise between speed and security.

The details

  • a columns must be encrypted (lets say an email address)
  • fast search must be implemented (let say the email is used for login and we need to locate the record as fast as possible)
  • we are not able to use Always Encrypted deterministic encryption (due to various reasons)
  • we don't want to use hash function event with salt - if one has the salt for each user, ze be might be able to read the hashes using large sample database

The security hierarchy

There are various ways of implementing the security hierarchy. The following schema from the MSDN describes it very well.

In our environment we are using the Database Mater Key -> Certificate -> Symmetric Key hierarchy. Only DBAs know the DMK password, have access to certificate and symmetric keys. Some developers can do encrypt/decrypt data (using plain T-SQL) and other do not.

Note, using Always Encrypted you can have role separation - the people who works with the data have not access to the keys, and the people who have access to the keys, do not have access to the data. In our case, we want to protect our data from outsiders and have other techniques for granting/logging data access internally.

Developers with access to encrypted data

The developers who can access the protected data are able to encrypt and decrypt it. They have not access to the symmetric key values. If one have access to the symmetric key values, ze is able to decrypt the data event not having the certifications used for protecting the symmetric keys. Basically, only sys.admins and db_owners have access to the symmetric keys values.

How to hash

We need a hash to get fast searches, but we cannot use a salt which is not encrypted. And hash without a salt is like plain text from security perspective. So, we've decided to use use the symmetric key value as salt. It is get like this:

SELECT @SymmetricKeyValue = CONVERT(VARCHAR(128), DECRYPTBYCERT(C.[certificate_id], KE.[crypt_property]), 1)
FROM [sys].[symmetric_keys] SK
INNER JOIN [sys].[key_encryptions] KE
    ON SK.[symmetric_key_id] = KE.[key_id] 
INNER JOIN [sys].[certificates] C
    ON KE.[thumbprint] = C.[thumbprint]
WHERE SK.[name] = @SymmetricKeyName;

And the value is concatenated to your email address and then the hash is calculated. It is good for us, because we are binding the hash to the security hierarchy. And it is not a different salt for each record, it is the same - but if one knows the symmetric key value, ze is able to decrypt the data directly.

Considerations

You need to create the routines (stored procedures, triggers) which are searching by hash values or computing hashes using the EXECUTE AS OWNER clause. Otherwise, developers will not be able to execute them as only sys.admins and db_owners have access to the symmetric key value.