I need generate a single hash over some data in a table
CREATE TABLE Table1
(
F1 INT UNSIGNED NOT NULL AUTO_INCREMENT,
F2 INT default NULL,
F3 Varchar(50) default NULL,
..
FN INT default NULL,
PRIMARY KEY (F1)
);
i.e. F1, F3,FN where F2=10
SELECT md5(CONCAT_WS('#',F1,F3,FN)) FROM Tabe1 WHERE F2=10
Gives a Hash for each row in the table.
QUESTIONS
1) How do get a single hash over the whole table?
2) What is the fasts hashing algorithm to use MD5, SHA1, SHA or any other?
EDIT:
Mysql 4.1 is been used - and it does NOT have Trigger Support
As for speed, you should try. It depends on the way the functions are implemented.
Chances are, however, that you will see very little speed differences. The hash functions you cite are all faster than what an average disk can spew out, so the question is not really "what hash function will make the code runs fastest ?" but "what hash function will make the CPU most idle while it waits for the data from the disk ?".
On my Intel Core2 Q6600, clocked at 2.4 GHz (64-bit mode), with my own C implementation of hash functions, I get the following hashing speeds:
That's using a single core only. My hard disks top at about 100 MB/s, so one can say that even with SHA-256, the hashing process will use no more than 17% of the machine CPU power. Of course, nothing guarantees that the implementation used by MySQL is that fast, which is why you should try. Also, in 32-bit mode, SHA-512 performance decreases quite a bit.
Cryptographically speaking, (grave) weaknesses have been found in MD5 and SHA-1, so if you work in a security-relevant setting (i.e. you want to detect changes even if there is someone who can choose some of the changes and would prefer that you do not detect said changes), you should stick to SHA-256 or SHA-512, which, as far as we know, are robust enough. MD5 and SHA-1 are still fine in non-security situations, though.
See BIT_XOR: http://dev.mysql.com/doc/refman/5.6/en/group-by-functions.html "Returns the bitwise XOR of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision. This function returns 0 if there were no matching rows." For an example of usage, check pt-table-sync.
If by any reason you can't use Triggers, a different approach is to use the CONCAT option, like:
But be aware that if the table has allot of data the query will be slow! if possible try to exclude unnecessary columns from the CONCACT.
Also take note that by default MySQL Max CONCACT is 1024, there maybe the need to change this by running first the following query:
Note that 18446744073709547520 is the maximum value, you could use a different one!
1)
2) Speed doesn't really matters as a function has to run only once and all hash functions are fast enough
I would use a MySQL Trigger to detect changes on insert, delete, update, etc.
Altough this thread is old, maybe this is what you need: http://dev.mysql.com/doc/refman/5.0/en/checksum-table.html