I have an unlimited number of 12 byte messages arriving. The content can be treated as random and structureless. (The length is only important because it is shorter than most hashes.)
I want to deduplicate them.
One method is to store the last 1,000 messages in a circular buffer and check all 1,000 message for a match before accepting a message (and inserting it into the circular buffer for future checks).
What other methods are there, which could be more CPU and memory efficient?
12 Bytes seem quite small length. You can cast your byte array to string and then use a tree structure based on strings, via exploiting
strcmp()
.Way to cast byte array to string
Tree structure based on strings
Unless you form a skewed tree, O(logn) would be your worst case for deduplication. In such case, it is not hard to change to a self-balancing tree too.
Here my BST implementation using string type keys: