I have a function that is looking at a number of elements. Each element is of the form of an 8x1 column vector. Each entry in the vector is an integer less than 1000. Every time I see such a vector, I'd like to add it to a list of "already seen" vectors, after checking to see that the vector is not already on this list. The function will examine on the order of ~100,000 such vectors.
Originally I tried using ismember(v', M, 'rows')
, but found this to be very slow. Next I tried:
found = containers.Map('KeyType', 'double', 'ValueType', 'any');
Then each time I examine a new vector v
, compute:
key = dot(v, [1000000000000000000000 1000000000000000000 1000000000000000 ...
1000000000000 1000000000 1000000 1000 1]);
Then check isKey(found, key)
. If the key is not in the container, then found(key) = 1
.
This seems like a pretty lousy solution, even though it does run considerably faster than ismember
. Any help/suggestions would be greatly appreciated.
EDIT: Perhaps it would be better to use mat2str
to generate the key, rather than this silly dot product?
The easiest way to generate a key/hash in your case would be to just convert the vector of integer values to a character array using char
. Since your integer values never go above 1000, and char
can accept numeric values from 0 to 65535 (corresponding to Unicode characters), this will give you a unique 8-character key for every unique 8-by-1 vector. Here's an example:
found = containers.Map('KeyType', 'char', 'ValueType', 'any');
v = randi(1000, [8 1]); % Sample vector
key = char(v);
if ~isKey(found, key)
found(key) = 1;
end
your idea is good. but you need to find a better hash function. use some standard hash function.
There is an implementation of 'sha' algorithms you's like to see:
http://www.se.mathworks.com/matlabcentral/fileexchange/31795-sha-algorithms-160224256384-512
If you find the sha algorithm slow then you can probably resort to some tricks. One that i can think of now is following:
- take a seed number > 1000 e.g. 1024
- divide each number in the vector by the seed and store the remainder in a string.
- concatenate all the remainders which will serve as your 'code' for your vector element. which can be used for comparison when a you see a new element.
this should probably work but you'll have to check.
Not really into hashing, but still believe to have found the simplest way to solve your problem.
This runs about 10x faster than ismember.
any(v(1)==M(1)&v(2)==M(2)&v(3)==M(3)&v(4)==M(4)&v(5)==M(5)&v(6)==M(6)&v(7)==M(7)&v(8)==M(8));