How can I generate a map key for this vector in MA

2019-02-22 10:38发布

问题:

I have a function that is looking at a number of elements. Each element is of the form of an 8x1 column vector. Each entry in the vector is an integer less than 1000. Every time I see such a vector, I'd like to add it to a list of "already seen" vectors, after checking to see that the vector is not already on this list. The function will examine on the order of ~100,000 such vectors.

Originally I tried using ismember(v', M, 'rows'), but found this to be very slow. Next I tried:

found = containers.Map('KeyType', 'double', 'ValueType', 'any');

Then each time I examine a new vector v, compute:

key = dot(v, [1000000000000000000000 1000000000000000000 1000000000000000 ...
              1000000000000 1000000000 1000000 1000 1]);

Then check isKey(found, key). If the key is not in the container, then found(key) = 1.

This seems like a pretty lousy solution, even though it does run considerably faster than ismember. Any help/suggestions would be greatly appreciated.

EDIT: Perhaps it would be better to use mat2str to generate the key, rather than this silly dot product?

回答1:

The easiest way to generate a key/hash in your case would be to just convert the vector of integer values to a character array using char. Since your integer values never go above 1000, and char can accept numeric values from 0 to 65535 (corresponding to Unicode characters), this will give you a unique 8-character key for every unique 8-by-1 vector. Here's an example:

found = containers.Map('KeyType', 'char', 'ValueType', 'any');

v = randi(1000, [8 1]);  % Sample vector
key = char(v);
if ~isKey(found, key)
  found(key) = 1;
end


回答2:

your idea is good. but you need to find a better hash function. use some standard hash function. There is an implementation of 'sha' algorithms you's like to see:

http://www.se.mathworks.com/matlabcentral/fileexchange/31795-sha-algorithms-160224256384-512

If you find the sha algorithm slow then you can probably resort to some tricks. One that i can think of now is following:

  1. take a seed number > 1000 e.g. 1024
  2. divide each number in the vector by the seed and store the remainder in a string.
  3. concatenate all the remainders which will serve as your 'code' for your vector element. which can be used for comparison when a you see a new element.

this should probably work but you'll have to check.



回答3:

Not really into hashing, but still believe to have found the simplest way to solve your problem.

This runs about 10x faster than ismember.

any(v(1)==M(1)&v(2)==M(2)&v(3)==M(3)&v(4)==M(4)&v(5)==M(5)&v(6)==M(6)&v(7)==M(7)&v(8)==M(8));