Create a uniform random number based on a hash

2020-06-06 06:46发布

问题:

I need a good pseudo random number based on a key consisting of a string and a long. I should get the same random number when I query using the same key and also, I should get a very different number if I query using a slightly different key, even when say the long in the key is off by 1. I tried this code and the random numbers are unique but for similar numbers they seem correlated.

import java.util.Date;
import java.util.Random;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class HashKeyTest {
    long time;
    String str;
    public HashKeyTest(String str, long time) {
        this.time = time;
        this.str = str;
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder().append(time).append(str).toHashCode();
    }

    public static void main(String[] args) throws Exception {
        for(int i=0; i<10; i++){
            long time = new Date().getTime();
            HashKeyTest hk = new HashKeyTest("SPY", time);
            long hashCode = (long)hk.hashCode();
            Random rGen = new Random(hashCode);
            System.out.format("%d:%d:%10.12f\n", time, hashCode, rGen.nextDouble());
            Thread.sleep(1);
        }
    }
}

Solution I pieced together. This works pretty well, but I wonder if it needs to be this verbose.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class HashKeyTest implements Serializable{

    long time;
    String str;

    public HashKeyTest(String str, long time) {
        this.time = time;
        this.str = str;
    }

    public double random() throws IOException, NoSuchAlgorithmException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(this);
        byte[] bytes = bos.toByteArray();
        MessageDigest md5Digest = MessageDigest.getInstance("MD5");
        byte[] hash = md5Digest.digest(bytes);
        ByteBuffer bb = ByteBuffer.wrap(hash);
        long seed = bb.getLong();

        return new Random(seed).nextDouble();
    }

    public static void main(String[] args) throws Exception {
        long time = 0;
        for (int i = 0; i < 10; i++) {
            time += 250L;
            HashKeyTest hk = new HashKeyTest("SPY", time);
            System.out.format("%d:%10.12f\n", time, hk.random());
            Thread.sleep(1);
        }
    }
}

回答1:

You said "I should get the same random number when I query using the same key and also, I should get a very different number if I query using a slightly different key". If I understand your question correctly, you do not want a random number, but rather something like a cryptographic hash code.

You should look at passing whatever data you have through a hash function like SHA or MD5. This will give you something that is seemingly random with respect to the input, but will always be the same given the same input, and will vary wildly even if your input vary only very little.

EDIT: To consistently obtain double values try something like this (pseudo-code):

SHAHashValue v = ComputeSHA( yourObject);
Random r = new Random(v);
the_random_value = r.getNext();

The idea here is to use the SHA hash value as the seed to initialize your random generator. This is pretty much what you have, but I don't know what your HashBuilder produces in terms of different values. So using SHA hashes instead might improve the situation.

You should also consider that "very different" values for doubles between 0 and 1 might not be immediately apparent.



回答2:

I would just use the key's hash itself as the "random" number. Assuming a sensible hash implementation, it will have all the properties you've mentioned.



回答3:

That is a somewhat surprising result. I would have thought that a small difference in the seed should lead to a large difference in the stream of random numbers. On reflection, i don't know why i thought that.

Still, it's easily fixed!

The simplest thing, perhaps, is simply to let the random number generator warm up a bit before using it. The bitstreams produced by your different seeds start off similar, but diverge quite quickly, so simply throwing away the early parts of the bitstreams should do the job. Immediately after the line where you create the Random, add this:

rGen.nextLong();

Or, for more divergence:

for (int j = 0; j < 10; ++j) rGen.nextLong();

A quick test shows that this gets a much wider variety of numbers.

Another option would be to use a java.security.SecureRandom as the random number generator. This does a better job of generating different outputs from similar inputs. You seed it with a byte array; you could produce one by saying something like (str + time).getBytes().

A further option would be to take your seed, then hash it using a cryptographic hash such as SHA-256, then use some portion of that as the seed. The hashing would take very similar inputs and produce very different outputs, which would then give you suitably different random bitstreams.



回答4:

What I understand is:

  • Your object has two instance variables - A long time and a string str which need to be taken into account to calculate the random number
  • You want the random number to be very sensitive to the time part.
  • The same time+str combination should produce the same random number.
  • It is OK if two different time+str combinations produce the same random number.

From the code you posted, it seems the HashCodeBuilder() is not as sensitive as you want it to be to the time.

Apart from what others have suggested, one idea could be to change the time itself in a consistent way.

You could take the last digit of time (the long part of the key) and move it to somewhere in the middle of the number. e.g, your hashCode() can be:

@Override
public int hashCode() {
    return (new org.apache.commons.lang.builder.HashCodeBuilder()
            .append(time+((time%10)*100000000)).append(str).toHashCode());
}

(The code is not exactly moving the last digit to the middle but is doing something similar in the context of the question)

But this would be kind of slow. So you could transform it to bit operators.

@Override
public int hashCode() {
    return (new org.apache.commons.lang.builder.HashCodeBuilder()
            .append(time+((time & 63l) << 57)).append(str).toHashCode());
}

Kind of like extracting the last 6 bits of time (time & 63l) and putting those bits way in the front (57 is kind of random. I just want to move those bits to more significant positions). This does not match the "move digit to somewhere in the middle" analogy exactly, but is similar to that conceptually.

You will get more variance if you extract only the last 5 bits (time & 31l). You could try different values. For the code posted in the question, the time & 63l version returns the following output:

1339343005559:-1084202043:0.339762681480
1339343005585:1801482883:0.323979029483
1339343005586:559968862:0.786162684846
1339343005587:-681545159:0.241820545267
1339343005588:-580881900:0.692788956755
1339343005590:1231057354:0.624686671170
1339343005591:-10456667:0.530394885899
1339343005592:1700819920:0.894868466104
1339343005593:459305899:0.149584882259
1339343005595:-2023722143:0.289584988289

which, as expected, shows much more variance for small changes in the long part of the key.