I need a good pseudo random number based on a key consisting of a string and a long. I should get the same random number when I query using the same key and also, I should get a very different number if I query using a slightly different key, even when say the long in the key is off by 1. I tried this code and the random numbers are unique but for similar numbers they seem correlated.
import java.util.Date;
import java.util.Random;
import org.apache.commons.lang3.builder.HashCodeBuilder;
public class HashKeyTest {
long time;
String str;
public HashKeyTest(String str, long time) {
this.time = time;
this.str = str;
}
@Override
public int hashCode() {
return new HashCodeBuilder().append(time).append(str).toHashCode();
}
public static void main(String[] args) throws Exception {
for(int i=0; i<10; i++){
long time = new Date().getTime();
HashKeyTest hk = new HashKeyTest("SPY", time);
long hashCode = (long)hk.hashCode();
Random rGen = new Random(hashCode);
System.out.format("%d:%d:%10.12f\n", time, hashCode, rGen.nextDouble());
Thread.sleep(1);
}
}
}
Solution I pieced together. This works pretty well, but I wonder if it needs to be this verbose.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;
public class HashKeyTest implements Serializable{
long time;
String str;
public HashKeyTest(String str, long time) {
this.time = time;
this.str = str;
}
public double random() throws IOException, NoSuchAlgorithmException {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(this);
byte[] bytes = bos.toByteArray();
MessageDigest md5Digest = MessageDigest.getInstance("MD5");
byte[] hash = md5Digest.digest(bytes);
ByteBuffer bb = ByteBuffer.wrap(hash);
long seed = bb.getLong();
return new Random(seed).nextDouble();
}
public static void main(String[] args) throws Exception {
long time = 0;
for (int i = 0; i < 10; i++) {
time += 250L;
HashKeyTest hk = new HashKeyTest("SPY", time);
System.out.format("%d:%10.12f\n", time, hk.random());
Thread.sleep(1);
}
}
}
That is a somewhat surprising result. I would have thought that a small difference in the seed should lead to a large difference in the stream of random numbers. On reflection, i don't know why i thought that.
Still, it's easily fixed!
The simplest thing, perhaps, is simply to let the random number generator warm up a bit before using it. The bitstreams produced by your different seeds start off similar, but diverge quite quickly, so simply throwing away the early parts of the bitstreams should do the job. Immediately after the line where you create the
Random
, add this:Or, for more divergence:
A quick test shows that this gets a much wider variety of numbers.
Another option would be to use a
java.security.SecureRandom
as the random number generator. This does a better job of generating different outputs from similar inputs. You seed it with a byte array; you could produce one by saying something like(str + time).getBytes()
.A further option would be to take your seed, then hash it using a cryptographic hash such as SHA-256, then use some portion of that as the seed. The hashing would take very similar inputs and produce very different outputs, which would then give you suitably different random bitstreams.
You said "I should get the same random number when I query using the same key and also, I should get a very different number if I query using a slightly different key". If I understand your question correctly, you do not want a random number, but rather something like a cryptographic hash code.
You should look at passing whatever data you have through a hash function like SHA or MD5. This will give you something that is seemingly random with respect to the input, but will always be the same given the same input, and will vary wildly even if your input vary only very little.
EDIT: To consistently obtain double values try something like this (pseudo-code):
The idea here is to use the SHA hash value as the seed to initialize your random generator. This is pretty much what you have, but I don't know what your HashBuilder produces in terms of different values. So using SHA hashes instead might improve the situation.
You should also consider that "very different" values for doubles between 0 and 1 might not be immediately apparent.
What I understand is:
time
and a stringstr
which need to be taken into account to calculate the random numbertime
part.time
+str
combination should produce the same random number.time
+str
combinations produce the same random number.From the code you posted, it seems the
HashCodeBuilder()
is not as sensitive as you want it to be to thetime
.Apart from what others have suggested, one idea could be to change the
time
itself in a consistent way.You could take the last digit of
time
(thelong
part of the key) and move it to somewhere in the middle of the number. e.g, yourhashCode()
can be:(The code is not exactly moving the last digit to the middle but is doing something similar in the context of the question)
But this would be kind of slow. So you could transform it to bit operators.
Kind of like extracting the last 6 bits of time (
time & 63l
) and putting those bits way in the front (57
is kind of random. I just want to move those bits to more significant positions). This does not match the "move digit to somewhere in the middle" analogy exactly, but is similar to that conceptually.You will get more variance if you extract only the last 5 bits (
time & 31l
). You could try different values. For the code posted in the question, thetime & 63l
version returns the following output:which, as expected, shows much more variance for small changes in the
long
part of the key.I would just use the key's hash itself as the "random" number. Assuming a sensible hash implementation, it will have all the properties you've mentioned.