How do I convert a string, e.g. a user ID plus salt, to a random-looking but deterministically repeatable uniform probability in the semi-open range [0.0, 1.0)? That is, the output must be ≥ 0.0 and < 1.0. The output distribution must be uniform irrespective of the input distribution. For example, if the input string is 'a3b2Foobar', the output probability could repeatably be 0.40341504.
Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function unless there is a better way. Here is what I have:
>>> import hashlib
>>> in_str = 'a3b2Foobar'
>>> (int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 1e8) / 1e8
0.40341504
I'm using the latest stable Python 3. Please note that this question is similar, but not identical, to the related question about converting an integer to a random but deterministically repeatable choice.
Using hash
A cryptographic hash can be presumed to be a uniformly distributed integer in the range [0, MAX_HASH]. Accordingly, it can be scaled to a floating-point number in the range [0, 1) by dividing it by MAX_HASH + 1.
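The scaling described above could be sketched as follows. This is only a minimal sketch under a few assumptions of mine: the function name str_to_probability is hypothetical, SHA-256 is chosen to match the question, and the hash is truncated to its top 53 bits (so MAX_HASH is 2**53 - 1 here) because a float holds 53 bits of precision and the quotient then never rounds up to 1.0.

```python
import hashlib


def str_to_probability(in_str: str) -> float:
    """Map a string to a repeatable float in [0.0, 1.0) using SHA-256."""
    digest = hashlib.sha256(in_str.encode("utf-8")).digest()
    # Take the first 8 bytes (64 bits) and drop the low 11 bits, leaving
    # 53 bits. Every 53-bit integer divided by 2**53 is exactly
    # representable as a float and is strictly below 1.0.
    hash_int = int.from_bytes(digest[:8], "big") >> 11
    max_hash = (1 << 53) - 1  # largest value hash_int can take (assumption)
    return hash_int / (max_hash + 1)


p = str_to_probability("a3b2Foobar")
print(p)  # same value on every run and every platform
```

Because only the hash function, the byte order, and integer arithmetic are involved, the same steps should be straightforward to reproduce in other languages.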
Notes:
The built-in hash function must not be used because it can preserve the input's distribution, e.g. with hash(123). Alternatively, it can return values that differ when Python is restarted, e.g. with hash('123').
Using random
The random module can be used with in_str as its seed, while addressing concerns surrounding both thread safety and continuity.
With this approach, not only is cross-language reproducibility a concern, but reproducibility across multiple future versions of Python could also be a concern. It is therefore not recommended.
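For illustration only, the seeding approach might look like the sketch below. The function name str_to_probability_via_random is hypothetical. The assumption is that a private random.Random instance is what addresses thread safety and continuity, since it leaves the module-level generator's state untouched. As far as I know, a str seed is converted to an int using all of its bits under the default seeding version 2, so the result should not depend on PYTHONHASHSEED, but it remains tied to Python's own generator.

```python
import random


def str_to_probability_via_random(in_str: str) -> float:
    """Map a string to a repeatable float in [0.0, 1.0) via a seeded PRNG."""
    # A dedicated Random instance is seeded with the string, so the shared
    # module-level generator is neither reseeded nor advanced.
    rng = random.Random(in_str)
    return rng.random()


p = str_to_probability_via_random("a3b2Foobar")
print(p)  # repeatable within one Python version; not portable across languages
```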