How should I move from PRNG based generation to ha

I want to replace an existing random number based data generator (in Python) with a hash based one so that it no longer needs to generate everything in sequence, as inspired by this article.

I can create a float from 0 to 1 by taking the integer version of the hash and dividing it by the maximum value of a hash.

I can create a flat integer range by taking the float and multiplying by the flat range. I could probably use modulo and live with the bias, as the hash range is large and my flat ranges are small.

How could I use the hash to create a gaussian or normal distributed floating point value?

For all of these cases, would I be better off just using my hash as a seed for a new random.Random object and using the functions in that class to generate my numbers and rely on them to get the distribution characteristics right?

At the moment, my code is structured like this:

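from random import randint, choice

# surname_list and forename_list are lists of candidate names defined elsewhere
num_people = randint(1,100)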
people = [dict() for x in range(num_people)]
for person in people:
    person['surname'] = choice(surname_list)
    person['forename'] = choice(forename_list)

The problem is that for a given seed to be consistent, I have to generate all the people in the same order, and I have to generate the surname then the forename. If I add a middle name in between the two then the generated forenames will change, as will all the names of all the subsequent people.

I want to structure the code like this:

h1_groupseed=1

h2_peoplecount=1
h2_people=2

h4_surname=1
h4_forename=2

num_people = pghash([h1_groupseed,h2_peoplecount]).hashint(1,100)
people = [dict() for x in range(num_people)]
for h3_index, person in enumerate(people,1):
    person['surname'] = surname_list[pghash([h1_groupseed,h2_people,h3_index,h4_surname]).hashint(0, num_of_surnames - 1)]
    person['forename'] = forename_list[pghash([h1_groupseed,h2_people,h3_index,h4_forename]).hashint(0, num_of_forenames - 1)]

This would use the values passed to pghash to generate a hash, and use that hash to somehow create the pseudorandom result.

Tags: python random hash

3 Answers

唯我独甜

#2 · 2019-07-26 23:06

First, a big caveat: DO NOT ROLL YOUR OWN CRYPTO. If you're trying to do this for security purposes, DON'T.

Next, check out this question which lists several ways to do what you want, i.e. transform a random uniform variable into a normal one: Converting a Uniform Distribution to a Normal Distribution
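One of the standard approaches is inverse transform sampling: feed a uniform value strictly inside (0, 1) through the inverse CDF of the normal distribution. The following is a minimal sketch, assuming Python 3.8+ (for statistics.NormalDist) and using hashlib's BLAKE2b as a stand-in hash. hash_uniform and hash_normal are hypothetical helper names, not part of any library.

```python
import hashlib
from statistics import NormalDist


def hash_uniform(*keys):
    """Map a tuple of keys to a float strictly inside (0, 1)."""
    data = ",".join(str(k) for k in keys).encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    top52 = int.from_bytes(digest, "big") >> 12   # keep the top 52 bits of the hash
    return (top52 + 0.5) / 2**52                  # the +0.5 keeps the result off 0.0 and 1.0


def hash_normal(mu, sigma, *keys):
    """Inverse transform sampling: uniform -> normal via the inverse CDF."""
    return NormalDist(mu, sigma).inv_cdf(hash_uniform(*keys))


print(hash_normal(0.0, 1.0, "group", 1, "person", 7, "height"))
```

Because each normal value consumes exactly one uniform, this variant fits a hash-based scheme where every key should map to exactly one result.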


地球回转人心会变

#3 · 2019-07-26 23:07

Unless you're doing this for your own amusement or as a learning exercise, my very strong advice is don't do this.

PRNGs have the same general structure, even if the details are wildly different. They map a seed value s into an initial state S via some function f: S ← f(s); they then iterate states via some transformation h: S_{i+1} ← h(S_i); and finally they map the state to an output U via some function g: U_i ← g(S_i). (For simple PRNGs, f() or g() are often identity functions. For more sophisticated generators such as Mersenne Twister, more is involved.)
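To make the three roles concrete, here is a toy generator with f, h, and g written as separate methods. It is a minimal sketch using the linear congruential constants a = 1664525, c = 1013904223, m = 2^32 published in Numerical Recipes. The class name ToyPRNG is hypothetical, and the generator is for illustration only, not for real use.

```python
class ToyPRNG:
    """Toy linear congruential generator split into the f / h / g roles."""

    M = 2**32

    def __init__(self, seed):
        self.state = self.f(seed)

    def f(self, seed):
        # Seed -> initial state (here just a reduction modulo M).
        return seed % self.M

    def h(self, state):
        # State transition: S_{i+1} = h(S_i).
        return (1664525 * state + 1013904223) % self.M

    def g(self, state):
        # State -> output: scale the state to a float in [0, 1).
        return state / self.M

    def random(self):
        self.state = self.h(self.state)
        return self.g(self.state)


rng = ToyPRNG(42)
print([rng.random() for _ in range(3)])
```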

The state transition function h() is designed to distribute new states uniformly across the state space. In other words, it's already a hash function, but with the added benefit that for any widely accepted generator it has been heavily vetted by experts to have good statistical behavior.

Mersenne Twister, Python's default PRNG, has been mathematically proven to produce k-tuples that are jointly uniformly distributed for all k ≤ 623. I'm guessing that whatever hash function you choose can't make such claims. Additionally, the collapsing function g() should preserve uniformity in the outcomes. You've proposed that you "can use the integer version of the hash to create a flat number range, just by taking the modulus." In general this will introduce modulo bias, so you won't end up with a uniformly distributed result.
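The bias is easiest to see with a deliberately small hash range. The sketch below assumes a hypothetical 3-bit hash (values 0 to 7) reduced to the range 0 to 2, and then shows rejection sampling as one common way to remove the bias. unbiased_below is a hypothetical helper name.

```python
from collections import Counter

# All 8 outputs of a hypothetical 3-bit hash, reduced modulo 3.
print(Counter(v % 3 for v in range(8)))   # 0 and 1 occur three times, 2 only twice


def unbiased_below(values, n, hash_range):
    """Rejection sampling: reduce the first acceptable value modulo n.

    `values` is an iterable of uniform integers in [0, hash_range). Values at or
    above the largest multiple of n are rejected, so every residue is equally
    likely. Returns None if every candidate was rejected.
    """
    limit = hash_range - (hash_range % n)
    for v in values:
        if v < limit:
            return v % n
    return None


# With rejection, each of 0, 1, 2 is produced by exactly two of the 8 inputs.
print(Counter(unbiased_below([v], 3, 8) for v in range(8)))
```

In a hash-based generator, the sequence of candidate values could come from re-hashing the same key with an incrementing counter. The size of the bias depends on how large the target range is relative to the hash range, so it shrinks as the hash gets wider.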

If you stick with the built-in PRNG, there's no reason not to use the built-in Gaussian generator. If you want to do it for your own amusement there are lots of resources that will tell you how to map uniforms to Gaussians. Well-known methods include the Box-Muller method, Marsaglia's polar method, and the ziggurat method.
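As an illustration of one of these, here is a minimal sketch of Marsaglia's polar method built on a random.Random instance. The function name polar_gauss is hypothetical; in practice the built-in gauss() or normalvariate() methods do the same job.

```python
import math
import random


def polar_gauss(rng, mu=0.0, sigma=1.0):
    """Marsaglia's polar method: one normal variate from pairs of uniforms."""
    while True:
        # Pick a point uniformly in the square [-1, 1) x [-1, 1) ...
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        s = x * x + y * y
        # ... and keep it only if it lies inside the unit circle (and is not the origin).
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return mu + sigma * x * factor  # y * factor would be a second variate


rng = random.Random(12345)
samples = [polar_gauss(rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
var = sum((v - mean) ** 2 for v in samples) / len(samples)
print(round(mean, 2), round(var, 2))   # should be close to 0 and 1
```

Note that the rejection loop consumes a variable number of uniforms per result, which matters if you need each output to depend on a fixed number of inputs.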

UPDATE

Given the additional information you've provided in your question, I think the answer you want is contained in this section of Python's documentation for random:

The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state. This is especially useful for multi-threaded programs, creating a different instance of Random for each thread, and using the jumpahead() method to make it likely that the generated sequences seen by each thread don’t overlap.

It sounds like you want a separate instance of Random for each person, seeded independently of each other or with synchronized but widely separated states as described in the random.jumpahead() documentation (jumpahead() exists only in Python 2; it was removed in Python 3). This is one of the approaches that simulation modelers have used since the early 1950s so they can maintain repeatability between configurations and make direct, fair comparisons of two or more systems. Check out the discussion of "synchronization" on the second page of this article, or starting on page 8 of this book chapter, or pick up any of the dozens of simulation textbooks available in most university libraries and read the sections on "common random numbers." (I'm not pointing you towards Wikipedia because it provides almost no details on this topic.)

Here's an explicit example showing creating multiple instances of Random:

import random as rnd

print("two PRNG instances with identical seeding produce identical results:")
r1 = rnd.Random(12345)
r2 = rnd.Random(12345)
for _ in range(5):
    print([r1.normalvariate(0, 1), r2.normalvariate(0, 1)])

print("\ndifferent seeding yields distinct but reproducible results:")
r1 = rnd.Random(12345)
r2 = rnd.Random(67890)
for _ in range(3):
    print([r1.normalvariate(0, 1), r2.normalvariate(0, 1)])
print("\nresetting, different order of operations")
r1 = rnd.Random(12345)
r2 = rnd.Random(67890)
print("r1: ", [r1.normalvariate(0, 1) for _ in range(3)])
print("r2: ", [r2.normalvariate(0, 1) for _ in range(3)])
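Applied to the people generator from the question, the same idea might look like the sketch below. It assumes a hypothetical helper, keyed_rng, that hashes a tuple of keys with hashlib.sha256 and uses the result to seed a dedicated Random instance. The name lists are placeholder data.

```python
import hashlib
import random

surname_list = ["Smith", "Jones", "Taylor", "Brown"]   # placeholder data
forename_list = ["Alice", "Bob", "Carol", "Dave"]      # placeholder data


def keyed_rng(*keys):
    """Return a Random instance whose seed depends only on the given keys."""
    data = "/".join(str(k) for k in keys).encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return random.Random(seed)


def make_people(group_seed):
    num_people = keyed_rng(group_seed, "count").randint(1, 100)
    people = []
    for index in range(1, num_people + 1):
        people.append({
            # Each field gets its own generator, so adding a new field
            # (e.g. a middle name) does not disturb the existing ones.
            "surname": keyed_rng(group_seed, "person", index, "surname").choice(surname_list),
            "forename": keyed_rng(group_seed, "person", index, "forename").choice(forename_list),
        })
    return people


people = make_people(1)
print(len(people), people[0])
```

hashlib is used here rather than the built-in hash() because string hashes from hash() vary between interpreter runs unless PYTHONHASHSEED is fixed. Any person can also be regenerated on their own from (group_seed, index) without generating the people before them.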


可以哭但决不认输i

#4 · 2019-07-26 23:15

I have gone ahead and created a simple hash-based replacement for some of the functions in the random.Random class:

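from __future__ import division
from math import sqrt, log, sin, cos, pi

import xxhash


def gaussian(u1, u2):
    # Box-Muller transform: u1 in (0, 1] and u2 in [0, 1) give two standard normals.
    r = sqrt(-2 * log(u1))
    return r * cos(2 * pi * u2), r * sin(2 * pi * u2)


class pghash:
    def __init__(self, keys, seed=0, sep=','):
        # Convert every key to str so that integer keys can be joined too.
        data = sep.join(str(k) for k in keys).encode('utf-8')
        self.hex = xxhash.xxh64(data, seed=seed).hexdigest()

    def pgvalue(self):
        return int(self.hex, 16)

    def pghalves(self):
        return self.hex[:8], self.hex[8:]

    def pgvalues(self):
        return int(self.hex[:8], 16), int(self.hex[8:], 16)

    def random(self):
        # Use the top 53 bits so the result is a float in [0, 1)
        # and can never round up to 1.0.
        return (self.pgvalue() >> 11) / 2**53

    def randint(self, min, max):
        # Inclusive on both ends, like random.Random.randint.
        return min + int(self.random() * (max - min + 1))

    def gauss(self, mu, sigma):
        hi, lo = self.pgvalues()
        u1 = (hi + 1) / 2**32  # shift into (0, 1] so that log(u1) is defined
        u2 = lo / 2**32
        return mu + sigma * gaussian(u1, u2)[0]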

Next step is to go through my code and replace all the calls to random.Random methods with pghash objects.

I have made this into a module, which I hope to upload to pypi at some point: https://github.com/UKHomeOffice/python-pghash
