I'm looking to generate a random number and issue it to a table in a database for a particular user_id. The catch is, the same number can't be used twice. There's a million ways to do this, but I'm hoping someone very keen on algorithms has a clever way of solving the problem in an elegant solution in that the following criteria is met:
1) The least amount of queries to the database are made. 2) The least amount of crawling through a data structure in memory is made.
Essentially the idea is to do the following
1) Create a random number from 0 to 9999999
2) Check the database to see if the number exists
OR
2) Query the database for all numbers
3) See if the returned result matches whatever came from the db
4) If it matches, repeat step 1, if not, problem is solved.
Thanks.
PHP already has a function for this, uniqid. It generates a standard uuid which is great if you have to access the data from elsewhere. Don't reinvent the wheel.
A proper PRNG (Pseudo-Random Number Generator) algorithm will have a cycle time during which it will never be in the same state. If you expose the entire state of the PRNG in the number retrieved from it, you will get a number guaranteed unique for the period of the generator.
A simple PRNG that does this is called the 'Linear Congruential' PRNG which iterates a formula:
Using the right pair of factors you can get a period of 2^30 (approximately 1 billion) from a simple PRNG with a 32 bit accumulator. Note that you will need a 64 bit long long temporary variable to hold the intermediate 'AX' part of the computation. Most if not all C compilers will support this data type. You should also be able to do it with a numeric data type on most SQL dialects.
With the right values of A and M we can get a random number generator with good statistical and geometrical properties. There is a famous paper about this written by Fishman and Moore.
For M = 2^31 - 1 we get can use the values of A below to get a PRNG with a nice long period (2^30 IIRC).
Good Values of A:
Note that this type of generator is (by definition) not cryptographically secure. If you know the last number generated from it you can predict what it will do next. Unfortunately I believe that you cannot get cryptographic security and guaranteed non-repeatability at the same time. For a PRNG to be cryptographically secure (e.g. Blum Blum Shub) it cannot expose sufficient state in a generated number to allow the next number in the sequence to be predicted. Therefore the internal state is wider than the generated number and (in order to have good security) the period will be longer than the number of possible values that can be generated. This means that the exposed number will not be unique within the period.
For similar reasons the same is true of long-period generators such as the Mersenne Twister.
Assuming:
You could do something simple as having the random number as a 64 bit integer, with the upper 32 bits containing the timestamp (at row insert) and the lower 32 bits the user_id. That would be unique even for multiple rows with the same user, provided you use an appropriate resolution on your timestamp depending on how often you add new rows for the same user. Combine with an unique constraint on the random column and catch any such error in your logic and then just retry.
I think you'll find that you really do not want to do this. As the numbers in the database increase, you might spend too much time in the "make sure this number isn't taken" loop.
Personally, I've had luck with hashes as an alternative, but to come up with a better solution, I'd really need to know why you want to do it this way.
If you really want to get "random" numbers form 0 to 9 999 999, then the solution is to do the "randomization" once, and then store the result to your disk.
It's not hard to get the result you want, but I think of it more like "make a long list with numbers", than "get a random number".
You also need a pointer to current position in $numbers (store it in a database); start with 0 and increment it each time you need a new number. (Or you could use array_shift() or array_pop(), if you dont like to use pointers.)
It's easy to design a pseudorandom number generator with a long period of nonrepetition; e.g. this one, which is being used for the same thing that you want it for.
BTW, why not just issue the userid's sequentially?