I'm looking to generate a random number and issue it to a table in a database for a particular user_id. The catch is, the same number can't be used twice. There's a million ways to do this, but I'm hoping someone very keen on algorithms has a clever way of solving the problem in an elegant solution in that the following criteria is met:
1) The least amount of queries to the database are made. 2) The least amount of crawling through a data structure in memory is made.
Essentially the idea is to do the following
1) Create a random number from 0 to 9999999
2) Check the database to see if the number exists
OR
2) Query the database for all numbers
3) See if the returned result matches whatever came from the db
4) If it matches, repeat step 1, if not, problem is solved.
Thanks.
I like Oddthinking's idea, but instead of choosing the strongest hash function in the world, you could simply:
MD5's are fast, and checking if a string belongs to an array will avoid you a SELECT.
Why don't you just use a GUID? Most languages should have a built-in way to do this. It's guaranteed to be unique (with very reasonable bounds).
there are a couple ways to go about this one way would be to construct an array with the numbers 0000000 through 9999999 and then pick a random pick of these numbers in this array and swap the picked numbers values with the highest value Max then reduce max by 1 and pick another random member of this array up to the new maximum
each time reducing Max by one
for example (in basic) : (to the right are comments which should be removed in the actual program) Rndfunc is a call to whatever random number generator function you are using
then keep doing this for each number you wish to pick , but you will need to have the option of using very big arrays
the other method would be as follows :generate a number and store it into an array that can grow dynamically then after that pick a new number and compare it to the value that is halfway from the first to the last element in the array in this case it would be the first number picked if it matches pick another random number, sort the array according to size and if there is not a match then depending on weather it is greater or smaller than the number you compared it with you go up or down in the list half of half the distance, each time that it does not match and is greater or lesser than what you are comparing it to.
each time halving it until you reach a gap size of one then you check once and stop as there is no match, and then the number is added to the list and the list is reshuffled in ascending order, so on and so on until you are done picking random numbers... hope this helps..
No your algorithm is not scalable. What I've done before is to issue numbers serially (+1 each time) and then pass them through an XOR operation to jumble the bits thus giving me a seemingly random numbers. Of course they aren't really random, but they look so to users eyes.
[Edit] Additional information
This algorithm's logic goes like this you use a known sequence to generate unique numbers and then you deterministically manipulate them, so they don't look serial anymore. The general solution is to use some form of encryption, which in my case was an XOR flipflop, because its as fast as it can get, and it fulfills the guarantee that numbers will never collide.
However you can use other forms of encryption, if you want prefer even more random looking numbers, over speed (say you don't need to generate many ids at a time). Now the important point in choosing an encryption algorithm is "the guarantee that numbers will never collide". And a way to prove if an encryption algorithm can fulfill this guarantee is to check if both the original number and the result of the encryption have the same number of bits, and that the the algorithm is reversible (bijection).
[Thanks to Adam Liss & CesarB for exapanding on the solution]
My experience was simply using the RNG in PHP. I found that using a certain size of number (I'm using an int, so I have a max of 4G). I ran some tests and found that on average, in 500,000 iterations, I got 120 single duplicates. I never got a triplicate after running the loop a bunch of times. My "solution" was to then just insert and check if it fails, then generate a new ID and go again.
My advice is to do the same and see what your collision rate is &c and see if it's acceptable for your case.
This isn't optimal, so if anyone has suggestions I'm looking too:)
EDIT: I was limited to a 5 digit ID ([a-zA-z0-9]{5,5}), the longer the id (more combination, the few collisions). An md5 of the email would almost never conflict, for instance.