In this case, the MAX is only 5, so I could check the duplicates one by one, but how could I do this in a simpler way? For example, what if the MAX has a value of 20? Thanks.
int MAX = 5;
for (i = 1; i <= MAX; i++)
{
    drawNum[1] = (int)(Math.random()*MAX)+1;
    while (drawNum[2] == drawNum[1])
    {
        drawNum[2] = (int)(Math.random()*MAX)+1;
    }
    while ((drawNum[3] == drawNum[1]) || (drawNum[3] == drawNum[2]) )
    {
        drawNum[3] = (int)(Math.random()*MAX)+1;
    }
    while ((drawNum[4] == drawNum[1]) || (drawNum[4] == drawNum[2]) || (drawNum[4] == drawNum[3]) )
    {
        drawNum[4] = (int)(Math.random()*MAX)+1;
    }
    while ((drawNum[5] == drawNum[1]) ||
           (drawNum[5] == drawNum[2]) ||
           (drawNum[5] == drawNum[3]) ||
           (drawNum[5] == drawNum[4]) )
    {
        drawNum[5] = (int)(Math.random()*MAX)+1;
    }
}
Here's how I'd do it
As the esteemed Mr Skeet has pointed out:
If n is the number of randomly selected numbers you wish to choose and N is the total sample space of numbers available for selection:
It really all depends on exactly WHAT you need the random generation for, but here's my take.
First, create a standalone method for generating the random number. Be sure to allow for limits.
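A minimal sketch of such a helper, assuming both limits are inclusive. The class and method names are made up for illustration and are not taken from the original answer:

```java
import java.util.concurrent.ThreadLocalRandom;

final class RandomHelper {
    /**
     * Returns a random int in [min, max], both bounds inclusive.
     * Assumes max < Integer.MAX_VALUE so that max + 1 does not overflow.
     */
    static int randomInRange(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        // nextInt(origin, bound) excludes the upper bound, hence the + 1.
        return ThreadLocalRandom.current().nextInt(min, max + 1);
    }
}
```

With the question's values, `RandomHelper.randomInRange(1, MAX)` would play the role of `(int)(Math.random()*MAX)+1`.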
Next, you will want to create a very simple decision structure that compares values. This can be done in one of two ways. If you only have a small number of values to verify, a simple IF statement will suffice:
The above compares int1 to int2 through int5, as well as making sure that there are no zeroes in the randoms.
With these two methods in place, we can do the following:
Followed By:
If you have a longer list to verify, then a more complex method will yield better results both in clarity of code and in processing resources.
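One possible shape for that more general method, sketched under the assumption that the values are kept in an int array and drawn from 1..max. The names `UniqueDraw`, `contains` and `draw` are hypothetical:

```java
import java.util.Random;

final class UniqueDraw {
    /** Returns true if candidate already appears in drawn[0..count). */
    static boolean contains(int[] drawn, int count, int candidate) {
        for (int i = 0; i < count; i++) {
            if (drawn[i] == candidate) {
                return true;
            }
        }
        return false;
    }

    /** Draws howMany distinct values from 1..max by redrawing on duplicates. */
    static int[] draw(int howMany, int max, Random rng) {
        if (howMany < 0 || howMany > max) {
            throw new IllegalArgumentException("need 0 <= howMany <= max");
        }
        int[] drawn = new int[howMany];
        int count = 0;
        while (count < howMany) {
            int candidate = rng.nextInt(max) + 1; // value in 1..max
            if (!contains(drawn, count, candidate)) {
                drawn[count++] = candidate;       // accept only unseen values
            }
        }
        return drawn;
    }
}
```

The loop replaces the growing chain of `||` comparisons, so the same code works for a MAX of 5 or 20. It can still redraw many times when howMany is close to max.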
Hope this helps. This site has helped me so much, I felt obliged to at least TRY to help as well.
There is the "card batch" algorithm: create an ordered array of numbers (the "card batch") and, in every iteration, select the number at a random position in it, removing the selected number from the "card batch" so that it cannot be drawn again.
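The idea can be sketched in Java roughly as follows, assuming the numbers run from 1 to max. The class name `CardBatch` and its method are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class CardBatch {
    /** Draws howMany distinct values from 1..max by removing them from a list. */
    static List<Integer> draw(int howMany, int max, Random rng) {
        if (howMany < 0 || howMany > max) {
            throw new IllegalArgumentException("need 0 <= howMany <= max");
        }
        // Build the ordered "card batch" 1..max.
        List<Integer> batch = new ArrayList<>(max);
        for (int v = 1; v <= max; v++) {
            batch.add(v);
        }
        List<Integer> picked = new ArrayList<>(howMany);
        for (int i = 0; i < howMany; i++) {
            int pos = rng.nextInt(batch.size()); // random position in what is left
            picked.add(batch.remove(pos));       // remove(int index) returns the element
        }
        return picked;
    }
}
```

Every iteration makes progress because a removed number can never be drawn again, at the cost of building the whole list up front.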
Instead of doing all this, create a LinkedHashSet object and add random numbers to it using the Math.random() function. If a duplicate entry occurs, the LinkedHashSet object won't add that number, since this collection class does not allow duplicate values. In the end you get a list of random numbers with no duplicates. :D
Generating all the indices of a sequence is generally a bad idea, as it might take a lot of time, especially if the ratio of the numbers to be chosen to MAX is low (the complexity becomes dominated by O(MAX)). This gets worse if the ratio of the numbers to be chosen to MAX approaches one, as then removing the chosen indices from the sequence of all indices also becomes expensive (we approach O(MAX^2/2)). But for small numbers, this generally works well and is not particularly error-prone.
Filtering the generated indices by using a collection is also a bad idea, as some time is spent inserting the indices into the collection, and progress is not guaranteed because the same random number can be drawn several times (although for large enough MAX this is unlikely). This could be close to complexity O(k n log^2(n)/2), ignoring the duplicates and assuming the collection uses a tree for efficient lookup (but with a significant constant cost k of allocating the tree nodes and possibly having to rebalance).
Another option is to generate the random values uniquely from the beginning, guaranteeing that progress is being made. That means in the first round, a random index in [0, MAX] is generated. In the second round, only [0, MAX - 1] is generated (as one item was already selected). The values of the indices then need to be adjusted: if the second index falls at or after the first index, it needs to be incremented to account for the gap. We can implement this as a loop, allowing us to select an arbitrary number of unique items.
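As a rough Java illustration of this gap-adjusting loop (a sketch of the idea, not the answer's own C++ source), the code below keeps the chosen indices in ascending order and compares with an offset instead of incrementing. The class and parameter names are assumptions; `rng.nextInt(itemCount - i)` is used as the equivalent of a random integer in [0, itemCount - i - 1]:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SortedUniqueRandom {
    /** Returns selectCount distinct indices from [0, itemCount), in ascending order. */
    static List<Integer> select(int selectCount, int itemCount, Random rng) {
        if (selectCount < 0 || selectCount > itemCount) {
            throw new IllegalArgumentException("need 0 <= selectCount <= itemCount");
        }
        List<Integer> randNum = new ArrayList<>(selectCount);
        for (int i = 0; i < selectCount; i++) {
            // Rank of the new value among the indices that are still free.
            int n = rng.nextInt(itemCount - i);
            // Find the first already chosen value that stays above n after
            // skipping the j chosen values below it.
            int where = i;
            for (int j = 0; j < i; j++) {
                if (n + j < randNum.get(j)) {
                    where = j;
                    break;
                }
            }
            // n + where is the actual index; inserting here keeps the list sorted.
            randNum.add(where, n + where);
        }
        return randNum;
    }
}
```

The results are 0-based indices, so for the question's 1..MAX range one would add 1 to each value of `select(5, MAX, rng)`.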
For short sequences, this is a quite fast O(n^2/2) algorithm:
Where n_select_num is your 5 and n_number_num is your MAX. The n_Rand(x) function returns random integers in [0, x] (inclusive). This can be made a bit faster when selecting a lot of items (e.g. not 5 but 500) by using binary search to find the insertion point. To do that, we need to make sure that we meet its requirements.
We will do binary search with the comparison n + j < rand_num[j], which is the same as n < rand_num[j] - j. We need to show that rand_num[j] - j is still a sorted sequence for a sorted sequence rand_num[j]. This is fortunately easily shown, as the lowest distance between two elements of the original rand_num is one (the generated numbers are unique, so there is always a difference of at least 1). At the same time, if we subtract the indices j from all the elements rand_num[j], the differences in index are exactly 1. So in the "worst" case, we get a constant sequence, but never a decreasing one. The binary search can therefore be used, yielding an O(n log(n)) algorithm:
And finally:
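For readers following along in Java, the binary-search variant described above might look like the sketch below. It is an illustrative translation of the idea rather than the original C++ code, and the class and parameter names are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SortedUniqueRandomBinary {
    /** Returns selectCount distinct indices from [0, itemCount), in ascending order. */
    static List<Integer> select(int selectCount, int itemCount, Random rng) {
        if (selectCount < 0 || selectCount > itemCount) {
            throw new IllegalArgumentException("need 0 <= selectCount <= itemCount");
        }
        List<Integer> randNum = new ArrayList<>(selectCount);
        for (int i = 0; i < selectCount; i++) {
            int n = rng.nextInt(itemCount - i); // random integer in [0, itemCount - i - 1]
            // Binary search for the smallest j with n + j < randNum[j].
            // This is valid because randNum[j] - j never decreases.
            int lo = 0;
            int hi = i;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (n + mid < randNum.get(mid)) {
                    hi = mid;      // condition holds, answer is mid or earlier
                } else {
                    lo = mid + 1;  // condition fails, answer is after mid
                }
            }
            randNum.add(lo, n + lo); // insert at the found position, list stays sorted
        }
        return randNum;
    }
}
```

Note that `ArrayList.add(index, element)` still shifts the later elements, so in this sketch only the search step is logarithmic.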
I have tested this on three benchmarks. First, 3 numbers were chosen out of 7 items, and a histogram of the items chosen was accumulated over 10,000 runs:
This shows that each of the 7 items was chosen approximately the same number of times, and there is no apparent bias caused by the algorithm. All the sequences were also checked for correctness (uniqueness of contents).
The second benchmark involved choosing 7 numbers out of 5000 items. The time of several versions of the algorithm was accumulated over 10,000,000 runs. The results are denoted in comments in the code as b1. The simple version of the algorithm is slightly faster.
The third benchmark involved choosing 700 numbers out of 5000 items. The time of several versions of the algorithm was again accumulated, this time over 10,000 runs. The results are denoted in comments in the code as b2. The binary search version of the algorithm is now more than two times faster than the simple one.
The second method starts being faster when choosing more than about 75 items on my machine (note that the complexity of either algorithm does not depend on the number of items, MAX).
It is worth mentioning that the above algorithms generate the random numbers in ascending order. But it would be simple to add another array to which the numbers would be saved in the order in which they were generated, and return that instead (at a negligible additional cost of O(n)). It is not necessary to shuffle the output: that would be much slower.
Note that the sources are in C++; I don't have Java on my machine, but the concept should be clear.
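The "another array" remark could be realised along these lines in Java. This is a hedged sketch with invented names: one list stays sorted for the gap adjustment, while a second list records the values in the order they were generated:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GenerationOrderRandom {
    /** Returns selectCount distinct indices from [0, itemCount), in the order drawn. */
    static List<Integer> select(int selectCount, int itemCount, Random rng) {
        if (selectCount < 0 || selectCount > itemCount) {
            throw new IllegalArgumentException("need 0 <= selectCount <= itemCount");
        }
        List<Integer> sorted = new ArrayList<>(selectCount);      // ascending, for gap adjustment
        List<Integer> inDrawOrder = new ArrayList<>(selectCount); // what the caller receives
        for (int i = 0; i < selectCount; i++) {
            int n = rng.nextInt(itemCount - i);
            // Skip past every chosen value that the new value must jump over.
            int where = 0;
            while (where < i && n + where >= sorted.get(where)) {
                where++;
            }
            int value = n + where;
            sorted.add(where, value);
            inDrawOrder.add(value); // appended, so generation order is preserved
        }
        return inDrawOrder;
    }
}
```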
EDIT:
For amusement, I have also implemented the approach that generates a list with all the indices 0 .. MAX, chooses them randomly and removes them from the list to guarantee uniqueness. Since I've chosen quite a high MAX (5000), the performance is catastrophic:
I have also implemented the approach with a set (a C++ collection), which actually comes second on benchmark b2, being only about 50% slower than the approach with the binary search. That is understandable, as the set uses a binary tree, where the insertion cost is similar to binary search. The only difference is the chance of getting duplicate items, which slows down the progress.
Full source code is here.
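The collection-based filtering discussed in this thread (a LinkedHashSet in Java, a set in C++) can be sketched in Java as below. The class and method names are hypothetical, and the values are assumed to come from 1..max as in the question:

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

final class SetFilteredRandom {
    /** Draws howMany distinct values from 1..max, kept in the order first drawn. */
    static Set<Integer> draw(int howMany, int max, Random rng) {
        if (howMany < 0 || howMany > max) {
            throw new IllegalArgumentException("need 0 <= howMany <= max");
        }
        Set<Integer> picked = new LinkedHashSet<>();
        // Keep drawing until enough distinct values have been collected.
        while (picked.size() < howMany) {
            picked.add(rng.nextInt(max) + 1); // add() leaves the set unchanged on a duplicate
        }
        return picked;
    }
}
```

LinkedHashSet is hash-based rather than tree-based, so its costs differ from the C++ set measured above. The chance of redrawing a duplicate, and the slowdown it causes as howMany approaches max, is the same.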