My aim is to generate a directed graph of n vertices such that each vertex has an edge going out and an edge coming in. I thought one way of doing this would be to put all the vertices in a pot and for the vertices to take turns shuffling it and pulling out entries -- so for example if vertex 1 pulls out vertex 3 then that means there will be an edge going from 1 to 3. If a vertex pulls itself out of the pot, it just puts it back and reshuffles. If at the end, the last vertex finds that the pot only contains itself, then we need to start over. Here's my Kotlin code:
fun generateGraph(n: Int): Map<Int, Int> {
val vertices : List<Int> = (1..n).toList()
while (true) {
val pot = vertices.toMutableList()
val result = mutableMapOf<Int, Int>()
for (vertex in 1 until n) {
do {
java.util.Collections.shuffle(pot)
} while (pot[0] == vertex)
result.put(vertex, pot.removeAt(0))
}
if (pot[0] != n) {
result.put(n, pot.removeAt(0))
return result
}
else {
// The last vertex left in the pot is also the last one unassigned. Try again...
}
}
}
It seems to work. However when testing I'm finding it comes out with some graphs more than others. When n is 3 the only valid graphs are the cycles
{1=3, 2=1, 3=2}
{1=2, 2=3, 3=1}
but I'm finding the first comes up twice as many times as the second:
fun main(args: Array<String>) {
val n = 3
val patternCounts = mutableMapOf<Map<Int, Int>, Int>()
val trials = 10000
(1..trials).forEach({
val graph = generateGraph(n)
patternCounts[graph] = patternCounts.getOrDefault(graph, 0) + 1
})
println(patternCounts)
}
A run of this just now printed
{{1=3, 2=1, 3=2}=6669, {1=2, 2=3, 3=1}=3331}
What am I missing? And, is there a way to make this fair?
It's not difficult to see why that result occurs. Vertex 1 is matched with vertex 3 half the time. If that happens, the graph cannot be rejected because rejection only happens when the last remaining vertex is
n
(3 in this case) and that vertex has been used. So half the time you will get {(1,3), (2,1), (3,2)}.The other half of the time, vertex 1 will be matched with vertex 2, but then half of those cases (i.e ¼ of the total) will be rejected after vertex 2 is matched with vertex 1. So {(1,2), (2,3), (3,1)} will be chosen a quarter of the time.
In the remaining quarter, the whole procedure will be repeated, which means that {(1,3), (2,1), (3,2)} will continue to be chosen twice as often.
One solution is to reject the entire graph as soon as you match a vertex with itself. In this case, there is no need to reshuffle before selecting; you only reshuffle if the graph is rejected.
The general problem is that the case of matching a vertex with itself is not independent of all the other choices. So just reshuffling after certain matches and rejecting after other ones leads to a bias.
Rejecting and restarting after any match might not be the most efficient solution, but it will work. One way to make the algorithm more efficient would be to shuffle incrementally, rather than doing the entire shuffle and then validating it. Another possibility is described in a paper referenced from this question on Mathematics Stack Exchange
What you are missing is that your algrothim is unfair.
First, you need to know that software random number generator is not real random. It always make it seems to be fair, unlike real random.
Then, consider the following
give you 3 outcomes.
If you remove your do-while condition and if condition, all of the outcome has similar count.
However, you do-while condition prevents position = value. The possible outcomes are
Note that the distribution of the outcomes are NOT even. Consider the following:
Then, the if-condition DISCARD
2, 1, 3
(Not rerolling which is different from while loop. It started from scratch again. The remaining outcomes are(3, 1, 2):(2, 3, 1)
is about 2:1 which matches your result.Solution
Also, you should improve you math on probability and statistics before you doubt the number distribution of a random number generator.