Why is this way of randomly generating a graph unf

2019-07-20 14:42发布

问题:

My aim is to generate a directed graph of n vertices such that each vertex has an edge going out and an edge coming in. I thought one way of doing this would be to put all the vertices in a pot and for the vertices to take turns shuffling it and pulling out entries -- so for example if vertex 1 pulls out vertex 3 then that means there will be an edge going from 1 to 3. If a vertex pulls itself out of the pot, it just puts it back and reshuffles. If at the end, the last vertex finds that the pot only contains itself, then we need to start over. Here's my Kotlin code:

fun generateGraph(n: Int): Map<Int, Int> {
    val vertices : List<Int> = (1..n).toList()
    while (true) {
        val pot = vertices.toMutableList()
        val result = mutableMapOf<Int, Int>()
        for (vertex in 1 until n) {
            do {
                java.util.Collections.shuffle(pot)
            } while (pot[0] == vertex)
            result.put(vertex, pot.removeAt(0))
        }
        if (pot[0] != n) {
            result.put(n, pot.removeAt(0))
            return result
        }
        else {
            // The last vertex left in the pot is also the last one unassigned.  Try again...
        }
    }
}

It seems to work. However when testing I'm finding it comes out with some graphs more than others. When n is 3 the only valid graphs are the cycles

{1=3, 2=1, 3=2}
{1=2, 2=3, 3=1}

but I'm finding the first comes up twice as many times as the second:

fun main(args: Array<String>) {    
    val n = 3
    val patternCounts = mutableMapOf<Map<Int, Int>, Int>()
    val trials = 10000
    (1..trials).forEach({
        val graph = generateGraph(n)
        patternCounts[graph] = patternCounts.getOrDefault(graph, 0) + 1
    })
    println(patternCounts)
}

A run of this just now printed

{{1=3, 2=1, 3=2}=6669, {1=2, 2=3, 3=1}=3331}

What am I missing? And, is there a way to make this fair?

回答1:

It's not difficult to see why that result occurs. Vertex 1 is matched with vertex 3 half the time. If that happens, the graph cannot be rejected because rejection only happens when the last remaining vertex is n (3 in this case) and that vertex has been used. So half the time you will get {(1,3), (2,1), (3,2)}.

The other half of the time, vertex 1 will be matched with vertex 2, but then half of those cases (i.e ¼ of the total) will be rejected after vertex 2 is matched with vertex 1. So {(1,2), (2,3), (3,1)} will be chosen a quarter of the time.

In the remaining quarter, the whole procedure will be repeated, which means that {(1,3), (2,1), (3,2)} will continue to be chosen twice as often.

One solution is to reject the entire graph as soon as you match a vertex with itself. In this case, there is no need to reshuffle before selecting; you only reshuffle if the graph is rejected.

The general problem is that the case of matching a vertex with itself is not independent of all the other choices. So just reshuffling after certain matches and rejecting after other ones leads to a bias.

Rejecting and restarting after any match might not be the most efficient solution, but it will work. One way to make the algorithm more efficient would be to shuffle incrementally, rather than doing the entire shuffle and then validating it. Another possibility is described in a paper referenced from this question on Mathematics Stack Exchange



回答2:

What am I missing? And, is there a way to make this fair?

What you are missing is that your algrothim is unfair.

First, you need to know that software random number generator is not real random. It always make it seems to be fair, unlike real random.

Then, consider the following

java.util.Collections.shuffle(pot)

give you 3 outcomes.

1, 2, 3
1, 3, 2
2, 1, 3
2, 3, 1
3, 1, 2
3, 2, 1

If you remove your do-while condition and if condition, all of the outcome has similar count.

However, you do-while condition prevents position = value. The possible outcomes are

2, 1, 3
2, 3, 1
3, 1, 2

Note that the distribution of the outcomes are NOT even. Consider the following:

When vertex == 1:
    case pot[0] == 1:
        reroll
    case pot[0] == 2:
        continue // 50%
    case pot[0] == 3:
        continue // 50%

If the result[0] == 2:
    When vertex == 2:
        case pot[0] == 1:
            continue // 25%
        case pot[0] == 3:
            continue // 25%

If the result[0] == 3:
    When vertex == 2:
        case pot[0] == 1:
            continue // 50%
        case pot[0] == 2:
            reroll

Result:
    2, 1, 3 (25%)
    2, 3, 1 (25%)
    3, 1, 2 (50%)

Then, the if-condition DISCARD 2, 1, 3 (Not rerolling which is different from while loop. It started from scratch again. The remaining outcomes are

    2, 3, 1 (25%)
    3, 1, 2 (50%)

(3, 1, 2):(2, 3, 1) is about 2:1 which matches your result.

Solution

fun generateGraph(n: Int): Map<Int, Int> {
    val vertices : List<Int> = (1..n).toList()
    loop@ while (true) {
        val pot = vertices.toMutableList()
        val result = mutableMapOf<Int, Int>()
        // No need to shuffle evey position
        java.util.Collections.shuffle(pot)
        for (vertex in 1..n) {
            val value = pot[vertex-1]
            // if position == value, always start from scratch
            if (value == vertex)
                continue@loop
            result.put(vertex, value)
        }
        return result
    }
}

Also, you should improve you math on probability and statistics before you doubt the number distribution of a random number generator.