Performance issue with generation of random unique

2019-02-21 12:05发布

I have a situation where by I need to create tens of thousands of unique numbers. However these numbers must be 9 digits and cannot contain any 0's. My current approach is to generate 9 digits (1-9) and concatenate them together, and if the number is not already in the list adding it into it. E.g.

public void generateIdentifiers(int quantity)
{
    uniqueIdentifiers = new List<string>(quantity);
    while (this.uniqueIdentifiers.Count < quantity)
    {
        string id = string.Empty;
        id += random.Next(1,10);
        id += random.Next(1,10);
        id += random.Next(1,10);
        id += " ";
        id += random.Next(1,10);
        id += random.Next(1,10);
        id += random.Next(1,10);
        id += " ";
        id += random.Next(1,10);
        id += random.Next(1,10);
        id += random.Next(1,10);
        if (!this.uniqueIdentifiers.Contains(id))
        {
            this.uniqueIdentifiers.Add(id);
        }
    }
}

However at about 400,000 the process really slows down as more and more of the generated numbers are duplicates. I am looking for a more efficient way to perform this process, any help would be really appreciated.

Edit: - I'm generating these - http://www.nhs.uk/NHSEngland/thenhs/records/Pages/thenhsnumber.aspx

10条回答
手持菜刀,她持情操
2楼-- · 2019-02-21 12:18

Try avoiding checks making sure that you always pick up a unique number:

static char[] base9 = "123456789".ToCharArray();

static string ConvertToBase9(int value) {
    int num = 9;
    char[] result = new char[9];
    for (int i = 8; i >= 0; --i) { 
        result[i] = base9[value % num];
        value = value / num;
    }
    return new string(result);
}

public static void generateIdentifiers(int quantity) {
    var uniqueIdentifiers = new List<string>(quantity);
    // we have 387420489 (9^9) possible numbers of 9 digits in base 9.
    // if we choose a number that is prime to that we can easily get always
    // unique numbers
    Random random = new Random();
    int inc = 386000000;
    int seed = random.Next(0, 387420489);
    while (uniqueIdentifiers.Count < quantity) {
        uniqueIdentifiers.Add(ConvertToBase9(seed));
        seed += inc;
        seed %= 387420489;
    }
}

I'll try to explain the idea behind with small numbers...

Suppose you have at most 7 possible combinations. We choose a number that is prime to 7, e.g. 3, and a random starting number, e.g. 4.

At each round, we add 3 to our current number, and then we take the result modulo 7, so we get this sequence:

4 -> 4 + 3 % 7 = 0
0 -> 0 + 3 % 7 = 3
3 -> 3 + 3 % 7 = 6
6 -> 6 + 6 % 7 = 5

In this way, we generate all the values from 0 to 6 in a non-consecutive way. In my example, we are doing the same, but we have 9^9 possible combinations, and as a number prime to that I choose 386000000 (you just have to avoid multiples of 3).

Then, I pick up the number in the sequence and I convert it to base 9.

I hope this is clear :)

I tested it on my machine, and generating 400k unique values took ~ 1 second.

查看更多
Melony?
3楼-- · 2019-02-21 12:19

Meybe this will bee faster:

        //we can generate first number wich in 9 base system will be between 88888888 - 888888888 
        //we can't start from zero becouse it will couse the great amount of 1 digit at begining

        int randNumber = random.Next((int)Math.Pow(9, 8) - 1, (int)Math.Pow(9, 9));


        //no we change our number to 9 base, but we add 1 to each digit in our number
        StringBuilder builder = new StringBuilder();

        for (int i=(int)Math.Pow(9,8); i>0;i= i/9)
        {
            builder.Append(randNumber / i +1);
            randNumber = randNumber % i;
        }

        id = builder.ToString();
查看更多
Anthone
4楼-- · 2019-02-21 12:19

I think @slugster is broadly right - although you could run two parallel processes, one to generate numbers, the other to verify them and add them to the list of accepted numbers when verified. Once you have enough, signal the original process to stop.

Combine this with other suggestions - using more efficient and appropriate data structures - and you should have something that works acceptably.

However the question of why you need such numbers is also significant - this requirement seems like one that should be analysed.

查看更多
Evening l夕情丶
5楼-- · 2019-02-21 12:21

As others have mentioned, use a HashSet<T> instead of a List<T>.
Furthermore, using StringBuilder instead of simple string operations will gain you another 25%. If you can use numbers instead of strings, you win, because it only takes a third or fourth of the time.

var quantity = 400000;
var uniqueIdentifiers = new HashSet<int>();
while (uniqueIdentifiers.Count < quantity)
{
    int i=0;
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    i = i*10 + random.Next(1,10);
    uniqueIdentifiers.Add(i);
}

It takes about 270 ms on my machine for 400,000 numbers and about 700 for 1,000,000. And this even without any parallelism. Because of the use of a HashSet<T> instead of a List<T>, this algorithm runs in O(n), i.e. the duration will grow linear. 10,000,000 values therefore take about 7 seconds.

查看更多
你好瞎i
6楼-- · 2019-02-21 12:21

This suggestion may or may not be popular.... it depends on people's perspective. Because you haven't been too specific about what you need them for, how often, or the exact number, I will suggest a brute force approach.

I would generate a hundred thousand numbers - shouldn't take very long at all, maybe a few seconds? Then use Parallel LINQ to do a Distinct() on them to eliminate duplicates. Then use another PLINQ query to run a regex against the remainder to eliminate any with zeroes in them. Then take the top x thousand. (PLINQ is brilliant for ripping through large tasks like this). If needed, rinse and repeat until you have enough for your needs.

On a decent machine it will just about take you longer to write this simple function than it will take to run it. I would also query why you have 400K entries to test when you state you actually need "tens of thousands"?

查看更多
老娘就宠你
7楼-- · 2019-02-21 12:21

Looking at the solutions already posted, mine seems fairly basic. But, it works, and generates 1million values in approximate 1s (10 million in 11s).

public static void generateIdentifiers(int quantity)
{
    HashSet<int> uniqueIdentifiers = new HashSet<int>();

    while (uniqueIdentifiers.Count < quantity)
    {
        int value = random.Next(111111111, 999999999);
        if (!value.ToString().Contains('0') && !uniqueIdentifiers.Contains(value))
            uniqueIdentifiers.Add(value);
    }
}
查看更多
登录 后发表回答