I need a quick algorithm to select 5 random elements from a generic list. For example, I'd like to get 5 random elements from a List<string>
.
相关问题
- Sorting 3 numbers without branching [closed]
- Graphics.DrawImage() - Throws out of memory except
- Why am I getting UnauthorizedAccessException on th
- 求获取指定qq 资料的方法
- How to know full paths to DLL's from .csproj f
I would use an extension method.
Using LINQ with large lists (when costly to touch each element) AND if you can live with the possibility of duplicates:
For my use i had a list of 100.000 elements, and because of them being pulled from a DB I about halfed (or better) the time compared to a rnd on the whole list.
Having a large list will reduce the odds greatly for duplicates.
When N is very large, the normal method that randomly shuffles the N numbers and selects, say, first k numbers, can be prohibitive because of space complexity. The following algorithm requires only O(k) for both time and space complexities.
http://arxiv.org/abs/1512.00501
Here's my approach (full text here http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html ).
It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:
This is actually a harder problem than it sounds like, mainly because many mathematically-correct solutions will fail to actually allow you to hit all the possibilities (more on this below).
First, here are some easy-to-implement, correct-if-you-have-a-truly-random-number generator:
(0) Kyle's answer, which is O(n).
(1) Generate a list of n pairs [(0, rand), (1, rand), (2, rand), ...], sort them by the second coordinate, and use the first k (for you, k=5) indices to get your random subset. I think this is easy to implement, although it is O(n log n) time.
(2) Init an empty list s = [] that will grow to be the indices of k random elements. Choose a number r in {0, 1, 2, ..., n-1} at random, r = rand % n, and add this to s. Next take r = rand % (n-1) and stick in s; add to r the # elements less than it in s to avoid collisions. Next take r = rand % (n-2), and do the same thing, etc. until you have k distinct elements in s. This has worst-case running time O(k^2). So for k << n, this can be faster. If you keep s sorted and track which contiguous intervals it has, you can implement it in O(k log k), but it's more work.
@Kyle - you're right, on second thought I agree with your answer. I hastily read it at first, and mistakenly thought you were indicating to sequentially choose each element with fixed probability k/n, which would have been wrong - but your adaptive approach appears correct to me. Sorry about that.
Ok, and now for the kicker: asymptotically (for fixed k, n growing), there are n^k/k! choices of k element subset out of n elements [this is an approximation of (n choose k)]. If n is large, and k is not very small, then these numbers are huge. The best cycle length you can hope for in any standard 32 bit random number generator is 2^32 = 256^4. So if we have a list of 1000 elements, and we want to choose 5 at random, there's no way a standard random number generator will hit all the possibilities. However, as long as you're ok with a choice that works fine for smaller sets, and always "looks" random, then these algorithms should be ok.
Addendum: After writing this, I realized that it's tricky to implement idea (2) correctly, so I wanted to clarify this answer. To get O(k log k) time, you need an array-like structure that supports O(log m) searches and inserts - a balanced binary tree can do this. Using such a structure to build up an array called s, here is some pseudopython:
I suggest running through a few sample cases to see how this efficiently implements the above English explanation.
I think the selected answer is correct and pretty sweet. I implemented it differently though, as I also wanted the result in random order.