Select N random elements from a List in C#

2019-01-01 01:54发布

I need a quick algorithm to select 5 random elements from a generic list. For example, I'd like to get 5 random elements from a List<string>.

27条回答
后来的你喜欢了谁
2楼-- · 2019-01-01 02:17

I would use an extension method.

    public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> elements, int countToTake)
    {
        var random = new Random();

        var internalList = elements.ToList();

        var selected = new List<T>();
        for (var i = 0; i < countToTake; ++i)
        {
            var next = random.Next(0, internalList.Count - selected.Count);
            selected.Add(internalList[next]);
            internalList[next] = internalList[internalList.Count - selected.Count];
        }
        return selected;
    }
查看更多
低头抚发
3楼-- · 2019-01-01 02:19

Using LINQ with large lists (when costly to touch each element) AND if you can live with the possibility of duplicates:

new int[5].Select(o => (int)(rnd.NextDouble() * maxIndex)).Select(i => YourIEnum.ElementAt(i))

For my use i had a list of 100.000 elements, and because of them being pulled from a DB I about halfed (or better) the time compared to a rnd on the whole list.

Having a large list will reduce the odds greatly for duplicates.

查看更多
听够珍惜
4楼-- · 2019-01-01 02:20

When N is very large, the normal method that randomly shuffles the N numbers and selects, say, first k numbers, can be prohibitive because of space complexity. The following algorithm requires only O(k) for both time and space complexities.

http://arxiv.org/abs/1512.00501

def random_selection_indices(num_samples, N):
    modified_entries = {}
    seq = []
    for n in xrange(num_samples):
        i = N - n - 1
        j = random.randrange(i)

        # swap a[j] and a[i] 
        a_j = modified_entries[j] if j in modified_entries else j 
        a_i = modified_entries[i] if i in modified_entries else i

        if a_i != j:
            modified_entries[j] = a_i   
        elif j in modified_entries:   # no need to store the modified value if it is the same as index
            modified_entries.pop(j)

        if a_j != i:
            modified_entries[i] = a_j 
        elif i in modified_entries:   # no need to store the modified value if it is the same as index
            modified_entries.pop(i)
        seq.append(a_j)
    return seq
查看更多
谁念西风独自凉
5楼-- · 2019-01-01 02:21

Here's my approach (full text here http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html ).

It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:

public <T> List<T> take(List<T> source, int k) {
 int n = source.size();
 if (k > n) {
   throw new IllegalStateException(
     "Can not take " + k +
     " elements from a list with " + n +
     " elements");
 }
 List<T> result = new ArrayList<T>(k);
 Map<Integer,Integer> used = new HashMap<Integer,Integer>();
 int metric = 0;
 for (int i = 0; i < k; i++) {
   int off = random.nextInt(n - i);
   while (true) {
     metric++;
     Integer redirect = used.put(off, n - i - 1);
     if (redirect == null) {
       break;
     }
     off = redirect;
   }
   result.add(source.get(off));
 }
 assert metric <= 2*k;
 return result;
}
查看更多
孤独寂梦人
6楼-- · 2019-01-01 02:23

This is actually a harder problem than it sounds like, mainly because many mathematically-correct solutions will fail to actually allow you to hit all the possibilities (more on this below).

First, here are some easy-to-implement, correct-if-you-have-a-truly-random-number generator:

(0) Kyle's answer, which is O(n).

(1) Generate a list of n pairs [(0, rand), (1, rand), (2, rand), ...], sort them by the second coordinate, and use the first k (for you, k=5) indices to get your random subset. I think this is easy to implement, although it is O(n log n) time.

(2) Init an empty list s = [] that will grow to be the indices of k random elements. Choose a number r in {0, 1, 2, ..., n-1} at random, r = rand % n, and add this to s. Next take r = rand % (n-1) and stick in s; add to r the # elements less than it in s to avoid collisions. Next take r = rand % (n-2), and do the same thing, etc. until you have k distinct elements in s. This has worst-case running time O(k^2). So for k << n, this can be faster. If you keep s sorted and track which contiguous intervals it has, you can implement it in O(k log k), but it's more work.

@Kyle - you're right, on second thought I agree with your answer. I hastily read it at first, and mistakenly thought you were indicating to sequentially choose each element with fixed probability k/n, which would have been wrong - but your adaptive approach appears correct to me. Sorry about that.

Ok, and now for the kicker: asymptotically (for fixed k, n growing), there are n^k/k! choices of k element subset out of n elements [this is an approximation of (n choose k)]. If n is large, and k is not very small, then these numbers are huge. The best cycle length you can hope for in any standard 32 bit random number generator is 2^32 = 256^4. So if we have a list of 1000 elements, and we want to choose 5 at random, there's no way a standard random number generator will hit all the possibilities. However, as long as you're ok with a choice that works fine for smaller sets, and always "looks" random, then these algorithms should be ok.

Addendum: After writing this, I realized that it's tricky to implement idea (2) correctly, so I wanted to clarify this answer. To get O(k log k) time, you need an array-like structure that supports O(log m) searches and inserts - a balanced binary tree can do this. Using such a structure to build up an array called s, here is some pseudopython:

# Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
def ChooseRandomSubset(n, k):
  for i in range(k):
    r = UniformRandom(0, n-i)                 # May be 0, must be < n-i
    q = s.FirstIndexSuchThat( s[q] - q > r )  # This is the search.
    s.InsertInOrder(q ? r + q : r + len(s))   # Inserts right before q.
  return s

I suggest running through a few sample cases to see how this efficiently implements the above English explanation.

查看更多
步步皆殇っ
7楼-- · 2019-01-01 02:24

I think the selected answer is correct and pretty sweet. I implemented it differently though, as I also wanted the result in random order.

    static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
        IEnumerable<SomeType> someTypes,
        int maxCount)
    {
        Random random = new Random(DateTime.Now.Millisecond);

        Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>();

        foreach(SomeType someType in someTypes)
            randomSortTable[random.NextDouble()] = someType;

        return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
    }
查看更多
登录 后发表回答