c# generic, covering both arrays and lists?

2020-02-28 03:13发布

问题:

Here's a very handy extension, which works for an array of anything:

public static T AnyOne<T>(this T[] ra) where T:class
{
    int k = ra.Length;
    int r = Random.Range(0,k);
    return ra[r];
}

Unfortunately it does not work for a List<> of anything. Here's the same extension that works for any List<>

public static T AnyOne<T>(this List<T> listy) where T:class
{
    int k = listy.Count;
    int r = Random.Range(0,k);
    return listy[r];
}

In fact, is there a way to generalise generics covering both arrays and List<>s in one go? Or is it know to be not possible?


It occurs to me, could the answer even further encompass Collections? Or indeed has one of the experts below already achieved that??


PS, I apologize for not explicitly mentioning this is in the Unity3D milieu. "Random.Range" is a unity-to-the-bone function, and an "AnyOne" call reads as 100% game engine to any game engineers. It's the first extension you type in for any game project and you use it constantly in game code ("any explosion!" "any coin sound effect!" etc etc!)

Obviously, it could of course be used in any c# milieu.

回答1:

In fact the most appropriate common interface between T[] and List<T> for your case is IReadOnlyList<T>

public static T AnyOne<T>(this IReadOnlyList<T> list) where T:class
{
    int k = list.Count;
    int r = Random.Range(0,k);
    return list[r];
}

As mentioned in another answer, IList<T> also works, but the good practice requires you to request from the caller the minimum functionality needed by the method, which in this case is Count property and read only indexer.

IEnumerable<T> also works, but it allows the caller to pass a non collection iterator where Count and ElementAt extension methods could be highly inefficient - like Enumerable.Range(0, 1000000), database query etc.


Note for Unity3D engineers: If you look at the very bottom of the IReadOnlyList Interface documentation‌​, it is available since .Net 4.5. In earlier versions of .Net you have to resort to IList<T> (available since 2.0). Unity stays well behind on .Net versions. For 2016 Unity is only using .Net 2.0.5. So for Unity3D, you have to use IList<T>.



回答2:

T[] and List<T> actually both implement IList<T>, which provides enumeration, a Count property and an indexer.

public static T AnyOne<T>(this IList<T> ra) 
{
    int k = ra.Count;
    int r = Random.Range(0,k);
    return ra[r];
}

Just a note: for Unity3D milieu specifically, this is exactly the correct answer. Regarding the further improvement on this answer, IReadOnlyList<T>, it is not available in Unity3D. (Regarding the (ingenious) extension of IEnumerable<T> to even cover cases of objects with no countingness / indexability, certainly in a game engine situation that would be a distinct concept (such as AnyOneEvenInefficiently or AnyOneEvenFromUnsafeGroups).)



回答3:

It's interesting how some people choose IEnumerable<T>, while some other people insist on IReadOnlyList<T>.

Now let's be honest. IEnumerable<T> is useful, very useful. In most cases you just want to put this method in some library, and throw your utility function to whatever you think is a collection, and be done with it. However, using IEnumerable<T> correctly is a bit tricky, as I'll point out here...

IEnumerable

Let's for a second assume that the OP is using Linq and wants to get a random element from a sequence. Basically he ends up with the code from @Yannick, that ends up in the library of utility helper functions:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    int endExclusive = source.Count(); // #1
    int randomIndex = Random.Range(0, endExclusive); 
    return source.ElementAt(randomIndex); // #2
}

Now, what this basically does is 2 things:

  1. Count the number of elements in the source. If the source is a simple IEnumerable<T> this implies going through all the elements in the list, if it's f.ex. a List<T>, it will use the Count property.
  2. Reset the enumerable, go to element randomIndex, grab it and return it.

There are two things that can go wrong here. First of all, your IEnumerable might be a slow, sequential storage, and doing Count can ruin the performance of your application in an unexpected way. For example, streaming from a device might get you into trouble. That said, you could very well argue that's to be expected when that's inherent to the characteristic of the collection - and personally I'd say that argument will hold.

Secondly -and this is perhaps even more important- there's no guarantee that you enumerable will return the same sequence every iteration (and therefore there's also no guarantee that your code won't crash). For example, consider this innocent looking piece of code, that might be useful for testing purposes:

IEnumerable<int> GenerateRandomDataset()
{
    Random rnd = new Random();
    int count = rnd.Next(10, 100); // randomize number of elements
    for (int i=0; i<count; ++i)
    {
        yield return new rnd.Next(0, 1000000); // randomize result
    }
}

The first iteration (calling Count()), you might generate 99 results. You pick element 98. Next you call ElementAt, the second iteration generates 12 results and your application crashes. Not cool.

Fixing the IEnumerable implementation

As we've seen, the issue of the IEnumerable<T> implementation is that you have to go through the data 2 times. We can fix that by going through the data a single time.

The 'trick' here is actually pretty simple: if we have seen 1 element, we definitely want to consider returning that. All elements considered, there's a 50%/50% chance that this is the element we would have returned. If we see the third element, there's a 33%/33%/33% chance that we would have returned this. And so on.

Therefore, a better implementation might be this one:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    Random rnd = new Random();
    double count = 1;
    T result = default(T);
    foreach (var element in source)
    {
        if (rnd.NextDouble() <= (1.0 / count)) 
        {
            result = element;
        }
        ++count;
    }
    return result;
}

On a side note: if we're using Linq, we would expect operations to use the IEnumerable<T> once (and only once!). Now you know why.

Making it work with lists and arrays

While this is a neat trick, our performance will now be slower if we work on a List<T>, which doesn't make any sense because we know there's a much better implementation available due the the property that indexing and Count are available to us.

What we're looking for is the common denominator for this better solution, that's used in as many collections as we can find. The thing we'll end up with is the IReadOnlyList<T> interface, that implements everything we need.

Because of the properties that we know to be true for IReadOnlyList<T>, we can now safely use Count and indexing, without running the risk of crashing the application.

However, while IReadOnlyList<T> seems appealing, IList<T> for some reason doesn't seem to implement it... which basically means that IReadOnlyList<T> is a bit of a gamble in practice. In that respect, I'm pretty sure there are a lot more IList<T> implementations out there than IReadOnlyList<T> implementations. It therefore seems best to simply support both interfaces.

This leads us to the solution here:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    var rnd = new Random();
    var list = source as IReadOnlyList<T>;
    if (list != null)
    {
        int index = rnd.Next(0, list.Count);
        return list[index];
    }

    var list2 = source as IList<T>;
    if (list2 != null)
    {
        int index = rnd.Next(0, list2.Count);
        return list2[index];
    }
    else
    {
        double count = 1;
        T result = default(T);
        foreach (var element in source)
        {
            if (rnd.NextDouble() <= (1.0 / count))
            {
                result = element;
            }
            ++count;
        }
        return result;
    }
}

PS: For more complex scenario's, check out the Strategy Pattern.

Random

@Yannick Motton made the remark that you have to be careful with Random, because it won't be really random if you call methods like this a lot of times. Random is initialized with the RTC, so if you make a new instance a lot of times, it won't change the seed.

A simple way around this is as follows:

private static int seed = 12873; // some number or a timestamp.

// ...

// initialize random number generator:
Random rnd = new Random(Interlocked.Increment(ref seed));

This way, every time you call AnyOne, the random number generator will receive another seed and it will work even in tight loops.

To summarize:

So, to summarize it:

  • IEnumerable<T>'s should be iterated once, and only once. Doing otherwise might give the user unexpected results.
  • If you have access to better capabilities than simple enumeration, it's not necessary to go through all the elements. Best to grab the right result right away.
  • Consider what interfaces you're checking very carefully. While IReadOnlyList<T> is definitely the best candidate, it's not inherited from IList<T> which means it'll be less effective in practice.

The end result is something that Just Works.



回答4:

T[] and List<T> both share the same interface: IEnumerable<T>.

IEnumerable<T> however, does not have a Length or Count member, but there is an extension method Count(). Also there is no indexer on sequences, so you must use the ElementAt(int) extension method.

Something along the lines of:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    int endExclusive = source.Count();
    int randomIndex = Random.Range(0, endExclusive); 
    return source.ElementAt(randomIndex);
}


回答5:

You could change your definition a bit:

public static T AnyOne<T>(this IEnumerable<T> ra) 
{
    if(ra==null)
        throw new ArgumentNullException("ra");

    int k = ra.Count();
    int r = Random.Range(0,k);
    return ra.ElementAt(r-1);
}

Now you define an extension method for all the types that implement the IEnumerable<T> interface.