Safely checking non-repeatable IEnumerables for emptiness

Posted 2020-02-28 02:46

Question:

There are times when it's helpful to check a non-repeatable IEnumerable to see whether or not it's empty. LINQ's Any doesn't work well for this, since it consumes the first element of the sequence, e.g.

if(input.Any())
{
    foreach(int i in input)
    {
        // Will miss the first element for non-repeatable sequences!
    }
}

(Note: I'm aware that there's no need to do the check in this case - it's just an example! The real-world example is performing a Zip against a right-hand IEnumerable that can potentially be empty. If it's empty, I want the result to be the left-hand IEnumerable as-is.)
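To make the failure mode concrete, here is a minimal sketch that reproduces it. The `Drain` helper and the `AnyPitfallDemo` class are hypothetical names invented for this illustration; `Drain` stands in for any source (console, stream) whose items are gone once read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyPitfallDemo
{
    // Hypothetical one-shot source: every enumeration dequeues from the same
    // queue, so an item that has been read once cannot be read again.
    public static IEnumerable<int> Drain(Queue<int> queue)
    {
        while (queue.Count > 0)
        {
            yield return queue.Dequeue();
        }
    }

    // Mirrors the check-then-iterate pattern from the snippet above.
    public static List<int> CheckThenIterate(IEnumerable<int> input)
    {
        var seen = new List<int>();
        if (input.Any())                // pulls the first item and discards it
        {
            foreach (int i in input)    // starts from whatever is left
            {
                seen.Add(i);
            }
        }
        return seen;
    }

    public static void Main()
    {
        var input = Drain(new Queue<int>(new[] { 1, 2, 3 }));
        Console.WriteLine(string.Join(", ", CheckThenIterate(input))); // prints "2, 3"
    }
}
```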

I've come up with a potential solution that looks like this:

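private static IEnumerable<T> NullifyIfEmptyHelper<T>(IEnumerator<T> e)
{
    // The caller has already advanced e to its first element, so yield
    // Current before calling MoveNext again.
    using(e)
    {
        do
        {
            yield return e.Current;
        } while (e.MoveNext());
    }
}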

public static IEnumerable<T> NullifyIfEmpty<T>(this IEnumerable<T> source)
{
    IEnumerator<T> e = source.GetEnumerator();
    if(e.MoveNext())
    {
        return NullifyIfEmptyHelper(e);
    }
    else
    {
        e.Dispose();
        return null;
    }
}

This can then be used as follows:

input = input.NullifyIfEmpty();
if(input != null)
{
    foreach(int i in input)
    {
        // Will include the first element.
    }
}

I have two questions about this:

1) Is this a reasonable thing to do? Is it likely to be problematic from a performance point of view? (I'd guess not, but worth asking.)

2) Is there a better way of achieving the same end goal?


EDIT #1:

Here's an example of a non-repeatable IEnumerable, to clarify:

private static IEnumerable<int> ReadNumbers()
{
    for(;;)
    {
        int i;
        if (int.TryParse(Console.ReadLine(), out i) && i != -1)
        {
            yield return i;
        }
        else
        {
            yield break;
        }
    }
}

Basically, things which come from user input or a stream, etc.

EDIT #2:

I need to clarify that I'm looking for a solution that preserves the lazy nature of the IEnumerable - converting it to a list or an array can be an answer in certain circumstances, but isn't what I'm after here. (The real-world reason is that the number of items in the IEnumerable may be huge in my case, and it's important not to store them all in memory at once.)

Answer 1:

You don't need to complicate it. A regular foreach loop with a single extra bool variable will do the trick.

If you have

if(input.Any())
{
    A
    foreach(int i in input)
    {
        B
    }
    C
}

and you don't want to read input twice, you can change this to

bool seenItem = false;
foreach(int i in input)
{
    if (!seenItem)
    {
        seenItem = true;
        A
    }
    B
}
if (seenItem)
{
    C
}

Depending on what B does, you may be able to avoid the seenItem variable entirely.
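As a concrete illustration of the pattern, the sketch below fills in the placeholders with made-up actions: A adds a "begin" marker, B adds the item, and C adds an "end" marker. The `SeenItemDemo` class, the `Render` method, and the marker strings are all assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class SeenItemDemo
{
    // Enumerates input exactly once; A and C only run for a non-empty input.
    public static List<string> Render(IEnumerable<int> input)
    {
        var lines = new List<string>();
        bool seenItem = false;
        foreach (int i in input)
        {
            if (!seenItem)
            {
                seenItem = true;
                lines.Add("begin");     // A: runs once, before the first item
            }
            lines.Add(i.ToString());    // B: runs for every item, including the first
        }
        if (seenItem)
        {
            lines.Add("end");           // C: runs only if at least one item was seen
        }
        return lines;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Render(new[] { 4, 5 })));  // prints "begin 4 5 end"
        Console.WriteLine(Render(new int[0]).Count);                  // prints "0"
    }
}
```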

In your case, Enumerable.Zip is a fairly basic function that is easily reimplemented, and your replacement function can use something similar to the above.

Edit: You might consider

public static class MyEnumerableExtensions
{
    public static IEnumerable<TFirst> NotReallyZip<TFirst, TSecond>(this IEnumerable<TFirst> first, IEnumerable<TSecond> second, Func<TFirst, TSecond, TFirst> resultSelector)
    {
        using (var firstEnumerator = first.GetEnumerator())
        using (var secondEnumerator = second.GetEnumerator())
        {
            if (secondEnumerator.MoveNext())
            {
                if (firstEnumerator.MoveNext())
                {
                    do yield return resultSelector(firstEnumerator.Current, secondEnumerator.Current);
                    while (firstEnumerator.MoveNext() && secondEnumerator.MoveNext());
                }
            }
            else
            {
                while (firstEnumerator.MoveNext())
                    yield return firstEnumerator.Current;
            }
        }
    }
}
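A possible usage sketch follows. The extension method is copied from above only so that the example compiles on its own; the `NotReallyZipDemo` class and the `Drain` one-shot source are hypothetical names made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class NotReallyZipDemo
{
    // Copy of the NotReallyZip extension above, repeated so this compiles standalone.
    public static IEnumerable<TFirst> NotReallyZip<TFirst, TSecond>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TFirst> resultSelector)
    {
        using (var firstEnumerator = first.GetEnumerator())
        using (var secondEnumerator = second.GetEnumerator())
        {
            if (secondEnumerator.MoveNext())
            {
                if (firstEnumerator.MoveNext())
                {
                    do yield return resultSelector(firstEnumerator.Current, secondEnumerator.Current);
                    while (firstEnumerator.MoveNext() && secondEnumerator.MoveNext());
                }
            }
            else
            {
                while (firstEnumerator.MoveNext())
                    yield return firstEnumerator.Current;
            }
        }
    }

    // Hypothetical one-shot source: items are removed from the queue as they are read.
    public static IEnumerable<int> Drain(Queue<int> queue)
    {
        while (queue.Count > 0)
            yield return queue.Dequeue();
    }

    public static void Main()
    {
        var left = new[] { 1, 2, 3 };

        // Non-empty right-hand side: behaves like Zip and stops at the shorter input.
        var right = Drain(new Queue<int>(new[] { 10, 20 }));
        Console.WriteLine(string.Join(", ", left.NotReallyZip(right, (a, b) => a + b))); // prints "11, 22"

        // Empty right-hand side: the left-hand side passes through unchanged.
        var empty = Drain(new Queue<int>());
        Console.WriteLine(string.Join(", ", left.NotReallyZip(empty, (a, b) => a + b))); // prints "1, 2, 3"
    }
}
```

Each input is enumerated only once, so the one-shot right-hand side loses no elements.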


Answer 2:

You could also just read the first element and, if it's not the default value, concatenate this first element with the rest of your input:

var input = ReadNumbers();
var first = input.FirstOrDefault();
if (first != default(int)) //Assumes input doesn't contain zeroes
{
    var firstAsArray = new[] {first};
    foreach (int i in firstAsArray.Concat(input))
    {
        // Will include the first element.
        Console.WriteLine(i);
    }
}

For a normal (repeatable) enumerable, the first element would appear twice, but for a non-repeatable enumerable it works, unless iterating twice is not allowed. Also, suppose you had an enumerator like this:

private readonly static List<int?> Source = new List<int?>(){1,2,3,4,5,6};

private static IEnumerable<int?> ReadNumbers()
{
    while (Source.Count > 0) {
        yield return Source.ElementAt(0);
        Source.RemoveAt(0);
    }
}

Then it would print: 1, 1, 2, 3, 4, 5, 6. The reason is that the first element is consumed AFTER it has been returned, so the first enumerator, which stops at the first element, never gets the chance to consume it. That would be a case of a badly written enumerator, though. If the element is consumed, then returned...

while (Source.Count > 0) {
    var returnElement = Source.ElementAt(0);
    Source.RemoveAt(0);
    yield return returnElement;
}

...you get the expected output of: 1, 2, 3, 4, 5, 6.



Answer 3:

This is not an efficient solution if the enumeration is long; however, it is an easy one:

var list = input.ToList();
if (list.Count != 0) {
    foreach (var item in list) {
       ...
    }
}