Split a collection into `n` parts with LINQ?

2018-12-31 19:14发布

Is there a nice way to split a collection into n parts with LINQ? Not necessarily evenly of course.

That is, I want to divide the collection into sub-collections, which each contains a subset of the elements, where the last collection can be ragged.

19条回答
浪荡孟婆
2楼-- · 2018-12-31 19:20

EDIT: Okay, it looks like I misread the question. I read it as "pieces of length n" rather than "n pieces". Doh! Considering deleting answer...

(Original answer)

I don't believe there's a built-in way of partitioning, although I intend to write one in my set of additions to LINQ to Objects. Marc Gravell has an implementation here although I would probably modify it to return a read-only view:

public static IEnumerable<IEnumerable<T>> Partition<T>
    (this IEnumerable<T> source, int size)
{
    T[] array = null;
    int count = 0;
    foreach (T item in source)
    {
        if (array == null)
        {
            array = new T[size];
        }
        array[count] = item;
        count++;
        if (count == size)
        {
            yield return new ReadOnlyCollection<T>(array);
            array = null;
            count = 0;
        }
    }
    if (array != null)
    {             
        Array.Resize(ref array, count);
        yield return new ReadOnlyCollection<T>(array);
    }
}
查看更多
无与为乐者.
3楼-- · 2018-12-31 19:20

I use this:

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> instance, int partitionSize)
{
    return instance
        .Select((value, index) => new { Index = index, Value = value })
        .GroupBy(i => i.Index / partitionSize)
        .Select(i => i.Select(i2 => i2.Value));
}
查看更多
萌妹纸的霸气范
4楼-- · 2018-12-31 19:22

This is memory efficient and defers execution as much as possible (per batch) and operates in linear time O(n)

    public static IEnumerable<IEnumerable<T>> InBatchesOf<T>(this IEnumerable<T> items, int batchSize)
    {
        List<T> batch = new List<T>(batchSize);
        foreach (var item in items)
        {
            batch.Add(item);

            if (batch.Count >= batchSize)
            {
                yield return batch;
                batch = new List<T>();
            }
        }

        if (batch.Count != 0)
        {
            //can't be batch size or would've yielded above
            batch.TrimExcess();
            yield return batch;
        }
    }
查看更多
闭嘴吧你
5楼-- · 2018-12-31 19:24

Just came across this thread, and most of the solutions here involve adding items to collections, effectively materialising each page before returning it. This is bad for two reasons - firstly if your pages are large there's a memory overhead to filling the page, secondly there are iterators which invalidate previous records when you advance to the next one (for example if you wrap a DataReader within an enumerator method).

This solution uses two nested enumerator methods to avoid any need to cache items into temporary collections. Since the outer and inner iterators are traversing the same enumerable, they necessarily share the same enumerator, so it's important not to advance the outer one until you're done with processing the current page. That said, if you decide not to iterate all the way through the current page, when you move to the next page this solution will iterate forward to the page boundary automatically.

using System.Collections.Generic;

public static class EnumerableExtensions
{
    /// <summary>
    /// Partitions an enumerable into individual pages of a specified size, still scanning the source enumerable just once
    /// </summary>
    /// <typeparam name="T">The element type</typeparam>
    /// <param name="enumerable">The source enumerable</param>
    /// <param name="pageSize">The number of elements to return in each page</param>
    /// <returns></returns>
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> enumerable, int pageSize)
    {
        var enumerator = enumerable.GetEnumerator();

        while (enumerator.MoveNext())
        {
            var indexWithinPage = new IntByRef { Value = 0 };

            yield return SubPartition(enumerator, pageSize, indexWithinPage);

            // Continue iterating through any remaining items in the page, to align with the start of the next page
            for (; indexWithinPage.Value < pageSize; indexWithinPage.Value++)
            {
                if (!enumerator.MoveNext())
                {
                    yield break;
                }
            }
        }
    }

    private static IEnumerable<T> SubPartition<T>(IEnumerator<T> enumerator, int pageSize, IntByRef index)
    {
        for (; index.Value < pageSize; index.Value++)
        {
            yield return enumerator.Current;

            if (!enumerator.MoveNext())
            {
                yield break;
            }
        }
    }

    private class IntByRef
    {
        public int Value { get; set; }
    }
}
查看更多
后来的你喜欢了谁
6楼-- · 2018-12-31 19:28

Interesting thread. To get a streaming version of Split/Partition, one can use enumerators and yield sequences from the enumerator using extension methods. Converting imperative code to functional code using yield is a very powerful technique indeed.

First an enumerator extension that turns a count of elements into a lazy sequence:

public static IEnumerable<T> TakeFromCurrent<T>(this IEnumerator<T> enumerator, int count)
{
    while (count > 0)
    {
        yield return enumerator.Current;
        if (--count > 0 && !enumerator.MoveNext()) yield break;
    }
}

And then an enumerable extension that partitions a sequence:

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> seq, int partitionSize)
{
    var enumerator = seq.GetEnumerator();

    while (enumerator.MoveNext())
    {
        yield return enumerator.TakeFromCurrent(partitionSize);
    }
}

The end result is a highly efficient, streaming and lazy implementation that relies on very simple code.

Enjoy!

查看更多
初与友歌
7楼-- · 2018-12-31 19:29

There are lots of great answers for this question (and its cousins). I needed this myself and had created a solution that is designed to be efficient and error tolerant in a scenario where the source collection can be treated as a list. It does not use any lazy iteration so it may not be suitable for collections of unknown size that may apply memory pressure.

static public IList<T[]> GetChunks<T>(this IEnumerable<T> source, int batchsize)
{
    IList<T[]> result = null;
    if (source != null && batchsize > 0)
    {
        var list = source as List<T> ?? source.ToList();
        if (list.Count > 0)
        {
            result = new List<T[]>();
            for (var index = 0; index < list.Count; index += batchsize)
            {
                var rangesize = Math.Min(batchsize, list.Count - index);
                result.Add(list.GetRange(index, rangesize).ToArray());
            }
        }
    }
    return result ?? Enumerable.Empty<T[]>().ToList();
}

static public void TestGetChunks()
{
    var ids = Enumerable.Range(1, 163).Select(i => i.ToString());
    foreach (var chunk in ids.GetChunks(20))
    {
        Console.WriteLine("[{0}]", String.Join(",", chunk));
    }
}

I have seen a few answers across this family of questions that use GetRange and Math.Min. But I believe that overall this is a more complete solution in terms of error checking and efficiency.

查看更多
登录 后发表回答