
LINQ: Group by index and value [duplicate]

Posted 2019-09-20 19:38

Question:

This question already has an answer here:

  • linq group by contiguous blocks 5 answers

Let's say I have a list of strings with the following values:

["a","a","b","a","a","a","c","c"]

I want to execute a linq query that will group into 4 groups:

Group 1: ["a","a"] Group 2: ["b"] Group 3: ["a","a","a"] Group 4: ["c","c"]

Basically, I want two separate groups for the value "a", because the two runs are not contiguous (they do not come from the same "index sequence").

Does anyone have a LINQ solution for this?

Answer 1:

You just need a grouping key other than the array items themselves:

var x = new string[] { "a", "a", "a", "b", "a", "a", "c" };


int groupId = -1;
var result = x.Select((s, i) => new
{
    value = s,
    groupId = (i > 0 && x[i - 1] == s) ? groupId : ++groupId
}).GroupBy(u => u.groupId); // group on the projected id, not on the captured counter variable


foreach (var item in result)
{
    Console.WriteLine(item.Key);
    foreach (var inner in item)
    {
        Console.WriteLine(" => " + inner.value);
    }
}




Answer 2:

Calculate the "index sequence" first, then do your group.

private class IndexedData
{
    public int Sequence;
    public string Text;
} 

string[] data = { "a", "a", "b" /* ... */ };

// Calculate "index sequence" for each data element.
List<IndexedData> indexes = new List<IndexedData>();

foreach (string s in data)
{
    IndexedData last = indexes.LastOrDefault() ?? new IndexedData();

    indexes.Add(new IndexedData
    {
        Text = s,
        Sequence = (last.Text == s
                      ? last.Sequence 
                      : last.Sequence + 1)
    });
}

// Group by "index sequence"
var grouped = indexes.GroupBy(i => i.Sequence)
                     .Select(g => g.Select(i => i.Text));
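For reference, here is a self-contained sketch of the same idea, run against the sample list from the question. The class name SequenceGroupingDemo and the method name GroupRuns are my own placeholders, not part of the answer above, and the ToList calls are only there to make the result easy to print.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceGroupingDemo
{
    // Same shape as the IndexedData class above.
    private class IndexedData
    {
        public int Sequence;
        public string Text;
    }

    // Tags each element with a run number, then groups on that number.
    public static List<List<string>> GroupRuns(IEnumerable<string> data)
    {
        var indexes = new List<IndexedData>();
        foreach (string s in data)
        {
            IndexedData last = indexes.LastOrDefault() ?? new IndexedData();
            indexes.Add(new IndexedData
            {
                Text = s,
                // Reuse the previous run number while the text repeats.
                Sequence = last.Text == s ? last.Sequence : last.Sequence + 1
            });
        }

        return indexes.GroupBy(i => i.Sequence)
                      .Select(g => g.Select(i => i.Text).ToList())
                      .ToList();
    }

    public static void Main()
    {
        // Sample input taken from the question.
        var data = new[] { "a", "a", "b", "a", "a", "a", "c", "c" };
        foreach (var run in GroupRuns(data))
        {
            Console.WriteLine(string.Join(",", run));
        }
        // Prints four lines: a,a then b then a,a,a then c,c
    }
}
```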


Answer 3:

This is a naive foreach implementation in which the whole dataset ends up in memory (probably not an issue for you, since GroupBy buffers everything anyway):

public static IEnumerable<List<string>> Split(IEnumerable<string> values)
{
    var result = new List<List<string>>();
    foreach (var value in values)
    {
        var currentGroup = result.LastOrDefault();
        if (currentGroup?.FirstOrDefault()?.Equals(value) == true)
        {
            currentGroup.Add(value);
        }
        else
        {
            result.Add(new List<string> { value });
        }
    }

    return result;
}

Here is a slightly more complicated implementation using foreach and yield return. The compiler turns it into an enumerator state machine that keeps only the current group in memory, which is probably how this would be implemented at the framework level:

EDIT: This is apparently also the way MoreLINQ does it.

public static IEnumerable<List<string>> Split(IEnumerable<string> values)
{
    var currentValue = default(string);
    var group = (List<string>)null;

    foreach (var value in values)
    {
        if (group == null)
        {
            currentValue = value;
            group = new List<string> { value };
        }
        else if (string.Equals(currentValue, value)) // static Equals avoids a NullReferenceException on null items
        {
            group.Add(value);
        }
        else
        {
            yield return group;
            currentValue = value;
            group = new List<string> { value };
        }
    }

    if (group != null)
    {
        yield return group;
    }
}
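To illustrate the "only the current group in memory" point, here is a small sketch that feeds the iterator an unbounded source and stops after three groups. The class name LazySplitDemo and the Endless generator are made up for this example. Split is copied from above so the snippet compiles on its own, with string.Equals(a, b) used as a null-safe comparison (an assumption on my part).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySplitDemo
{
    // Copy of the yield-based Split above, so this sketch compiles on its own.
    public static IEnumerable<List<string>> Split(IEnumerable<string> values)
    {
        var currentValue = default(string);
        var group = (List<string>)null;

        foreach (var value in values)
        {
            if (group == null)
            {
                currentValue = value;
                group = new List<string> { value };
            }
            else if (string.Equals(currentValue, value))
            {
                group.Add(value);
            }
            else
            {
                // The finished group is handed out before the next one starts.
                yield return group;
                currentValue = value;
                group = new List<string> { value };
            }
        }

        if (group != null)
        {
            yield return group;
        }
    }

    // Hypothetical unbounded source: a, a, b, b, b, a, a, b, b, b, ... forever.
    public static IEnumerable<string> Endless()
    {
        while (true)
        {
            yield return "a";
            yield return "a";
            yield return "b";
            yield return "b";
            yield return "b";
        }
    }

    public static void Main()
    {
        // Take(3) stops pulling from the source once three groups have been produced.
        foreach (var run in Split(Endless()).Take(3))
        {
            Console.WriteLine(string.Join(",", run));
        }
    }
}
```

The list-building version further up would never return on such a source, because it has to consume everything before handing back the result.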

And this is a joke version using LINQ only. It is basically the same as the first one, but slightly harder to understand (especially since Aggregate is not the most frequently used LINQ method):

public static IEnumerable<List<string>> Split(IEnumerable<string> values)
{
    return values.Aggregate(
        new List<List<string>>(),
        (lists, str) =>
        {
            var currentGroup = lists.LastOrDefault();
            if (currentGroup?.FirstOrDefault()?.Equals(str) == true)
            {
                currentGroup.Add(str);
            }
            else
            {
                lists.Add(new List<string> { str });
            }

            return lists;
        },
        lists => lists);
}


Answer 4:

Start with an extension method based on the APL scan operator. It is like Aggregate, but returns each intermediate result paired with its source value:

public static IEnumerable<KeyValuePair<TKey, T>> ScanPair<T, TKey>(this IEnumerable<T> src, TKey seedKey, Func<KeyValuePair<TKey, T>, T, TKey> combine) {
    using (var srce = src.GetEnumerator()) {
        if (srce.MoveNext()) {
            var prevkv = new KeyValuePair<TKey, T>(seedKey, srce.Current);

            while (srce.MoveNext()) {
                yield return prevkv;
                prevkv = new KeyValuePair<TKey, T>(combine(prevkv, srce.Current), srce.Current);
            }
            yield return prevkv;
        }
    }
}
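To make the pairing concrete, here is a sketch that numbers the runs in the question's sample list. ScanPair is copied from above so the snippet compiles on its own. The class name ScanPairDemo and the helper NumberRuns are placeholders of mine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanPairDemo
{
    // Copy of the ScanPair extension above, so this sketch compiles on its own.
    public static IEnumerable<KeyValuePair<TKey, T>> ScanPair<T, TKey>(
        this IEnumerable<T> src, TKey seedKey,
        Func<KeyValuePair<TKey, T>, T, TKey> combine)
    {
        using (var srce = src.GetEnumerator())
        {
            if (srce.MoveNext())
            {
                var prevkv = new KeyValuePair<TKey, T>(seedKey, srce.Current);
                while (srce.MoveNext())
                {
                    yield return prevkv;
                    prevkv = new KeyValuePair<TKey, T>(combine(prevkv, srce.Current), srce.Current);
                }
                yield return prevkv;
            }
        }
    }

    // The key stays the same while the value repeats; otherwise it goes up by one.
    public static List<KeyValuePair<int, string>> NumberRuns(IEnumerable<string> src) =>
        src.ScanPair(0, (prev, cur) => prev.Value == cur ? prev.Key : prev.Key + 1).ToList();

    public static void Main()
    {
        // Sample input taken from the question.
        var src = new[] { "a", "a", "b", "a", "a", "a", "c", "c" };
        foreach (var kv in NumberRuns(src))
        {
            Console.WriteLine($"{kv.Key}: {kv.Value}");
        }
        // The keys come out as 0, 0, 1, 2, 2, 2, 3, 3 next to the original values.
    }
}
```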

With it, you can create extension methods that group runs of consecutive elements with equal keys:

public static IEnumerable<IGrouping<int, TResult>> GroupByRuns<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) {
    cmp = cmp ?? EqualityComparer<TKey>.Default;
    return src.ScanPair(0,
                        (kvp, cur) => cmp.Equals(key(kvp.Value), key(cur)) ? kvp.Key : kvp.Key + 1)
              .GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
}

public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.GroupByRuns(key, e => e);
public static IEnumerable<IGrouping<int, TElement>> GroupByRuns<TElement>(this IEnumerable<TElement> src) => src.GroupByRuns(e => e, e => e);

public static IEnumerable<IEnumerable<TResult>> Runs<TElement, TKey, TResult>(this IEnumerable<TElement> src, Func<TElement, TKey> key, Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null) =>
    src.GroupByRuns(key, result, cmp).Select(g => g.Select(s => s)); // pass cmp through so a custom comparer is honored
public static IEnumerable<IEnumerable<TElement>> Runs<TElement, TKey>(this IEnumerable<TElement> src, Func<TElement, TKey> key) => src.Runs(key, e => e);
public static IEnumerable<IEnumerable<TElement>> Runs<TElement>(this IEnumerable<TElement> src) => src.Runs(e => e, e => e);

Using the simplest version, you can get either an IEnumerable<IGrouping<int, T>>:

var ans1 = src.GroupByRuns();

Or a version that drops the IGrouping (and its Key) and returns a plain IEnumerable<IEnumerable<T>>:

var ans2 = src.Runs();
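As a rough usage sketch, the snippet below runs the three-argument GroupByRuns with and without a custom comparer. ScanPair and GroupByRuns are copied from above so it compiles on its own. RunsDemo, Describe, and the mixed-case sample array are my own illustrative choices, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunsDemo
{
    // Copies of ScanPair and GroupByRuns from above, so this sketch compiles on its own.
    public static IEnumerable<KeyValuePair<TKey, T>> ScanPair<T, TKey>(
        this IEnumerable<T> src, TKey seedKey,
        Func<KeyValuePair<TKey, T>, T, TKey> combine)
    {
        using (var srce = src.GetEnumerator())
        {
            if (!srce.MoveNext()) yield break;
            var prevkv = new KeyValuePair<TKey, T>(seedKey, srce.Current);
            while (srce.MoveNext())
            {
                yield return prevkv;
                prevkv = new KeyValuePair<TKey, T>(combine(prevkv, srce.Current), srce.Current);
            }
            yield return prevkv;
        }
    }

    public static IEnumerable<IGrouping<int, TResult>> GroupByRuns<TElement, TKey, TResult>(
        this IEnumerable<TElement> src, Func<TElement, TKey> key,
        Func<TElement, TResult> result, IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp ?? EqualityComparer<TKey>.Default;
        return src.ScanPair(0, (kvp, cur) => cmp.Equals(key(kvp.Value), key(cur)) ? kvp.Key : kvp.Key + 1)
                  .GroupBy(kvp => kvp.Key, kvp => result(kvp.Value));
    }

    // Formats each run as "key: v1,v2,..." so the grouping is easy to inspect.
    public static List<string> Describe(IEnumerable<string> src, IEqualityComparer<string> cmp = null) =>
        src.GroupByRuns(s => s, s => s, cmp)
           .Select(g => g.Key + ": " + string.Join(",", g))
           .ToList();

    public static void Main()
    {
        // Illustrative input: the question's list with one "a" upper-cased.
        var src = new[] { "a", "A", "b", "a", "a", "a", "c", "c" };

        // Default comparer: "a" and "A" land in different runs.
        Describe(src).ForEach(Console.WriteLine);

        // Case-insensitive comparer: "a" and "A" form a single run.
        Describe(src, StringComparer.OrdinalIgnoreCase).ForEach(Console.WriteLine);
    }
}
```

The Key of each group is just the zero-based run number, so it is mostly useful for ordering. If you do not need it, Runs() gives you the bare sequences.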


Tags: c# linq c#-7.2