I'm trying to select a subgroup of a list where items have contiguous dates, e.g.
ID StaffID Title ActivityDate
-- ------- ----------------- ------------
1 41 Meeting with John 03/06/2010
2 41 Meeting with John 08/06/2010
3 41 Meeting Continues 09/06/2010
4 41 Meeting Continues 10/06/2010
5 41 Meeting with Kay 14/06/2010
6 41 Meeting Continues 15/06/2010
I'm using a pivot point each time, so take the example pivot item as 3, I'd like to get the following resulting contiguous events around the pivot:
ID StaffID Title ActivityDate
-- ------- ----------------- ------------
2 41 Meeting with John 08/06/2010
3 41 Meeting Continues 09/06/2010
4 41 Meeting Continues 10/06/2010
My current implementation is a laborious "walk" into the past, then into the future, to build the list:
var activity = // item number 3: Meeting Continues (09/06/2010)
var orderedEvents = activities.OrderBy(a => a.ActivityDate).ToArray();
// Walk into the past until a gap is found
var preceedingEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID);
DateTime dayBefore;
var previousEvent = activity;
while (previousEvent != null)
{
dayBefore = previousEvent.ActivityDate.AddDays(-1).Date;
previousEvent = preceedingEvents.TakeWhile(a => a.ID != previousEvent.ID).LastOrDefault();
if (previousEvent != null)
{
if (previousEvent.ActivityDate.Date == dayBefore)
relatedActivities.Insert(0, previousEvent);
else
previousEvent = null;
}
}
// Walk into the future until a gap is found
var followingEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
DateTime dayAfter;
var nextEvent = activity;
while (nextEvent != null)
{
dayAfter = nextEvent.ActivityDate.AddDays(1).Date;
nextEvent = followingEvents.SkipWhile(a => a.ID != nextEvent.ID).Skip(1).FirstOrDefault();
if (nextEvent != null)
{
if (nextEvent.ActivityDate.Date == dayAfter)
relatedActivities.Add(nextEvent);
else
nextEvent = null;
}
}
The list relatedActivities
should then contain the contiguous events, in order.
Is there a better way (maybe using LINQ) for this?
I had an idea of using .Aggregate()
but couldn't think how to get the aggregate to break out when it finds a gap in the sequence.
In this case I think that a standard foreach
loop is probably more readable than a LINQ query:
var relatedActivities = new List<TActivity>();
bool found = false;
foreach (var item in activities.OrderBy(a => a.ActivityDate))
{
int count = relatedActivities.Count;
if ((count > 0) && (relatedActivities[count - 1].ActivityDate.Date.AddDays(1) != item.ActivityDate.Date))
{
if (found)
break;
relatedActivities.Clear();
}
relatedActivities.Add(item);
if (item.ID == activity.ID)
found = true;
}
if (!found)
relatedActivities.Clear();
For what it's worth, here's a roughly equivalent -- and far less readable -- LINQ query:
var relatedActivities = activities
.OrderBy(x => x.ActivityDate)
.Aggregate
(
new { List = new List<TActivity>(), Found = false, ShortCircuit = false },
(a, x) =>
{
if (a.ShortCircuit)
return a;
int count = a.List.Count;
if ((count > 0) && (a.List[count - 1].ActivityDate.Date.AddDays(1) != x.ActivityDate.Date))
{
if (a.Found)
return new { a.List, a.Found, ShortCircuit = true };
a.List.Clear();
}
a.List.Add(x);
return new { a.List, Found = a.Found || (x.ID == activity.ID), a.ShortCircuit };
},
a => a.Found ? a.List : new List<TActivity>()
);
Here's an implementation:
public static IEnumerable<IGrouping<int, T>> GroupByContiguous(
this IEnumerable<T> source,
Func<T, int> keySelector
)
{
int keyGroup = Int32.MinValue;
int currentGroupValue = Int32.MinValue;
return source
.Select(t => new {obj = t, key = keySelector(t))
.OrderBy(x => x.key)
.GroupBy(x => {
if (currentGroupValue + 1 < x.key)
{
keyGroup = x.key;
}
currentGroupValue = x.key;
return keyGroup;
}, x => x.obj);
}
You can either convert the dates to ints by means of subtraction, or imagine a DateTime version (easily).
Somehow, I don't think LINQ was truly meant to be used for bidirectional-one-dimensional-depth-first-searches, but I constructed a working LINQ using Aggregate. For this example I'm going to use a List instead of an array. Also, I'm going to use Activity
to refer to whatever class you are storing the data in. Replace it with whatever is appropriate for your code.
Before we even start, we need a small function to handle something. List.Add(T)
returns null, but we want to be able to accumulate in a list and return the new list for this aggregate function. So all you need is a simple function like the following.
private List<T> ListWithAdd<T>(List<T> src, T obj)
{
src.Add(obj);
return src;
}
First, we get the sorted list of all activities, and then initialize the list of related activities. This initial list will contain the target activity only, to start.
List<Activity> orderedEvents = activities.OrderBy(a => a.ActivityDate).ToList();
List<Activity> relatedActivities = new List<Activity>();
relatedActivities.Add(activity);
We have to break this into two lists, the past and the future just like you currently do it.
We'll start with the past, the construction should look mostly familiar. Then we'll aggregate all of it into relatedActivities. This uses the ListWithAdd
function we wrote earlier. You could condense it into one line and skip declaring previousEvents as its own variable, but I kept it separate for this example.
var previousEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID).Reverse();
relatedActivities = previousEvents.Aggregate<Activity, List<Activity>>(relatedActivities, (items, prevItem) => items.OrderBy(a => a.ActivityDate).First().ActivityDate.Subtract(prevItem.ActivityDate).Days.Equals(1) ? ListWithAdd(items, prevItem) : items).ToList();
Next, we'll build the following events in a similar fashion, and likewise aggregate it.
var nextEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
relatedActivities = nextEvents.Aggregate<Activity, List<Activity>>(relatedActivities, (items, nextItem) => nextItem.ActivityDate.Subtract(items.OrderBy(a => a.ActivityDate).Last().ActivityDate).Days.Equals(1) ? ListWithAdd(items, nextItem) : items).ToList();
You can properly sort the result afterwards, as now relatedActivities should contain all activities with no gaps. It won't immediately break when it hits the first gap, no, but I don't think you can literally break out of a LINQ. So it instead just ignores anything which it finds past a gap.
Note that this example code only operates on the actual difference in time. Your example output seems to imply that you need some other comparison factors, but this should be enough to get you started. Just add the necessary logic to the date subtraction comparison in both entries.