I´m writing a C# (.NET 4.5) application that is used to aggregate time based events for reporting purposes. To make my query logic reusable for both realtime and historical data I make use of the Reactive Extensions (2.0) and their IScheduler
infrastructure (HistoricalScheduler
and friends).
For example, assume we create a list of events (sorted chronologically, but they may coincide!) whose only payload ist their timestamp and want to know their distribution across buffers of a fixed duration:
const int num = 100000;
const int dist = 10;
var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);
for (int i = 0; i < num; i++)
{
events.Add(curr);
curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}
var stream = Observable.Generate<int, DateTimeOffset>(
0,
s => s < events.Count,
s => s + 1,
s => events[s],
s => events[s],
time);
stream.Buffer(TimeSpan.FromMilliseconds(num), time)
.Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));
time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
Running this code results in a System.StackOverflowException
with the following stack trace (it´s the last 3 lines all the way down):
mscorlib.dll!System.Threading.Interlocked.Exchange<System.IDisposable>(ref System.IDisposable location1, System.IDisposable value) + 0x3d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x37 bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
System.Reactive.Core.dll!System.Reactive.Disposables.AnonymousDisposable.Dispose() + 0x4d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x4f bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
Ok, the problem seems to come from my use of Observable.Generate()
, depending on the list size (num
) and regardless of the choice of scheduler.
What am I doing wrong? Or more generally, what would be the preferred way to create an IObservable
from an IEnumerable
of events that provide their own timestamps?
OK, I´ve taken a different factory method that doesn´t require lamdba expressions as state transitions and now I don´t see any stack overflows anymore. I´m not yet sure if this would qualify as a correct answer to my question, but it works and I thought I´d share it here:
Manually scheduling the events before (!) returning the subscription seems awkward to me, but in this case it can be done inside the lambda expression.
If there is anything wrong about this approach, please correct me. Also, I´d still be happy to hear what implicit assumptions by
System.Reactive
I have violated with my original code.(Oh my, I should have checked that earlier: with RX v1.0, the original
Observable.Generate()
does in fact seem to work!)(update - realized I didn't provide an alternative: see at bottom of answer)
The problem is in how
Observable.Generate
works - it's used to unfold a corecursive (think recursion turned inside out) generator based on the arguments; if those arguments end up generating a very nested corecursive generator, you'll blow your stack.From this point on,
I'm speculating a lot (don't have the Rx source in front of me)(see below), but I'm willing to bet your definition ends up expanding into something like:And on and on until your call stack gets big enough to overflow. At, say, a method signature + your int counter, that'd be something like 8-16 bytes per recursive call (more depending on how the state machine generator is implemented), so 60,000 sounds about right (1M / 16 ~ 62500 maximum depth)
EDIT: Pulled up the source - confirmed: the "Run" method of Generate looks like this - take note of the nested calls to
Generate
:EDIT: Derp, didn't offer any alternatives...here's one that might work:
(EDIT: fixed
Enumerable.Range
, so stream size won´t be multiplied bychunkSize
)