Why does Observable.Generate() throw System.StackO

2019-04-06 03:45发布

I´m writing a C# (.NET 4.5) application that is used to aggregate time based events for reporting purposes. To make my query logic reusable for both realtime and historical data I make use of the Reactive Extensions (2.0) and their IScheduler infrastructure (HistoricalScheduler and friends).

For example, assume we create a list of events (sorted chronologically, but they may coincide!) whose only payload ist their timestamp and want to know their distribution across buffers of a fixed duration:

const int num = 100000;
const int dist = 10;

var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();

var time = new HistoricalScheduler(curr);

for (int i = 0; i < num; i++)
{
    events.Add(curr);
    curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}

var stream = Observable.Generate<int, DateTimeOffset>(
    0,
    s => s < events.Count,
    s => s + 1,
    s => events[s],
    s => events[s],
    time);

stream.Buffer(TimeSpan.FromMilliseconds(num), time)
    .Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));

time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));

Running this code results in a System.StackOverflowException with the following stack trace (it´s the last 3 lines all the way down):

mscorlib.dll!System.Threading.Interlocked.Exchange<System.IDisposable>(ref System.IDisposable location1, System.IDisposable value) + 0x3d bytes    
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x37 bytes    
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes    
...
System.Reactive.Core.dll!System.Reactive.Disposables.AnonymousDisposable.Dispose() + 0x4d bytes    
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x4f bytes    
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes    
...

Ok, the problem seems to come from my use of Observable.Generate(), depending on the list size (num) and regardless of the choice of scheduler.

What am I doing wrong? Or more generally, what would be the preferred way to create an IObservable from an IEnumerable of events that provide their own timestamps?

2条回答
三岁会撩人
2楼-- · 2019-04-06 04:07

OK, I´ve taken a different factory method that doesn´t require lamdba expressions as state transitions and now I don´t see any stack overflows anymore. I´m not yet sure if this would qualify as a correct answer to my question, but it works and I thought I´d share it here:

var stream = Observable.Create<DateTimeOffset>(o =>
    {
        foreach (var e in events)
        {
            time.Schedule(e, () => o.OnNext(e));
        }

        time.Schedule(events[events.Count - 1], () => o.OnCompleted());

        return Disposable.Empty;
    });

Manually scheduling the events before (!) returning the subscription seems awkward to me, but in this case it can be done inside the lambda expression.

If there is anything wrong about this approach, please correct me. Also, I´d still be happy to hear what implicit assumptions by System.Reactive I have violated with my original code.

(Oh my, I should have checked that earlier: with RX v1.0, the original Observable.Generate() does in fact seem to work!)

查看更多
在下西门庆
3楼-- · 2019-04-06 04:30

(update - realized I didn't provide an alternative: see at bottom of answer)

The problem is in how Observable.Generate works - it's used to unfold a corecursive (think recursion turned inside out) generator based on the arguments; if those arguments end up generating a very nested corecursive generator, you'll blow your stack.

From this point on, I'm speculating a lot (don't have the Rx source in front of me) (see below), but I'm willing to bet your definition ends up expanding into something like:

initial_state =>
generate_next(initial_state) => 
generate_next(generate_next(initial_state)) => 
generate_next(generate_next(generate_next(initial_state))) =>
generate_next(generate_next(generate_next(generate_next(initial_state)))) => ...

And on and on until your call stack gets big enough to overflow. At, say, a method signature + your int counter, that'd be something like 8-16 bytes per recursive call (more depending on how the state machine generator is implemented), so 60,000 sounds about right (1M / 16 ~ 62500 maximum depth)

EDIT: Pulled up the source - confirmed: the "Run" method of Generate looks like this - take note of the nested calls to Generate:

protected override IDisposable Run(
    IObserver<TResult> observer, 
    IDisposable cancel, 
    Action<IDisposable> setSink)
{
    if (this._timeSelectorA != null)
    {
        Generate<TState, TResult>.α α = 
                new Generate<TState, TResult>.α(
                     (Generate<TState, TResult>) this, 
                     observer, 
                     cancel);
        setSink(α);
        return α.Run();
    }
    if (this._timeSelectorR != null)
    {
        Generate<TState, TResult>.δ δ = 
               new Generate<TState, TResult>.δ(
                   (Generate<TState, TResult>) this, 
                   observer, 
                   cancel);
        setSink(δ);
        return δ.Run();
    }
    Generate<TState, TResult>._ _ = 
             new Generate<TState, TResult>._(
                  (Generate<TState, TResult>) this, 
                  observer, 
                  cancel);
    setSink(_);
    return _.Run();
}

EDIT: Derp, didn't offer any alternatives...here's one that might work:

(EDIT: fixed Enumerable.Range, so stream size won´t be multiplied by chunkSize)

const int num = 160000;
const int dist = 10;

var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);

for (int i = 0; i < num; i++)
{
    events.Add(curr);
    curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}

    // Size too big? Fine, we'll chunk it up!
const int chunkSize = 10000;
var numberOfChunks = events.Count / chunkSize;

    // Generate a whole mess of streams based on start/end indices
var streams = 
    from chunkIndex in Enumerable.Range(0, (int)Math.Ceiling((double)events.Count / chunkSize) - 1)
    let startIdx = chunkIndex * chunkSize
    let endIdx = Math.Min(events.Count, startIdx + chunkSize)
    select Observable.Generate<int, DateTimeOffset>(
        startIdx,
        s => s < endIdx,
        s => s + 1,
        s => events[s],
        s => events[s],
        time);

    // E pluribus streamum
var stream = Observable.Concat(streams);

stream.Buffer(TimeSpan.FromMilliseconds(num), time)
    .Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));

time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
查看更多
登录 后发表回答