Observable to batch like Lmax Disruptor

2019-07-20 17:04发布

Those who are familiar with lmax ring buffer (disruptor) know that one of the biggest advanatages of that data structure is that it batches incomming events and when we have a consumer that can take advantage of batching that makes the system automatically adjustable to the load, the more events you throw at it the better.

I wonder couldnt we achieve the same effect with an Observable (targeting the batching feature). I've tried out Observable.buffer but this is very different, buffer will wait and not emit the batch while the expected number of events didnt arrive. what we want is quite different.

given the subsriber is waiting for a batch from Observable<Collection<Event>>, when a single item arrives at stream it emits a single element batch which gets processed by subscriber, while it is processing other elements are arriving and getting collected into next batch, as soon as subscriber finishes with the execution it gets the next batch with as many events as had arrived since it started last processing...

So as a result if our subscriber is fast enough to process one event at a time it will do so, if load gets higher it will still have the same frequency of processing but more events each time (thus solving backpressure problem)... unlike buffer which will stick and wait for batch to fill up.

Any suggestions? or shall i go with ring buffer?

1条回答
Animai°情兽
2楼-- · 2019-07-20 17:37

RxJava and Disruptor represent two different programming approaches.

I'm not experienced with Disruptor but based on video talks, it is basically a large buffer where producer emit data like a firehose and consumers spin/yield/block until data is available.

RxJava, on the other hand, aims at non-blocking event delivery. We too have ringbuffers, notably in observeOn which acts as the async-boundary between producers and consumers, but these are much smaller and we avoid buffer overflows and buffer bloat by applying the co-routines approach. Co-routines boil down to callbacks sent to your callbacks so yo can callback our callbacks to send you some data at your pace. The frequency of such requests determines the pacing.

There are data sources that don't support such co-op streaming and require one of the onBackpressureXXX operators that will buffer/drop values if the downstream doesn't request fast enough.

If you think you can process data in batches more efficiently than one-by-one, you can use the buffer operator which has overloads to specify time duration for the buffers: you can have, for example, 10 ms worth of data, independent of how many values arrive in this duration.

Controlling the batch-size via request frequency is tricky and may have unforseen consequences. The problem, generally, is that if you request(n) from a batching source, you indicate you can process n elements but the source now has to create n buffers of size 1 (because the type is Observable<List<T>>). In contrast, if no request is called, the operator buffers the data resulting in longer buffers. These behaviors introduce extra overhead in the processing if you really could keep up and also has to turn the cold source into a firehose (because otherwise what you have is essentially buffer(1)) which itself can now lead to buffer bloat.

查看更多
登录 后发表回答