When would one choose to use Rx over TPL or are the 2 frameworks orthogonal?
From what I understand, Rx is primarily intended to provide an abstraction over events and allow composition, but it also provides an abstraction over async operations, using the Createxx and Fromxxx overloads, with cancellation via disposing the returned IDisposable.
TPL also provides an abstraction for operations via Task and cancellation abilities.
My dilemma is when to use which and for what scenarios?
The main purpose of Rx is not to provide an abstraction over events. This is just one of its outcomes. Its primary purpose is to provide a composable push model for collections.
The reactive framework (Rx) is based on IObservable<T> being the mathematical dual of IEnumerable<T>. So rather than "pull" items from a collection using IEnumerable<T>, we can have objects "pushed" to us via IObservable<T>.
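As a rough illustration of the pull/push duality described above, here is a minimal sketch. It assumes the System.Reactive NuGet package is referenced; the class and method names (PullVersusPush, Pull, Push) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq; // assumption: System.Reactive NuGet package

public static class PullVersusPush
{
    // Pull: the consumer drives the loop and asks the sequence for each item.
    public static List<int> Pull()
    {
        var results = new List<int>();
        IEnumerable<int> source = Enumerable.Range(1, 3);
        foreach (int item in source)
        {
            results.Add(item * 10);
        }
        return results;
    }

    // Push: the source invokes the subscriber's callback for each item.
    public static List<int> Push()
    {
        var results = new List<int>();
        IObservable<int> source = Observable.Range(1, 3);
        // The same LINQ-style Select works on both interfaces.
        using (source.Select(item => item * 10).Subscribe(results.Add))
        {
            // Observable.Range is a finite, cold source, so by the time
            // Subscribe returns here the items have already been pushed.
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Pull()));
        Console.WriteLine(string.Join(", ", Push()));
    }
}
```

Both methods apply the same query; the only difference is who is in control of when the next item is produced.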
Of course, when we actually go looking for observable sources things like events & async operations are excellent candidates.
The reactive framework naturally requires a multi-threaded model to be able to watch the sources of observable data and to manage queries and subscriptions. Rx actually makes heavy use of the TPL to do this.
So if you use Rx you are implicitly using the TPL.
You would use the TPL directly if you wish direct control over your tasks.
But if you have sources of data that you wish to observe and perform queries against then I thoroughly recommend the reactive framework.
Some guidelines I like to follow:
- Am I dealing with data that I don't originate, data which arrives when it pleases? Then Rx.
- Am I originating computations and need to manage concurrency? Then TPL.
- Am I managing multiple results, and need to choose from them based on time? Then Rx.
I like Scott W's bullet points. To add some more concrete examples:
Rx maps really nicely to:
- consuming streams
- performing non-blocking async work like web requests.
- streaming events (either .NET events like mouse movement or Service Bus message type events)
- Composing "streams" of events together
- LINQ-style operations
- Exposing data streams from your public API
TPL seems to map nicely to:
- internal parallelisation of work
- performing non-blocking async work like web requests
- performing work flows and continuations
One thing I have noticed with IObservable (Rx) is that it becomes pervasive. Once it is in your code base, it will no doubt be exposed via other interfaces, so it will eventually appear all over your application. I imagine this may be scary at first, but most of the team is pretty comfortable with Rx now and loves the amount of work it saves us.
IMHO Rx will be the dominant library over the TPL as it already is supported in .NET 3.5, 4.0, Silverlight 3, Silverlight 4 and JavaScript. This means you effectively have to learn one style and it is applicable to many platforms.
EDIT: I have changed my mind about Rx being dominant over TPL. They solve different problems so should not really be compared like that. With .NET 4.5/C# 5.0 the async/await keywords will further tie us to the TPL (which is good). For a deep discussion on Rx vs events vs TPL etc., check out the first chapter of my online book IntroToRx.com
Update, December 2016: If you have 30 minutes, I recommend you read Joe Duffy's first-hand account instead of my speculation. I think my analysis holds up well, but if you've found this question I highly recommend you see the blog post instead of these answers because in addition to TPL vs Rx.NET he also covers MS research projects (Midori, Cosmos).
http://joeduffyblog.com/2016/11/30/15-years-of-concurrency/
I think MS made a big mistake over-correcting after .NET 2.0 came out. They introduced many different concurrency management APIs all at the same time from different parts of the company.
- Stephen Toub was pushing hard for thread-safe primitives to replace Event (which started as Future<T> and turned into Task<T>)
- MS Research had MIN-LINQ and Reactive Extensions (Rx)
- Hardware/Embedded had the robotics runtime (CCR)
In the meantime many managed API teams were trying to live with APM and ThreadPool.QueueUserWorkItem(), not knowing if Toub would win his fight to ship Future<T>/Task<T> in mscorlib.dll. In the end it looks like they hedged, and shipped both Task<T> and IObservable<T> in mscorlib, but didn't allow any other Rx APIs (not even ISubject<T>) in mscorlib. I think this hedge ended up causing a huge amount of duplication (more later) and wasted effort inside and outside the company.
For duplication see: Task vs. IObservable<Unit>, Task<T> vs. AsyncSubject<T>, Task.Run() vs. Observable.Start(). And this is just the tip of the iceberg. But at a higher level consider:
- StreamInsight - SQL event streams, native-code-optimized, but event queries defined using LINQ syntax
- TPL Dataflow - built on TPL, built in parallel to Rx, optimized for tweaking threading parallelism, not good at composing queries
- Rx - Amazing expressiveness, but fraught with peril. Mixes 'hot' streams with IEnumerable-style extension methods, which means you can very easily block forever (calling First() on a hot stream never returns). Scheduling limits (limiting parallelism) are set via rather odd SubscribeOn() extension methods, which are weirdly implicit and hard to get right. If you are starting to learn Rx, reserve a long time to learn all the pitfalls to avoid. But Rx is really the only option if you are composing complex event streams or need complex filtering/querying.
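To make the Task.Run() vs. Observable.Start() duplication mentioned earlier a bit more concrete, here is a small sketch that runs the same computation through both APIs. It assumes the System.Reactive NuGet package; the names DuplicatedApis and RunBothAsync are invented for illustration.

```csharp
using System;
using System.Reactive.Linq;            // Observable.Start (assumption: System.Reactive package)
using System.Reactive.Threading.Tasks; // ToTask()
using System.Threading.Tasks;

public static class DuplicatedApis
{
    public static async Task<(int viaTask, int viaRx)> RunBothAsync()
    {
        // TPL: schedule the work and get a Task<int> back.
        Task<int> task = Task.Run(() => 6 * 7);

        // Rx: schedule the work and get a single-value IObservable<int> back.
        IObservable<int> observable = Observable.Start(() => 6 * 7);

        int fromTask = await task;
        // ToTask() bridges the observable back into the TPL world.
        int fromRx = await observable.ToTask();
        return (fromTask, fromRx);
    }

    public static async Task Main()
    {
        var (viaTask, viaRx) = await RunBothAsync();
        Console.WriteLine($"{viaTask} {viaRx}");
    }
}
```

Both calls express "run this function in the background and hand me the single result", which is the overlap the answer is pointing at.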
I don't think Rx has a fighting chance at wide adoption until MS ships ISubject<T> in mscorlib. Which is sad, because Rx contains some very useful concrete (generic) types, like TimeInterval<T> and Timestamped<T>, which I think should be in Core/mscorlib like Nullable<T>. Also, System.Reactive.EventPattern<TEventArgs>.
I would say that TPL Dataflow covers a specialized subset of the functionality in Rx. Dataflow is for data processing which can take a measurable amount of time, whereas Rx is for events, such as mouse position, error states, etc., where handling time is negligible.
Example: your "subscribe" handler is asynchronous and you want no more than 1 executor at a time. With Rx you have to block, there is no other way around it, because Rx is async-agnostic and does not treat async in a special way in many places.
.Subscribe(item => myAsyncHandler(item).Wait())
If you don't block, then Rx will consider the action complete while the handler is still being executed asynchronously.
You might think that if you do
.ObserveOn(new EventLoopScheduler())
then the problem is solved. But this will break your .Complete() workflow, because Rx will think that it is done as soon as it schedules execution, and you will quit your application without waiting for the async operation to complete.
If you want to allow no more than 4 concurrent async tasks, then Rx does not offer anything out of the box. Perhaps you can hack something by implementing your own scheduler, buffer, etc.
TPL Dataflow offers a very nice solution in ActionBlock. It can throttle simultaneous actions to a certain number and it does understand async operations, so calling Complete() and awaiting Completion will do exactly what you would expect: wait for all in-progress async tasks to complete.
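A minimal sketch of that ActionBlock setup follows, limited to 4 concurrent async handlers. The handler body (a short Task.Delay), the counters and the names ThrottledHandler and RunAsync are placeholders invented for this example, not part of any real pipeline.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottledHandler
{
    public static async Task<(int processed, int peak)> RunAsync(int itemCount)
    {
        var gate = new object();
        int running = 0, peak = 0, processed = 0;

        var block = new ActionBlock<int>(async item =>
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            await Task.Delay(10); // stand-in for real async work
            lock (gate) { running--; processed++; }
        },
        // At most 4 handler invocations are in flight at any time.
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        for (int i = 0; i < itemCount; i++)
        {
            block.Post(i);
        }

        block.Complete();       // no more input
        await block.Completion; // waits for in-progress async handlers too
        return (processed, peak);
    }

    public static async Task Main()
    {
        var (processed, peak) = await RunAsync(20);
        Console.WriteLine($"processed={processed}, peak concurrency={peak}");
    }
}
```

The peak counter is only there to observe the limit; the throttling itself comes entirely from MaxDegreeOfParallelism.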
Another feature TPL Dataflow has is "backpressure". Let's say you discovered an error in your handling routine and need to recalculate last month's data. If you subscribe to your source using Rx, and your pipeline contains unbounded buffers or ObserveOn, then you will run out of memory in a matter of seconds because the source will keep reading faster than processing can handle. Even if you implement a blocking consumer, your source may suffer from blocking calls, for example if the source is asynchronous. In TPL Dataflow you can implement the source as
while (...)
    await actionBlock.SendAsync(msg);
which does not block the source, yet will wait asynchronously while the handler is overloaded.
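Here is one way the SendAsync loop above could look as a complete program. It is a sketch under assumptions: the BoundedCapacity of 2, the delay in the handler and the names BackpressureDemo and RunAsync are all chosen just for illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BackpressureDemo
{
    public static async Task<int> RunAsync(int itemCount)
    {
        int handled = 0;

        var actionBlock = new ActionBlock<int>(async msg =>
        {
            await Task.Delay(5); // stand-in for a slow handler
            handled++;           // safe: default MaxDegreeOfParallelism is 1
        },
        // The block holds at most 2 messages; further senders have to wait.
        new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });

        for (int msg = 0; msg < itemCount; msg++)
        {
            // SendAsync completes only once the block has room, so the
            // producer slows down to the consumer's pace without blocking a thread.
            bool accepted = await actionBlock.SendAsync(msg);
            if (!accepted) break; // the block was completed or faulted
        }

        actionBlock.Complete();
        await actionBlock.Completion;
        return handled;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(10));
    }
}
```

With Post() instead of SendAsync(), a full bounded block would simply reject the message, which is why the asynchronous variant is the one used for backpressure.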
Overall, I found that Rx is a good fit for actions which are light in both time and computation. If processing time becomes substantial, you are in the world of strange side effects and esoteric debugging.
The good news is that TPL Dataflow blocks play very nicely with Rx. They have AsObserver/AsObservable adapters and you can stick them in the middle of an Rx pipeline when needed. But Rx has many more patterns and use cases. So my rule of thumb is to start with Rx and add TPL Dataflow as needed.
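As a sketch of the AsObserver/AsObservable adapters, the following places a TransformBlock in the middle of an Rx pipeline. It assumes the System.Reactive NuGet package alongside TPL Dataflow; the doubling transform and the names DataflowInRxPipeline and RunAsync are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;            // assumption: System.Reactive NuGet package
using System.Reactive.Threading.Tasks; // ToTask()
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowInRxPipeline
{
    public static async Task<IList<int>> RunAsync()
    {
        // A Dataflow block doing the (potentially slow) async work.
        var block = new TransformBlock<int, int>(async x =>
        {
            await Task.Delay(5); // stand-in for real async processing
            return x * 2;
        });

        // Downstream: expose the block's output as an observable again and
        // collect it. ToTask() subscribes immediately.
        Task<IList<int>> collected = block.AsObservable().ToList().ToTask();

        // Upstream: feed an Rx source into the block. OnCompleted from the
        // source completes the block, which in turn completes the output.
        using (Observable.Range(1, 5).Subscribe(block.AsObserver()))
        {
            return await collected;
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(string.Join(", ", await RunAsync()));
    }
}
```

The Rx operators on either side stay unchanged; only the stage that needs async-aware throttling or buffering is handed to Dataflow.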