I'm trying to wrap my head around transactions in event sourcing.
I have one aggregate (transaction scope) in my event store.
A command gets processed and produces 10 events. Now, can this be handled as one transaction, or is it 10 transactions? By "transaction" I mean a set of changes to the state that is only valid together as a whole. Have I designed my events wrong if they are split up into many events like this, even though I want them to be handled as a whole?
I tend to think that the command, the intent, is what defines the transaction, and that all events produced by that command should be handled together as a whole. Meaning that they should only be persisted as a whole, loaded as a whole, made visible to readers as a whole (atomically), and sent to listeners on my event bus as a whole.
Is this correct thinking?
How is this handled in for instance Kafka and Event Store?
And what about commands that produce many events: is that really good design? I want something to happen (command) and then something happened (event), not many things happened. I'd like to have this 1:1 relationship, but I read here and there that commands should be able to produce many events. Why?
Sorry for the rambling, I hope somebody gets what I'm trying to ask here.
A command gets processed and produces 10 events. Now, can this be handled as one transaction, or is it 10 transactions?
As a write, this is normally modeled as a single transaction; either the entire commit is added to the history, or nothing is.
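A minimal sketch of what "the entire commit or nothing" can look like, using a hypothetical in-memory store with an optimistic concurrency check (the class and method names are illustrative, not any real store's API):

```python
class ConcurrencyError(Exception):
    pass

class EventStore:
    def __init__(self):
        self._streams = {}  # stream_id -> list of committed events

    def append(self, stream_id, expected_version, events):
        stream = self._streams.setdefault(stream_id, [])
        # Optimistic concurrency check: if another writer got there first,
        # the whole commit fails -- a partial batch is never persisted.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}")
        # The check passed, so the entire batch lands together.
        stream.extend(events)

store = EventStore()
store.append("account-42", 0, ["Opened", "Deposited", "Withdrawn"])

# A stale writer still holding version 0 now fails cleanly,
# leaving the history untouched:
try:
    store.append("account-42", 0, ["Deposited"])
except ConcurrencyError:
    pass
```

The key point is that the unit of write is the batch produced by one command, not the individual event.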
that they should only be persisted as a whole, loaded as a whole, visible to readers as a whole (atomically) and also only sent to listeners on my event bus as a whole.
The read side of things can be a bit trickier. After all, events are just events; as a consumer, I may not even be interested in all of them, and there may be business value in consuming them as quickly as possible rather than waiting for everything to show up in order.
For consumers where ordering is significant, you'll be reading the stream rather than individual events. But it's still the case that you may have batching/paging concerns in the consumer that conflict with the goal of aligning all work on a commit boundary.
A thing to keep in mind is that, from the point of view of the readers, there are no invariants to maintain. The event stream is just a sequence of things that happened.
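To illustrate, here's a sketch of a consumer that maintains no invariants at all; it just reacts to the events it cares about, one at a time, ignoring commit boundaries entirely (the event names and handler are illustrative):

```python
# A read-side consumer: reacts to individual facts as they arrive.
# It has no invariants to protect, so commit boundaries don't matter.
notifications = []

def on_event(event):
    kind, account = event
    if kind == "AccountOverdrawn":
        notifications.append(f"alert ops about {account}")
    # every other event type is simply ignored by this consumer

for e in [("Deposited", "acct-1"),
          ("AccountOverdrawn", "acct-1"),
          ("Withdrawn", "acct-1")]:
    on_event(e)
# notifications == ["alert ops about acct-1"]
```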
The only really critical case is that of a writer, trying to load the aggregate state; in which case you need the entire stream, and the commit boundaries are basically irrelevant.
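The writer's rehydration step can be sketched as a fold over the full stream; note that nothing in it refers to commit boundaries, only to the complete ordered sequence (state shape and event names are illustrative):

```python
from functools import reduce

def apply(state, event):
    # Fold one event into the aggregate state.
    kind, amount = event
    if kind == "Deposited":
        return {**state, "balance": state["balance"] + amount}
    if kind == "Withdrawn":
        return {**state, "balance": state["balance"] - amount}
    return state  # unknown event types leave the state unchanged

# The writer replays the *entire* stream before handling a command.
stream = [("Deposited", 100), ("Withdrawn", 40), ("Deposited", 25)]
state = reduce(apply, stream, {"balance": 0})
# state["balance"] == 85
```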
How is this handled in for instance Kafka and Event Store?
In Greg Young's Event Store, writing to a stream means appending an ordered collection of events at the specified location in the stream. The whole block comes in, or not at all.
Reading from a stream includes paging support -- the client is allowed to ask for a run of events that fall between commit boundaries. The documentation offers no guarantees of what happens in that case. As it happens, the representation returned can support returning fewer events than are available, so it could be that the events are always returned on commit boundaries.
My understanding from reading the source code is that the persistence structure used to store the streams on disk does not attempt to preserve commit boundaries - but I could certainly be mistaken on this point.
I'd like to have this 1:1-relationship but I read here and there that commands should be able to produce many events, but why?
There are a couple of reasons.
First, aggregates are artificial; they are the consistency boundaries that we use to ensure the integrity of the data in the journal. But a given aggregate can compose many entities (for instance, at low contention levels there's nothing inherently wrong with putting your entire domain model into a single "aggregate"), and it is often useful to treat changes to different entities as distinct from one another. (The alternative is analogous to writing a SomethingChanged event every time and insisting that all clients consume the event to find out what happened.)
Second, re-establishing the domain invariant is frequently a separate action from the action specified in the command. Bob just withdrew more cash from his account than was available to him; we need to update his account ledger and escalate the problem to a human being. Oh, here's a new command describing a deposit Bob made earlier in the day; we need to update his account ledger and tell the human being to stand down.
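The withdrawal example above can be sketched as a decision function that yields one event per distinct business fact; the command and event names are illustrative, not from any particular framework:

```python
def decide_withdraw(balance, amount):
    # One command, potentially several events: each event names a
    # distinct fact the business cares about.
    events = [("CashWithdrawn", amount)]
    if amount > balance:
        # Re-establishing the invariant is a separate fact from the
        # withdrawal itself: record it and let a human follow up.
        events.append(("AccountOverdrawn", amount - balance))
    return events

decide_withdraw(100, 30)   # -> [("CashWithdrawn", 30)]
decide_withdraw(100, 150)  # -> [("CashWithdrawn", 150), ("AccountOverdrawn", 50)]
```

Notice that collapsing both facts into one event would force every consumer to re-derive which of them actually occurred.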
But broadly, because distinguishing the multiple consequences of a command better aligns with the natural language of the business.