TL;DR: "How can I push a message through a bunch of asynchronous, unordered microservices and know when that message has made it through each of them?"
I'm struggling to find the right messaging system/protocol for a specific microservices architecture. This isn't a "which is best" question, but a question about what my options are for a design pattern/protocol.
- I have a message on the beginning queue. Let's say a RabbitMQ message with serialized JSON
- I need that message to go through an arbitrary number of microservices
- Each of those microservices is long-running, must be independent, and may be implemented in a variety of languages
- The order of services the message goes through does not matter. In fact, it should not be synchronous.
- Each service can append data to the original message, but that data is ignored by the other services. There should be no merge conflicts (each service writes a unique key). No service will change or destroy data.
- Once all the services have had their turn, the message should be published to a second RabbitMQ queue with the original data and the new data.
- The microservices will have no other side-effects. If this were all in one monolithic application (and in the same language), functional programming would be perfect.
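The conflict-free append described above can be sketched like this, with a plain JSON object as the message and each service writing under its own unique key. The service names and fields here are illustrative, not from any particular implementation:

```python
import json

def service_annotate(message_json: str, service_name: str, result: dict) -> str:
    """Append this service's result under its own unique key.

    Because every service writes to a distinct key and never touches
    existing keys, appends are conflict-free regardless of order.
    """
    message = json.loads(message_json)
    assert service_name not in message, "each service must use a unique key"
    message[service_name] = result
    return json.dumps(message)

# Two hypothetical services annotating the same message, one after the other.
original = json.dumps({"id": "msg-1", "payload": "hello"})
after_a = service_annotate(original, "service_a", {"score": 0.9})
after_ab = service_annotate(after_a, "service_b", {"lang": "en"})
```

Since no service reads or rewrites another's key, applying the services in the opposite order produces the same final message, which is what makes the unordered fan-out safe.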
So, the question is: what is an appropriate way to manage that message through the various services? I don't want to process one service at a time, and the order isn't important. But if that's the case, how can the system know when all the services have had their whack, so the final message can be written onto the ending queue (for the next batch of services to have their go)?
The only semi-elegant solution I could come up with was:
- to have the first service that encounters a message write that message to common storage (say, MongoDB)
- Have each service do its thing, mark that it has completed for that message, and then check to see if all the services have had their turn
- If so, that last service would publish the message
But that still requires each service to be aware of all the other services and requires each service to leave its mark. Neither of those is desired.
I am open to a "Shepherd" service of some kind.
I would appreciate any options that I have missed, and am willing to concede that there may be a better, fundamental design.
Thank you.
There are two methods of managing a long-running process (or a process involving multiple microservices): orchestration and choreography. There are a lot of articles describing them.
Long story short: in orchestration you have a microservice that keeps track of the process status, while in choreography every microservice knows where to send the message next and/or when the process is done.
This article explains the benefits and tradeoffs of the two styles.
Orchestration
Orchestration Benefits
- Provides a good way for controlling the flow of the application when there is synchronous processing. For example, if Service A needs to complete successfully before Service B is invoked.
Orchestration Tradeoffs
- Couples the services together, creating dependencies. If Service A is down, Services B and C will never be called.
- If there is a central shared instance of the orchestrator for all requests, then the orchestrator is a single point of failure. If it goes down, all processing stops.
- Leverages synchronous processing that blocks requests. In this example, the total end-to-end processing time is the sum of the time it takes for Service A + Service B + Service C to be called.
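The blocking, sequential flow those tradeoffs describe can be sketched as a minimal orchestrator. The service functions here are illustrative stand-ins:

```python
def orchestrate(message: dict, services) -> dict:
    """Invoke each service in order; each must succeed before the next runs.

    End-to-end latency is the sum of the individual service latencies,
    and a failure in any one service halts the whole flow.
    """
    for service in services:
        message = service(message)  # blocks until this service completes
    return message

# Illustrative services that each append their own key to the message.
def service_a(msg): return {**msg, "a": "done"}
def service_b(msg): return {**msg, "b": "done"}
def service_c(msg): return {**msg, "c": "done"}

result = orchestrate({"id": 1}, [service_a, service_b, service_c])
```

Note that the orchestrator, not the services, holds the flow knowledge, which is exactly the coupling and single-point-of-failure concern listed above.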
Choreography
Choreography Benefits
- Enables faster end-to-end processing, as services can be executed in parallel/asynchronously.
- Easier to add/update services, as they can be plugged into/out of the event stream easily.
- Aligns well with an agile delivery model, as teams can focus on particular services instead of the entire application.
- Control is distributed, so there is no longer a single orchestrator serving as a central point of failure.
Several patterns can be used with a reactive architecture to provide additional benefits. For example, Event Sourcing is when the Event Stream stores all of the events and enables event replay. This way, if a service went down while events were still being produced, when it came back online it could replay those events to catch back up. Also, Command Query Responsibility Segregation (CQRS) can be applied to separate out the read and write activities. This enables each of these to be scaled independently. This comes in handy if you have an application that is read-heavy and light on writes or vice versa.
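A minimal in-memory sketch of the event-replay idea, with a plain list standing in for a real event stream (the offsets and event shapes are illustrative):

```python
class EventStream:
    """Append-only log that lets a consumer resume from its last offset."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def replay(self, from_offset=0):
        """Return every event at or after the given offset."""
        return self.events[from_offset:]

stream = EventStream()
stream.append({"type": "created", "id": 1})
stream.append({"type": "annotated", "id": 1, "service": "a"})

# A service that went down after seeing offset 0 catches up by
# replaying everything from offset 1 onward.
missed = stream.replay(from_offset=1)
```

Because the stream stores every event rather than just the latest state, a recovering service only needs to remember its own last offset to catch back up.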
Choreography Tradeoffs
- Async programming is often a significant mindshift for developers. I tend to think of it as similar to recursion: you can’t figure out how code will execute just by looking at it; you have to think through all of the possibilities that could be true at a particular point in time.
- Complexity is shifted. Instead of the flow control being centralized in the orchestrator, it is now broken up and distributed across the individual services. Each service has its own flow logic, and this logic identifies when and how it should react based on specific data in the event stream.
I would go along with the common storage idea.
Have each microservice register itself with the common storage, and have it record against the message identifier when it has processed that message.
You can then work out which n services should process a given message and how many of the n services have processed it.
No services need to be aware of each other.
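That registry could be sketched like this, with a plain dict standing in for the common storage (MongoDB in the question). The service names are illustrative, and in a real deployment the mark-and-check step would need to be an atomic update so two services don't race on the final count:

```python
class CompletionTracker:
    """Tracks which registered services have processed each message.

    In-memory dicts stand in for the real shared store; only the
    tracker knows the full service list, so no service needs to be
    aware of any other.
    """

    def __init__(self):
        self.services = set()   # every service that registered itself
        self.done = {}          # message_id -> set of finished services

    def register(self, service_name):
        self.services.add(service_name)

    def mark_processed(self, message_id, service_name):
        """Record completion; return True if this was the last service,
        i.e. the caller should publish the message to the ending queue."""
        finished = self.done.setdefault(message_id, set())
        finished.add(service_name)
        return finished == self.services

tracker = CompletionTracker()
for name in ("enrich", "score", "translate"):   # illustrative services
    tracker.register(name)

# Services finish in an arbitrary order; only the last one gets True
# and would publish the final message.
publish_now = [tracker.mark_processed("msg-1", n)
               for n in ("score", "translate", "enrich")]
```

Whichever service happens to finish last sees `True` and publishes, so there is no fixed "last service" and no service-to-service knowledge, just each service's own call to the tracker.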