Asynchronous request/reply

In a microservices architecture, you need to decide how the microservices will communicate with each other.

When I say "communicate", I mean Microservice A sending a command to Microservice B, and Microservice B then sending the response back to Microservice A.

This can be done synchronously (with REST API calls) or asynchronously (using a message bus).

While the first option is fairly straightforward, the second one needs some clarification.

This second option is called the "Asynchronous request/reply" architecture pattern.

Here is a drawing describing this pattern:

Explanation of the drawing:

1. Microservice A (on the left) publishes a command message on the message bus. The header of this command message contains:
- a correlation id
- a reply topic
- a reply partition
2. Microservice B (on the right) consumes the command message and publishes the response message to the topic and partition specified in the command message's header.
3. Microservice B copies the correlation id into the response message's header, so that Microservice A can match the response to its request.
4. Microservice A consumes messages from the topic and partition it specified in the command message's header (see step 1).
5. When Microservice A consumes the message carrying the expected correlation id, it has its response.
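The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the InMemoryBus class, the topic names "orders.commands" and "orders.replies", and the helper function names are all hypothetical, and the in-memory queues merely stand in for a real message broker.

```python
import uuid
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    headers: dict
    payload: str


class InMemoryBus:
    """Toy stand-in for a message broker: one FIFO queue per (topic, partition)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def publish(self, topic, partition, message):
        self._queues[(topic, partition)].append(message)

    def poll(self, topic, partition):
        """Return the next message, or None if the queue is empty."""
        queue = self._queues[(topic, partition)]
        return queue.popleft() if queue else None


def send_command(bus, payload, reply_topic, reply_partition):
    """Microservice A publishes a command and returns its correlation id."""
    correlation_id = str(uuid.uuid4())
    headers = {
        "correlation_id": correlation_id,
        "reply_topic": reply_topic,
        "reply_partition": reply_partition,
    }
    # "orders.commands" is a hypothetical command topic name.
    bus.publish("orders.commands", 0, Message(headers, payload))
    return correlation_id


def handle_next_command(bus):
    """Microservice B consumes one command and replies where the header says to."""
    command = bus.poll("orders.commands", 0)
    if command is None:
        return
    # The correlation id is copied into the response header.
    reply_headers = {"correlation_id": command.headers["correlation_id"]}
    reply = Message(reply_headers, f"done: {command.payload}")
    bus.publish(command.headers["reply_topic"], command.headers["reply_partition"], reply)


def await_reply(bus, reply_topic, reply_partition, correlation_id):
    """Microservice A reads its reply partition until the matching reply shows up.

    Non-matching messages are simply dropped in this toy version.
    """
    while (message := bus.poll(reply_topic, reply_partition)) is not None:
        if message.headers["correlation_id"] == correlation_id:
            return message.payload
    return None


bus = InMemoryBus()
cid = send_command(bus, "create order 42", "orders.replies", 0)
handle_next_command(bus)
print(await_reply(bus, "orders.replies", 0, cid))  # prints "done: create order 42"
```

In a real deployment the three functions would live in different processes and the polling would be done by a broker client; the sketch only shows which header fields travel in which direction.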

The drawing also shows how a scaled Microservice A, running on several pods, can handle separate request/reply exchanges. Each pod is bound to a specific partition of the reply topic on the message broker and puts that partition in the header of the commands it sends, so the response comes back to the pod that issued the request.
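To make the partition-per-pod idea concrete, here is a minimal sketch with two replicas of Microservice A. The PodA class, the service_b_drain function, and the "commands" and "replies" topic names are hypothetical, and a dictionary of queues stands in for the broker.

```python
import uuid
from collections import defaultdict, deque

# (topic, partition) -> FIFO queue of (headers, payload) tuples; a toy broker.
queues = defaultdict(deque)


class PodA:
    """One replica of Microservice A, bound to a single reply partition."""

    def __init__(self, reply_partition):
        self.reply_partition = reply_partition
        self.pending = set()   # correlation ids this pod is waiting on
        self.responses = {}    # correlation id -> response payload

    def send(self, payload):
        correlation_id = str(uuid.uuid4())
        headers = {
            "correlation_id": correlation_id,
            "reply_topic": "replies",
            "reply_partition": self.reply_partition,
        }
        queues[("commands", 0)].append((headers, payload))
        self.pending.add(correlation_id)
        return correlation_id

    def consume_replies(self):
        """Read only this pod's partition and keep the replies it was waiting for."""
        queue = queues[("replies", self.reply_partition)]
        while queue:
            headers, payload = queue.popleft()
            cid = headers["correlation_id"]
            if cid in self.pending:
                self.pending.discard(cid)
                self.responses[cid] = payload


def service_b_drain():
    """Microservice B answers every queued command on the requested partition."""
    commands = queues[("commands", 0)]
    while commands:
        headers, payload = commands.popleft()
        reply_headers = {"correlation_id": headers["correlation_id"]}
        target = (headers["reply_topic"], headers["reply_partition"])
        queues[target].append((reply_headers, payload.upper()))


pod_0, pod_1 = PodA(reply_partition=0), PodA(reply_partition=1)
cid_0 = pod_0.send("from pod 0")
cid_1 = pod_1.send("from pod 1")
service_b_drain()
pod_0.consume_replies()
pod_1.consume_replies()
print(pod_0.responses[cid_0])  # prints "FROM POD 0"
print(pod_1.responses[cid_1])  # prints "FROM POD 1"
```

Because each pod reads only its own partition, neither pod ever sees the other's reply, which is the property the drawing relies on.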

Benefits:

- non-blocking call: the caller does not have to wait idle for the response
- client/service decoupling: commands are sent to a topic instead of a service's endpoint
- message buffering: commands and responses are consumed at each service's own pace; until then, they stay on the bus waiting to be consumed
- fault tolerance: since commands and responses remain on the bus until they are consumed, a service can be down for a while. Its commands and responses will be consumed as soon as the service is back
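The buffering and fault-tolerance benefits can be illustrated with a toy example. It assumes two plain in-memory queues in place of broker topics (a real broker would persist the messages, which a Python deque does not), and the function names are hypothetical.

```python
from collections import deque

command_topic = deque()  # toy stand-in for a durable command topic
reply_topic = deque()    # toy stand-in for a durable reply topic


def service_a_send(command_id):
    """Microservice A publishes without caring whether B is currently running."""
    command_topic.append(command_id)


def service_b_run():
    """Microservice B processes everything that accumulated while it was away."""
    while command_topic:
        reply_topic.append(f"reply to {command_topic.popleft()}")


# Microservice B is "down": nothing consumes, yet A keeps publishing.
for command_id in ("cmd-1", "cmd-2", "cmd-3"):
    service_a_send(command_id)
print(len(command_topic))  # prints 3

# Microservice B comes back and catches up, in order.
service_b_run()
print(list(reply_topic))  # prints ['reply to cmd-1', 'reply to cmd-2', 'reply to cmd-3']
```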

Drawbacks:

- topic proliferation: you need two topics per resource (one for requests and one for replies)
- code complexity: the code becomes much more complex than a synchronous endpoint request
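To give a feel for the extra code involved, here is a sketch of the bookkeeping a caller typically ends up with: a table of pending requests keyed by correlation id, plus a timeout for replies that never arrive. It uses Python's asyncio; the request and on_reply functions and the two fake "send" callbacks are hypothetical, and nothing here talks to a real broker.

```python
import asyncio
import uuid

pending = {}  # correlation id -> Future awaiting the reply


async def request(send, payload, timeout):
    """Send a command and wait for its reply, giving up after `timeout` seconds."""
    correlation_id = str(uuid.uuid4())
    future = asyncio.get_running_loop().create_future()
    pending[correlation_id] = future
    try:
        send(correlation_id, payload)
        return await asyncio.wait_for(future, timeout)
    finally:
        # Always forget the request, otherwise the table leaks on timeouts.
        pending.pop(correlation_id, None)


def on_reply(correlation_id, payload):
    """Called by the reply consumer; late or unknown replies are ignored."""
    future = pending.get(correlation_id)
    if future is not None and not future.done():
        future.set_result(payload)


async def main():
    loop = asyncio.get_running_loop()

    def responsive_send(correlation_id, payload):
        # Hypothetical service that replies shortly after receiving the command.
        loop.call_later(0.01, on_reply, correlation_id, f"ok: {payload}")

    def silent_send(correlation_id, payload):
        pass  # hypothetical service that never replies

    answered = await request(responsive_send, "ping", timeout=1.0)
    try:
        await request(silent_send, "ping", timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return answered, timed_out


result = asyncio.run(main())
print(result)  # prints ('ok: ping', True)
```

A synchronous REST call gets correlation and timeouts from the HTTP client for free; with a message bus, this matching logic is the application's responsibility.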

Note:

This pattern is suitable for applications that have to handle a very large number of concurrent requests.