I have a case where I have Information objects that contain Element objects. If I store an Information object, it will try to find preexisting Element objects based on a unique value field and otherwise insert them. Information objects and Element objects can't be deleted for now. Adding a parent needs two preexisting Element objects. I was planning to use three topics: CreateElement, CreateInformation and AddParentOfElement for the events Created Element Event, Created Information Event and Added Parent Event. I realized that, since there are no ordering guarantees between topics or between topic partitions, those events (as shown in the picture) could be consumed in a different order, so the schema couldn't be persisted to an RDBMS, for example. I assume that ids are used for partition assignment of the topics, as usual.
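To make that keying assumption explicit, here is a minimal sketch using the plain Kafka producer API (a Spring Cloud Stream binder would do the equivalent under the hood): the element id is the record key, so events for the same element stay ordered within one partition of one topic, but nothing orders them across the three topics.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ElementEventProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String elementId = "5";
            // Same key -> same partition within a topic, so per-element order holds
            // inside CreateElement, but there is no ordering relative to AddParentOfElement.
            producer.send(new ProducerRecord<>("CreateElement", elementId, "{\"id\":5}"));
            producer.send(new ProducerRecord<>("AddParentOfElement", elementId, "{\"id\":5,\"parentId\":3}"));
        }
    }
}
```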
Here is my diagram:
The scenario is:
- Element with (id=1) was created by the user
- Information with (id=1) containing Elements (1, 2, 3) was created by the user
- Element with (id=5) was created by the user
- Parent of Element with (id=5) was set to be Element with (id=3) by the user
- Information with (id=2) containing Elements (1, 3 and 5) was created by the user
I am curious whether my topic choices make sense, and I would appreciate any suggestions on how to design the events so that, when they are processed by the consumer database services, they are idempotent and don't put the system into a wrong state.
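For example, the kind of idempotent handling I have in mind on the consumer side is an upsert keyed on the unique value field, roughly like the sketch below (plain JDBC and Postgres ON CONFLICT are just for illustration, not something I have committed to):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CreatedElementEventHandler {

    // Re-delivering the same Created Element Event leaves the table unchanged,
    // because the insert is keyed on the unique value field.
    public void onCreatedElementEvent(long id, String uniqueValue) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO element (id, unique_value) VALUES (?, ?) "
                     + "ON CONFLICT (unique_value) DO NOTHING")) {
            stmt.setLong(1, id);
            stmt.setString(2, uniqueValue);
            stmt.executeUpdate();
        }
    }
}
```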
Thanks!
After considering the solution in "How to implement a microservice Event Driven architecture with Spring Cloud Stream Kafka and Database per service" and not being satisfied with the suggestions, I investigated Confluent's Bottled Water (https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/) and later the more active but similar Debezium (http://debezium.io/).
I decided to follow the Debezium way. Debezium is a Kafka Connect plugin that reads directly from the MySQL/Postgres binlog and publishes those changes (schema and data) to Kafka.
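Once the Kafka Connect worker is running (see the Docker setup below), you register a connector by POSTing a JSON config to the Connect REST API on port 8083. The config below is essentially the one from the Debezium MySQL tutorial, so treat the database names and credentials as placeholders for your own:

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```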
The example setup I am using involves Docker; here is how I set it up for Docker Toolbox (Windows) and Docker (Linux).
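A minimal single-node variant, along the lines of the official Debezium tutorial images (the image tags and passwords are the tutorial defaults, not anything specific to my project), looks roughly like this:

```bash
docker run -d --name zookeeper -p 2181:2181 debezium/zookeeper:0.5
docker run -d --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka:0.5
docker run -d --name mysql -p 3306:3306 \
  -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw \
  debezium/example-mysql:0.5
docker run -d --name connect -p 8083:8083 --link kafka:kafka --link mysql:mysql \
  -e BOOTSTRAP_SERVERS=kafka:9092 -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets \
  debezium/connect:0.5
```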
Debezium creates Kafka topics, one for each table. By navigating to the landoop/kafka-topics-ui server on port 8000 you can have a look at what the schema of the message payloads looks like. The important parts of the payload are before and after, which carry the old and the new values of the corresponding database row, and op, which is 'c' for create, 'u' for update, and so on.
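As an illustration, a create event for an element row looks roughly like the trimmed message below (the envelope shape is the standard Debezium one, the values are made up; the schema section and some source fields are omitted):

```json
{
  "payload": {
    "before": null,
    "after": {
      "id": 5,
      "unique_value": "element-5"
    },
    "source": {
      "name": "dbserver1",
      "db": "inventory",
      "table": "element"
    },
    "op": "c",
    "ts_ms": 1500000000000
  }
}
```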
Each consuming microservice uses the Spring Cloud Stream Kafka binder, pulled in with these Maven dependencies:
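Roughly the following; the exact versions are managed by the Spring Cloud BOM in use, so take this as a sketch rather than my exact pom.xml:

```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-stream-kafka</artifactId>
</dependency>
```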
Then, in each of my consuming Spring Cloud microservices, I have a listener that listens to all of the topics it is interested in at once and delegates each topic's events to a dedicated event handler:
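The listener is shaped roughly like the sketch below: a single input channel bound (via spring.cloud.stream.bindings.input.destination) to a comma-separated list of table topics, with dispatch on the received-topic header. The handler names and topic suffixes are illustrative:

```java
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;

@EnableBinding(Sink.class)
public class DebeziumTopicListener {

    // All interesting table topics are bound to the single "input" channel, e.g.
    // spring.cloud.stream.bindings.input.destination=dbserver1.mydb.element,dbserver1.mydb.information,dbserver1.mydb.element_parent
    @StreamListener(Sink.INPUT)
    public void handle(Message<String> message) {
        String topic = (String) message.getHeaders().get(KafkaHeaders.RECEIVED_TOPIC);
        String payload = message.getPayload();

        // Delegate each table topic to its dedicated event handler.
        if (topic.endsWith(".element")) {
            onElementChange(payload);
        } else if (topic.endsWith(".information")) {
            onInformationChange(payload);
        } else if (topic.endsWith(".element_parent")) {
            onParentChange(payload);
        }
    }

    private void onElementChange(String payload) { /* upsert element vertex */ }
    private void onInformationChange(String payload) { /* upsert information vertex and its edges */ }
    private void onParentChange(String payload) { /* upsert parent edge */ }
}
```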
In my case I wanted to update a graph based on the changes that happen on the RDBMS side. Of course the graph database will only be eventually consistent with the RDBMS. My concern was that, since the topics include changes to join tables as well as to the joined tables themselves, I wouldn't be able to create the corresponding edges and vertices without knowing that both vertices of an edge already exist. So I decided to ask on the Debezium Gitter (https://gitter.im/debezium/dev):
From that discussion, two ways exist: either create edges and vertices using placeholders for topics that haven't been consumed yet, or use Kafka Streams to stitch the topics back into their original structure, which seems more painful to me than the first way. So I decided to go with the first way :)
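With the placeholder approach each handler boils down to "get or create both endpoints, then connect them". A sketch of the idea is below; the GraphRepository interface and its methods are hypothetical stand-ins, not the API of any particular graph database:

```java
import java.util.Optional;

// Hypothetical abstraction over the graph database.
interface GraphRepository {
    Optional<Vertex> findElement(long id);
    Vertex saveElement(Vertex vertex);
    void saveParentEdge(Vertex child, Vertex parent);
}

class Vertex {
    final long id;
    boolean placeholder;

    Vertex(long id, boolean placeholder) {
        this.id = id;
        this.placeholder = placeholder;
    }
}

class ParentEventHandler {

    private final GraphRepository graph;

    ParentEventHandler(GraphRepository graph) {
        this.graph = graph;
    }

    // Handles an Added Parent Event; safe to run even if the Created Element
    // Events for the child or the parent haven't been consumed yet.
    void onParentAdded(long childId, long parentId) {
        Vertex child = getOrCreatePlaceholder(childId);
        Vertex parent = getOrCreatePlaceholder(parentId);
        graph.saveParentEdge(child, parent);
    }

    // When the real Created Element Event arrives later, its handler just
    // fills in this placeholder instead of inserting a duplicate vertex.
    private Vertex getOrCreatePlaceholder(long id) {
        return graph.findElement(id)
                .orElseGet(() -> graph.saveElement(new Vertex(id, true)));
    }
}
```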
Hopefully this answer/guide will help others jump-start event sourcing with a message broker like Kafka as its central piece.