The reality is not quite so rosy, though. While event-driven architecture has its perks, it also comes with a bunch of issues and complexities.
In our experience, one of the most serious challenges is data inconsistency. This can happen if you modify a database record but fail to dispatch an event to the event stream.
For example, here at Outfunnel we often encounter situations where a user modifies contact data in a CRM, we then successfully store the changes in the database but fail to run other vital business logic because we didn’t emit the required events.
This article takes a deep dive into the challenges of event-driven architecture and into the transactional outbox pattern, which gets us closer to the promised land by solving the problem of data inconsistency between the database and the event stream.
The Problem: Data Inconsistency
Imagine a scenario where the business logic is responsible for creating a user in the database and emitting a user_created event to the event stream.
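A minimal sketch of what that logic might look like, assuming a Postgres-style pg pool and a kafkajs producer (the client setup and names are illustrative, not taken from the original code):

```typescript
import { Pool } from 'pg';        // Postgres client (assumed for illustration)
import { Kafka } from 'kafkajs';  // Kafka client (assumed for illustration)

const db = new Pool();
const producer = new Kafka({ brokers: ['localhost:9092'] }).producer();
// (assume producer.connect() has been awaited during application startup)

type User = {
  id: number;
  email: string;
};

async function createUser(email: string): Promise<User> {
  // 1. Persist the user and read back the generated ID.
  const { rows } = await db.query<User>(
    'INSERT INTO users (email) VALUES ($1) RETURNING id, email',
    [email],
  );
  const user = rows[0];

  // 2. Emit the domain event to the event stream.
  // If this call fails, the user already exists in the database,
  // but downstream consumers will never hear about it.
  await producer.send({
    topic: 'user_created',
    messages: [{ value: JSON.stringify(user) }],
  });

  return user;
}
```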
At first glance, the code may seem to work. But the more traffic our application receives, the higher the chance that it runs at a moment when either the database or the event stream is unavailable.
What happens if the database is down? We can't create the user record, and the failed query almost certainly throws an exception, so we never reach the code that emits the event. Nothing is written and nothing is emitted, so at least the two systems stay consistent with each other.
What happens if the event stream is down? We successfully create the user record in the database, but we fail to dispatch the event. That creates data inconsistency: we modified the database, yet the corresponding domain event was never published.
In an event-driven architecture, where other services depend on these events to run their own business logic, this kind of problem becomes critical very quickly.
Tackling Data Inconsistency
Let’s take a look at a few ways to tackle the data inconsistency issue: event-first approaches, a transaction-based approach, and the transactional outbox pattern.
Event-First Approaches
The first option would be to emit the event before creating the user in the database. However, that does not work in our example: we need the user record, with its database-generated ID, to exist before we can emit the event.
Additionally, even if we didn’t need to create the record before emitting the event, we’d still run into issues. What happens if the database is down? In that case, we emit the event but the database operation fails, and since an already emitted event cannot be rolled back, we end up with the mirror image of the original problem: an event is published, but no matching record exists in the database.
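For contrast, here is what the event-first variant would look like with the same assumed clients; the two steps simply swap places, and the comments mark where it breaks down:

```typescript
async function createUserEventFirst(email: string): Promise<void> {
  // 1. Emit first -- but there is no database-generated ID yet,
  // so the event payload can't reference the user record.
  await producer.send({
    topic: 'user_created',
    messages: [{ value: JSON.stringify({ email }) }],
  });

  // 2. If this INSERT fails (database down, constraint violated),
  // the event is already out and cannot be rolled back.
  await db.query('INSERT INTO users (email) VALUES ($1)', [email]);
}
```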
A second option would be to follow a CQRS-like pattern. Instead of creating the user record and emitting an event, the business logic would only emit an event. If we want a record in the database, the same service has to listen to the event stream and create or modify records in response to the events it consumes.
Yet we’d run into similar difficulties: we still need the record ID before emitting the event. Furthermore, this solution breaks read-your-writes consistency, because every database update now goes through additional asynchronous processing. If that processing is delayed, we can’t guarantee that the database operation completes before the client tries to read back the outcome of its own request.
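A rough sketch of that flow, reusing the assumed pg and kafkajs clients from above (the group ID and topic name are illustrative):

```typescript
// Write side: the business logic only emits the event. The ID problem
// from above still applies: there is no database-generated ID to
// include in the payload.
async function requestUserCreation(email: string): Promise<void> {
  await producer.send({
    topic: 'user_created',
    messages: [{ value: JSON.stringify({ email }) }],
  });
}

// Read-model side: the same service consumes its own events and applies
// them to the database. Until this handler has run, the caller cannot
// read back the user it just "created" -- read-your-writes is gone.
async function startConsumer(): Promise<void> {
  const consumer = new Kafka({ brokers: ['localhost:9092'] }).consumer({
    groupId: 'users-service',
  });
  await consumer.connect();
  await consumer.subscribe({ topics: ['user_created'] });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const { email } = JSON.parse(message.value!.toString());
      await db.query('INSERT INTO users (email) VALUES ($1)', [email]);
    },
  });
}
```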
It’s worth noting that neither of these event-first alternatives works when the database operation would violate a database invariant, such as a unique constraint. In that case, we emit an event even though the database operation fails.
Transaction-Based Approach
The more we look into the issue, the clearer it becomes that we need a transaction mechanism, so that modifying the database and emitting the event either succeed or fail together.
One option would be distributed transactions. However, the most popular event streaming solutions (e.g. Kafka) do not support two-phase commit, so that option is off the table.
Instead, we could consider using the database’s own transaction mechanism to solve the problem.
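As a sketch, that means opening a transaction, performing the insert and the event emission inside it, and only then committing (again using the assumed pg pool and kafkajs producer):

```typescript
async function createUserTransactional(email: string): Promise<void> {
  const client = await db.connect();
  try {
    await client.query('BEGIN');

    const { rows } = await client.query(
      'INSERT INTO users (email) VALUES ($1) RETURNING id, email',
      [email],
    );

    // If the event stream is down, this throws and the catch block
    // rolls the INSERT back -- so far, so good.
    await producer.send({
      topic: 'user_created',
      messages: [{ value: JSON.stringify(rows[0]) }],
    });

    // However, the COMMIT itself can still fail *after* the event has
    // already been published, leaving an event without a database record.
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```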