@kertpjatkin ・ Nov 04, 2021 ・ 6 min read ・ Originally posted on medium.com
If you are struggling with microservices that communicate with each other synchronously, then event-driven architecture may seem like the promised land.
The reality is not quite so rosy, though. While event-driven architecture has its perks, it also comes with a bunch of issues and complexities.
In our experience, one of the most serious challenges is data inconsistency. This can happen if you modify a database record, but fail to dispatch an event to the event stream.
For example, here at Outfunnel we often encounter situations where a user modifies contact data in a CRM, we then successfully store the changes in the database but fail to run other vital business logic because we didn’t emit the required events.
This article takes a deep dive into the challenges of event-driven architecture and into the transactional outbox pattern, which gets us closer to the promised land by solving the problem of data inconsistency between the database and the event stream.
The Problem: Data Inconsistency
Imagine a scenario where the business logic is responsible for creating a user in the database and emitting a `user_created` event to the event stream:
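Roughly, the naive version might look like the following sketch. It assumes a Postgres client (pg) and a Kafka producer (kafkajs); the table name, topic name, and event shape are illustrative, not the actual production code:

```typescript
import { Pool } from "pg";
import { Kafka } from "kafkajs";

const pool = new Pool();
const producer = new Kafka({ clientId: "app", brokers: ["localhost:9092"] }).producer();

async function createUser(email: string): Promise<string> {
  // Step 1: create the user record in the database.
  const result = await pool.query(
    "INSERT INTO users (email) VALUES ($1) RETURNING id",
    [email]
  );
  const userId: string = result.rows[0].id;

  // Step 2: emit the domain event (assumes producer.connect() was
  // awaited at startup). If the event stream is down at this point,
  // the user already exists in the database but no event goes out.
  await producer.send({
    topic: "user_events",
    messages: [
      { key: userId, value: JSON.stringify({ type: "user_created", userId }) },
    ],
  });

  return userId;
}
```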
At first glance, the code may seem to work. In reality, though, the more traffic our application receives, the higher the chance that this code runs while either the database or the event stream is down.
What happens if the database is down? We’re not able to create the user in the database, and an exception is almost certainly thrown. As a result, we won’t emit an event to the event stream.
What happens if the event stream is down? We’re able to create the user record in the database, but we can’t dispatch the event. That creates data inconsistency: we modified the database, yet failed to dispatch the corresponding domain event.
In an event-driven architecture, this kind of problem becomes critical very quickly.
Tackling Data Inconsistency
Let’s take a look at a few ways to tackle the data inconsistency issue: event-first approaches, a transaction-based approach, and the transactional outbox pattern.
The first option would be to emit an event before creating the user in the database. However, that doesn’t work in our example case: we need the user record, with its ID, to exist before we can emit the event.
Additionally, even if we didn’t need to create a record in the database before emitting an event, we’d still run into issues. What happens if the database is down? In that case, we emit an event but the database operation fails. Since we can’t roll back an emitted event, this approach causes a problem similar to the one above.
A second option would be to follow a CQRS-like pattern. Instead of creating the user record and emitting an event, the business logic would only emit an event. If we want to create a record in the database, the same service has to listen to the event stream and create or modify records in reaction to events coming from the stream.
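As a rough sketch of this variant (again with kafkajs, with hypothetical topic and consumer group names), the write path only emits, while a consumer in the same service materializes the record:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "user-service", brokers: ["localhost:9092"] });
const producer = kafka.producer();
const consumer = kafka.consumer({ groupId: "user-service" });

// The write path only emits an event; it never touches the database.
async function createUser(email: string): Promise<void> {
  await producer.send({
    topic: "user_events",
    messages: [{ value: JSON.stringify({ type: "user_created", email }) }],
  });
}

// The same service consumes its own events and materializes the record.
async function startProjection(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topics: ["user_events"] });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString());
      if (event.type === "user_created") {
        // Insert into the users table here. This happens asynchronously,
        // which is exactly what breaks read-your-writes consistency.
      }
    },
  });
}
```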
Yet we’d run into similar difficulties: we need the record ID before emitting an event. Furthermore, this solution breaks read-your-writes consistency, since every database update goes through additional asynchronous processing. If that processing is delayed, we can’t guarantee that the database operation completes early enough for the client to see its outcome.
It’s worth noting that none of these event-first alternatives work when the database operation breaks a database invariant. In that case, we emit an event even though the database operation fails due to a violated constraint.
The more we look into the issue, the more it appears that we need to have a transaction mechanism in place. In that case, both modifying the database and emitting an event succeed or fail together.
One option would be to use distributed transactions. However, the most popular event streaming solutions (e.g. Kafka) don’t support two-phase commits, so that solution is off the table.
We could consider using the database transaction mechanism for solving the problem.
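Here is a minimal sketch of that idea, assuming pg and hypothetical `users` and `outbox` tables; the user insert and the outbox insert share one transaction:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function createUser(email: string): Promise<string> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");

    const result = await client.query(
      "INSERT INTO users (email) VALUES ($1) RETURNING id",
      [email]
    );
    const userId: string = result.rows[0].id;

    // The event goes into the outbox table in the same transaction,
    // so both inserts either commit together or roll back together.
    await client.query(
      "INSERT INTO outbox (event_type, payload) VALUES ($1, $2)",
      ["user_created", JSON.stringify({ type: "user_created", userId })]
    );

    await client.query("COMMIT");
    return userId;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```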
As both the data record and the event are stored in the same database, just in different tables, we can rely on the transaction mechanism the database already provides.
If there are any issues with creating the new user, the event is not stored in the outbox table. Conversely, if we can’t insert the event into the outbox, the user insert is rolled back as well. This ensures that neither write happens without the other, preventing data inconsistencies.
In order to read the events from the outbox and actually emit them, we need to run a separate worker that picks up any new or undispatched events and sends them to the event stream. Once an event is successfully dispatched, we can delete its entry from the outbox table.
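A minimal sketch of such a worker, again assuming pg and kafkajs; the polling interval, batch size, and table layout are arbitrary assumptions:

```typescript
import { Pool } from "pg";
import { Kafka } from "kafkajs";

const pool = new Pool();
const producer = new Kafka({ clientId: "outbox-relay", brokers: ["localhost:9092"] }).producer();

async function relayOutbox(): Promise<void> {
  // Pick up undispatched events in insertion order.
  const { rows } = await pool.query(
    "SELECT id, payload FROM outbox ORDER BY id LIMIT 100"
  );

  for (const row of rows) {
    // Dispatch first, delete after: if the worker crashes between the
    // two steps, the event is re-sent on restart (at-least-once delivery).
    await producer.send({
      topic: "user_events",
      messages: [{ key: String(row.id), value: row.payload }],
    });
    await pool.query("DELETE FROM outbox WHERE id = $1", [row.id]);
  }
}

async function main(): Promise<void> {
  await producer.connect();
  // Poll the outbox table at a fixed (assumed) interval.
  setInterval(() => relayOutbox().catch(console.error), 1000);
}

main().catch(console.error);
```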
Serialization
As we dig deeper into the world of event-driven architecture, we usually end up facing plenty of issues around event schema validation. That’s also something we need to consider when using the outbox pattern.
In general, there are two approaches: serialize the event into its final format either in the dedicated worker right before emitting it, or before creating the outbox record.
Let’s take a closer look at both options.
Imagine that the event payload needs to include the full name of the user and the subscription ID. We can retrieve the latter from the subscriptions table.
If we decide to serialize the event payload upon emitting the event, then the application code can stay as it is:
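A sketch of what that worker-side serialization could look like, assuming the same pg setup and hypothetical `users` and `subscriptions` tables:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Builds the final payload at dispatch time by joining in the
// subscription; this runs inside the relay worker.
async function serializeUserCreated(userId: string): Promise<string> {
  const { rows } = await pool.query(
    `SELECT u.full_name, s.id AS subscription_id
       FROM users u
       JOIN subscriptions s ON s.user_id = u.id
      WHERE u.id = $1`,
    [userId]
  );
  if (rows.length === 0) {
    // The subscription has vanished since the outbox record was written;
    // the business logic has already run, so all we can do is fail here.
    throw new Error(`cannot serialize user_created for user ${userId}`);
  }
  return JSON.stringify({
    type: "user_created",
    fullName: rows[0].full_name,
    subscriptionId: rows[0].subscription_id,
  });
}
```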
But what happens if, due to a bug, the subscription record is deleted from the database right before the event is emitted? Since the downstream consumers expect the subscription ID to be present, we either need to fix the issue manually, or we’re not able to dispatch the event at all. And because the events are emitted asynchronously, we only discover the issue after the business logic has already been executed.
However, if we decide to serialize the payload before inserting a record into the outbox table, then we need to modify the business logic:
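A sketch of the modified business logic, with the payload built and serialized inside the same transaction (pg assumed, table layouts and payload shape hypothetical; the flow assumes the subscription already exists when the user is created):

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function createUser(email: string, fullName: string): Promise<string> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");

    const user = await client.query(
      "INSERT INTO users (email, full_name) VALUES ($1, $2) RETURNING id",
      [email, fullName]
    );
    const userId: string = user.rows[0].id;

    // Fetch everything the payload needs while the transaction guarantees
    // the data exists; if anything is missing, the whole transaction rolls
    // back and we never persist a user without a dispatchable event.
    const sub = await client.query(
      "SELECT id FROM subscriptions WHERE user_id = $1",
      [userId]
    );
    if (sub.rows.length === 0) {
      throw new Error(`no subscription for user ${userId}`);
    }
    const payload = JSON.stringify({
      type: "user_created",
      fullName,
      subscriptionId: sub.rows[0].id,
    });

    await client.query(
      "INSERT INTO outbox (event_type, payload) VALUES ($1, $2)",
      ["user_created", payload]
    );

    await client.query("COMMIT");
    return userId;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```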
In such a case, the application logic becomes more complex. In addition, fetching the data and encoding it into a binary format inside the transaction incurs a performance penalty. However, this approach provides the strongest guarantee, as a correct event payload is just as important as data consistency between the database and the event stream.
Cons of the Pattern
As with any other solution, the transactional outbox pattern isn’t a silver bullet. Let’s take a look at some of its potential downsides.
The most apparent disadvantage of the pattern is that we need a dedicated worker to emit the events. Hence, we need proper monitoring in place to ensure that the worker doesn’t die or fall too far behind.
Additionally, there may be cases where the solution emits an event more than once: the pattern provides at-least-once semantics. For example, if the worker successfully dispatches an event but crashes right before deleting the record, then upon restart the same event gets processed a second time.
If it’s critical to process each event exactly once, we need to build deduplication logic into all the downstream consumers.
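One common way to do this is an idempotency check keyed on a unique event ID. A minimal sketch, assuming pg on the consumer side and a hypothetical `processed_events` table with a unique constraint on `event_id`:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Runs the handler only if this event ID hasn't been seen before.
async function handleOnce(eventId: string, handler: () => Promise<void>): Promise<void> {
  // ON CONFLICT DO NOTHING returns zero rows when the ID is already
  // recorded, so duplicate deliveries are silently skipped.
  const result = await pool.query(
    "INSERT INTO processed_events (event_id) VALUES ($1) " +
      "ON CONFLICT (event_id) DO NOTHING RETURNING event_id",
    [eventId]
  );
  if (result.rowCount === 0) {
    return; // duplicate delivery, already processed
  }
  await handler();
}
```

For full safety, the `processed_events` insert and the handler’s own database writes should share a transaction; otherwise a crash between the two re-creates the original problem on the consumer side.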
The pattern also puts additional write load on the database. We should therefore weigh the potential performance hit against whether we could instead live with occasional inconsistencies.
Conclusion
Data inconsistencies between the database and the event stream are a serious challenge in an event-based world. While there are many potential solutions that can help prevent data inconsistencies, the transactional outbox pattern stands out as the most reliable option.
The transactional outbox pattern does come with its own downsides and complexities, but as long as you’re aware of them and put safeguards in place against the most serious issues, you’ll have a reliable solution. After implementing the pattern in the most critical parts of our codebase, we significantly reduced data inconsistencies.
Did you enjoy this read? It’s a prime example of the challenges and solutions we at Outfunnel work on. P.S. We’re hiring!