Software Engineering: A Modern Approach

Marco Tulio Valente

1 Sagas: An Alternative for Data Consistency in Microservices

1.1 Introduction

As we discussed in Chapter 7, in microservice-based architectures, it is recommended that each microservice has its own database. In other words, an architecture like the following is recommended.

Microservices must have their own database

However, when following this recommendation, a common problem often arises: How can we ensure consistency when data is distributed across multiple microservices?

To illustrate, let’s use the example of an online store. In this application, to complete a sale, we need to perform two operations:

updateInventory(): update the purchased items (in the inventory microservice).
processPayment(): process the payment (in the payment microservice).

However, these two operations must constitute a transaction, in order to ensure that they are executed atomically. More specifically, atomicity means that there are only two possible outcomes:

Both operations execute successfully.
Neither operation is executed.

In our context, execute means having the effects of the operation recorded in the database. Therefore, one operation must not executed without the other, as this would leave the system in an inconsistent state.

1.2 Ensuring Atomicity

Next, we will discuss traditional methods for ensuring atomicity. First, we will cover the use of centralized databases to guarantee atomicity, followed by a presentation of protocols to ensure atomicity in distributed databases scenarios.

1.2.1 Atomicity with Centralized Databases

In a system with a monolithic architecture, there is typically a single database. In these cases, the database implementation itself guarantee the atomic execution of transactions through commit and rollback commands.

The following code illustrates this scenario:

try {
  updateInventory();
  processPayment();
  commit();
} 
catch (Failure) {
  rollback();
}

If both updateInventory() and processPayment() successfully complete their executions, we call commit to persist the results in the database. Conversely, if one of the operations fails, an exception is raised, the catch block is executed, and we call rollback to reset the database to its state before the try block.

1.2.2 Distributed Databases

However, if updateInventory() executes in one database and processPayment() executes in another, as typically happens in microservices-based architectures, the guarantee of atomicity cannot be solely delegated to the local databases.

In such scenarios, one possible solution is to use a protocol that ensures atomicity in distributed databases. The most well-known one is the Two-Phase Commit (2PC) protocol. To clarify, this protocol is typically implemented by the distributed databases, i.e., you do not need to implement it.

However, the problems with 2PC are well known. For example, the protocol has high costs and latency because the participating processes must exchange multiple messages to reach a consensus on its outcome. In the worst-case scenario, a deadlock can occur, where the transaction may become blocked indefinitely if the coordinator process fails.

For this reason, some authors explicitly recommend against using 2PC in microservices. For example, Sam Newman advises against it (link, Chapter 6):

I strongly suggest you avoid the use of distributed transactions like the two-phase commit to coordinate changes in state across your microservices.

Therefore, alternatives for ensuring the consistency of distributed data have been proposed. One such alternative is the concept of sagas, which we will describe next.

1.3 Sagas

Sagas is a database concept introduced in 1987 by Hector Garcia-Molina and Kenneth Salem. If interested, please refer to the original article, which is clear and very easy to read.

Originally, the concept was proposed to manage long-lived transactions. However, it has more recently been adapted for use in microservices-based architectures.

A saga is defined by two sets:

A set of transactions T1, T2,…, Tn (which must be executed in this order).
A set of compensations for each transaction, C1, C2,…, Cn. In other words, each transaction has a corresponding compensating transaction that reverses its effects. For example, a credit transaction of x dollars is compensated by a debit transaction of the same amount.

Ideally, all transactions Ti should be executed successfully and sequentially, starting at T1 and ending at Tn. This is the happy path for a saga.

However, when we have multiple databases (such as microservices), a transaction Tj might fail, as in this example:

T1 (success), T2 (success),…, Tj (failure)

When this happens, we need to execute the compensations of the transactions that were successfully executed:

Cj-1,…, C2, C1.

We are assuming here that when Tj fails, it does not record its effects in the database. Therefore, we only need to call the compensations from Cj-1 to C1.

To conclude, let’s show the code that implements a saga composed of three transactions:

try {
  T1();
  T2();
  T3();
}
catch (FailureT1) {
   // no compensation
}
catch (FailureT2) {
   C1();
}
catch (FailureT3) {
   C2();
   C1();
}

Exercises

1. Why shouldn’t microservices share a single database? To answer, you can consult Section 7.4.1 of Chapter 7 and the beginning of Section 7.4.

2. What’s the difference between a distributed transaction and a saga? More specifically:

Are the transactions of a saga atomic when considered individually?
Without compensations, would the transactions of a saga be atomic when considered as a single logic operation?
Suppose a transaction Ti of a saga. Can a transaction that does not belong to the saga observe the results of Ti before the saga finishes?
Suppose a distributed transaction T1. Can a transaction T2 observe T1’s intermediate results?
With sagas, we need to implement the rollback logic, i.e., we need to write the code for compensations. Is the same true for distributed transactions? Yes or no? Justify.

3. How should a developer proceed when a given compensation fails, i.e., cannot be successfully executed?

4. What problem with long-lived transactions is solved using sagas? If necessary, refer to the second paragraph of the Introduction in the article that defined the concept.

Check out the other articles on our site.