I’ve recently started reading Designing Data-Intensive Application book by Martin Kleppmann and I’m really loving it! This article is based on my notes from chapter 5 - Replication and on topic single leader-based replication.

Why Replication?

Before this just to make sure we’re on same page, replication means having the same copy of data on multiple machines interconnected via network. Now, why do we need replication? To make our system highly available, scalable and to reduce latency! It reduces the latency by bringing the data geographically closer to the users. It makes system highly available in a way that even if some parts have failed, the data can be served from other replicas. And finally, replication favours scalability as there is more than one source to fetch the data from.

Replicating static data that won’t change is easy: copy the data to every node and you’re done. But if the data changes, you need to update the data in all of the nodes. There are three popular algorithms to do this

  1. Single Leader Replication
  2. Multi Leader Replication
  3. Leaderless Replication

I’ll try to write my understanding about single leader replication in this article.

Leader-based Replication

A node that stores the copy of the data is called a replica. Let’s say that a user performs a write operation on one of the replica, how do we ensure that this change is reflected in all other replicas? The most common solution is to use something called a leader-based replication (active-passive or master-slave replication).

One of the replicas is designated as a leader (also known as a master or primary node). This leader is responsible for receiving all writes and propagating those changes to all other replicas, now know as followers (also know as read replicas, slaves, secondary or hot standbys). And these followers are only responsible for serving read requests from users and only accepts write requests write requests from the master node. Whereas the leader can serve both read and write requests.

This propagation happens in the following way: whenever the leader writes something, it also sends these changes to all of its followers as a replication log or change streams. And the followers applies these changes in the same order as they receive.

Synchronous vs Asynchronous Replication

The changes can be sent to the followers synchronously or asynchronously. When using synchronous replication, the user gets acknowledged only after all the replicas were updated with the changes. But in asynchronous replication the user gets acknowledgement immediately after writing to leader and the leader is responsible for propagating those changes to all the replicas in later point of time. There are actually various advantages and disadvantages to both of these methods. Let’s see what they are:

Advantages of synchronous replication:

  1. Consistent reads: This ensures that all the followers have the most recent data. This ensures that the data read by user from any of the followers will be consistent and latest.
  2. Immediate failover readiness: Whenever a leader goes down, we can still be ensured that the latest data is available on followers.

Disadvantages of synchronous replication:

  1. Reduced write throughput: If one of the follower goes down, the leader has to wait until it’s up again before acknowledging the user about the writes.