Introduction
To ensure scalability, reliability, and fault tolerance, web applications replicate data across multiple database nodes, clusters, availability zones, regions, and cloud providers. However, replication introduces the challenge of keeping the copies in sync. In this article, we contemplate whether quantum entanglement could enable database replicas to sync instantaneously. The ideas behind entanglement stem from quantum mechanics, a subfield of physics that describes the behaviour of subatomic particles.
In this post, we first describe the basics of database replication, then move on to show a theoretical application of quantum entanglement to database replication.
Basics of data replication
The figure below shows a simplified setup of a client-server web application. The user sends requests to the backend, where each request passes through several components such as load balancers, app servers, and caching servers. These components perform functions like request validation, communicating with the database server to manipulate data, and responding back to the user. In the diagram below, we’ve abstracted away these details for simplicity. The app servers and other components need to communicate with a database system to manipulate user data and return an appropriate response.
However, there are a few problems with this design. Consider what happens if the machine running the database system crashes: the user’s data will be lost, and there’s no way to recover it. Also consider what happens when there are far more users than the database server can handle. In that case, queries slow down and users experience higher latency.
To solve these problems, we can replicate the data across multiple nodes. Each such node (or database server) is called a replica. In a simple scheme, we have a single master replica and multiple slave replicas. Write requests are handled by the master replica, while read requests are handled by the slave replicas.
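This routing scheme can be sketched in a few lines. The sketch below is a toy, in-memory illustration and not a real database client: the `Replica` class, its dict-backed store, and the round-robin read balancing are all assumptions made for the example.

```python
import itertools

class Replica:
    """Toy stand-in for a database server (a dict plays the database)."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class ReplicatedCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._rr = itertools.cycle(slaves)  # round-robin over slave replicas

    def write(self, key, value):
        # All write requests go to the single master replica.
        self.master.store[key] = value

    def read(self, key):
        # Read requests are distributed among the slave replicas.
        replica = next(self._rr)
        return replica.store.get(key)

cluster = ReplicatedCluster(Replica("master"),
                            [Replica("slave-1"), Replica("slave-2")])
cluster.write("user:1", "alice")
```

Note that in this sketch a read of `"user:1"` still returns `None`: the write landed only on the master, and nothing has propagated it to the slaves yet. That is exactly the synchronization problem the rest of this section addresses.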
This architecture introduces the following features into our system:
- scalability: read requests can be distributed among the read replicas, allowing the system to handle more read requests than the single-server approach
- availability: if one of the replicas goes down, the system remains functional since the others can handle its requests
- redundancy: since we store the same data in multiple places, we no longer keep a single copy of the user’s data, i.e. we store the data redundantly
- fault tolerance: the system is more tolerant of node failures
A simple implementation of the above scheme would look like:
Suppose we need to update a piece of data. We route the write request to the master replica, which updates its own database. However, the master replica is now out of sync with the read replicas (the read replicas still show an outdated version of the data). Therefore, we need to propagate the newly updated data to the read replicas. This can be achieved by making an update request, synchronously or asynchronously, from the master replica to the slave replicas. The time it takes for data to travel from the master replica to all slave replicas is called replication lag.
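The synchronous variant of this propagation step can be sketched as follows. Again this is an illustrative toy, not a real replication protocol: the in-memory stores are assumptions, and the "network call" to each slave is just a dict assignment here.

```python
import time

class Replica:
    """Toy database server backed by a dict."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class Master(Replica):
    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def write(self, key, value):
        start = time.monotonic()
        self.store[key] = value        # 1. master updates its own database
        for slave in self.slaves:      # 2. synchronously push the update to
            slave.store[key] = value   #    each slave (a network call in reality)
        return time.monotonic() - start  # 3. elapsed time until all slaves
                                         #    are in sync: the replication lag

slaves = [Replica("slave-1"), Replica("slave-2")]
master = Master("master", slaves)
lag = master.write("user:1", "alice")
```

An asynchronous variant would instead hand step 2 to a background queue and acknowledge the write immediately, trading a shorter write latency for a longer window of stale reads.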
However, during this procedure there is still a short window in which the read replicas hold inconsistent data. You can either block or allow incoming read requests on the slave replicas, depending on whether you are optimizing for consistency or availability.
If you allow read requests while the slave replicas are being updated, then you allow the user to read stale data; your system is not consistent (the read request may not return the latest data), but it is available (the user gets some response instead of an error).
On the other hand, if you block read requests during this window, then you are essentially returning an error response to the user. Your system is hence less available, but more consistent (if a user reads data, it is guaranteed to be the latest).
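The two policies can be contrasted in a small sketch. Everything here is an illustrative assumption: the `syncing` flag standing in for an in-flight update, the `prefer_consistency` switch, and the `StaleReadError` exception are invented for the example.

```python
class StaleReadError(Exception):
    """Raised when a consistency-preferring replica rejects a read mid-sync."""

class Slave:
    def __init__(self, prefer_consistency):
        self.store = {"user:1": "old-value"}  # not yet updated by the master
        self.syncing = False                  # True while an update is in flight
        self.prefer_consistency = prefer_consistency

    def read(self, key):
        if self.syncing and self.prefer_consistency:
            # Block the read: less available, but never serves stale data.
            raise StaleReadError(key)
        # Allow the read: always answers, but may return stale data.
        return self.store[key]

available = Slave(prefer_consistency=False)
consistent = Slave(prefer_consistency=True)
available.syncing = consistent.syncing = True

available.read("user:1")   # served, even though the value is stale
try:
    consistent.read("user:1")
except StaleReadError:
    pass                   # rejected: the caller gets an error instead
```

The same write, read during the same sync window, yields a stale answer from the availability-preferring replica and an error from the consistency-preferring one.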
The tradeoffs depend on the requirements of your system and the nature of the application at hand.