Introduction

To ensure scalability, reliability and fault tolerance, web applications replicate data across multiple database nodes, clusters, availability zones, regions and cloud providers. However, replicating data introduces the challenge of keeping the copies in sync. In this article, we consider whether quantum entanglement can enable database replicas to sync instantaneously. The ideas behind entanglement stem from quantum mechanics, a subfield of physics which describes the behaviour of subatomic particles.

In this post, we first describe the basics of database replication, then move on to show the theoretical application of quantum entanglement to database replication.

Basics of data replication

The figure below shows a simplified setup of a client-server web application. The user sends requests to the backend. Each request passes through several components such as load balancers, app servers and caching servers. These components perform functions such as validating the request, communicating with the database server to manipulate data, and responding to the user. In the diagram below, we’ve abstracted away all these details for simplicity. The app servers and other components need to communicate with a database system to manipulate user data and return an appropriate response.

However, there are a few problems with this design. Consider what happens if the machine running the database system crashes: the user’s data is lost and there is no way to recover it. Also consider what happens when there are many more users than the database server can handle: queries become slow and users experience higher latency.

To solve these problems, we can replicate the data across multiple nodes. Each such node (or database server) is called a replica. In a simple scheme, we have a single master replica and multiple slave replicas. Write requests are handled by the master replica, while read requests are handled by the slave replicas.

This architecture introduces the following features into our system:

  1. scalability: read requests can be distributed among the read replicas, allowing the system to handle more read requests than the previous approach
  2. availability: if one of the replicas goes down, the system remains functional since the others can handle the request
  3. redundancy: since we store the same data in multiple places, we no longer keep a single copy of the user’s data, i.e. we store the data redundantly
  4. fault tolerance: the system is more tolerant of node failures
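
The routing rule behind the scalability point (writes go to the master, reads are spread across the slave replicas) can be sketched as follows. This is a minimal illustration, not a production router: the `Router` class, the round-robin policy and the node names are all assumptions made for the example.

```python
import itertools


class Router:
    """Hypothetical request router: writes to the master, reads round-robin."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle through the slave replicas so reads are spread evenly.
        self._next_slave = itertools.cycle(slaves)

    def route(self, request_type):
        if request_type == "write":
            return self.master
        if request_type == "read":
            return next(self._next_slave)
        raise ValueError(f"unknown request type: {request_type}")


router = Router("master", ["slave-1", "slave-2"])
targets = [router.route(kind) for kind in ("write", "read", "read", "read")]
```

With two slave replicas, consecutive reads alternate between them, so each one serves roughly half of the read traffic.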

A simple implementation of the above scheme would look like:

Suppose we need to update a piece of data. We route the write request to the master replica, which updates its own database. At this point, the master replica is no longer in sync with the read replicas (the read replicas still hold an outdated version of the data). Therefore, we need to propagate the newly updated data to the read replicas. This can be achieved by sending an update request, synchronously or asynchronously, from the master replica to the slave replicas. The time it takes for the data to travel from the master replica to all slave replicas is called replication lag.
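
The update flow can be sketched with in-memory stand-ins for the database nodes. Everything here is an assumption made for illustration: the `Replica` and `Master` classes, the `pending` queue that models asynchronous shipping, and measuring lag with a local clock.

```python
import time


class Replica:
    """In-memory stand-in for one database node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Master(Replica):
    """Accepts writes and forwards them to the slave replicas."""

    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas
        self.pending = []  # updates not yet shipped to the replicas

    def write(self, key, value):
        # Asynchronous style: update locally now, ship to replicas later.
        self.apply(key, value)
        self.pending.append((key, value, time.monotonic()))

    def replicate(self):
        """Ship pending updates; return the lag observed for each one."""
        lags = []
        for key, value, written_at in self.pending:
            for replica in self.replicas:
                replica.apply(key, value)
            lags.append(time.monotonic() - written_at)
        self.pending.clear()
        return lags


replicas = [Replica("slave-1"), Replica("slave-2")]
master = Master("master", replicas)

master.write("x", 1)
stale = [r.read("x") for r in replicas]  # replicas have not seen the write yet
lags = master.replicate()
fresh = [r.read("x") for r in replicas]  # replicas now match the master
```

Between `write` and `replicate`, the replicas still return the old value. In a real deployment that window also includes network transfer time, which this sketch does not model.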

However, during this procedure, there is still a short window in which the read replicas hold inconsistent data. You can either block or allow incoming read requests on the slave replicas, depending on whether you are optimizing for consistency or availability.

If you allow read requests while the slave replicas are being updated, you allow the user to read stale data. Your system is then not consistent (a read request may not return the latest data), but it is available (the user receives some response instead of an error).

On the other hand, if you block read requests during this window, you are effectively returning an error response to the user. Your system is then less available, but more consistent (if a user reads data, it is guaranteed to be the latest).
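
The two choices can be put side by side in a small sketch. The `ReadReplica` class, its `block_reads_while_syncing` flag and the `try_read` helper are hypothetical names introduced only to illustrate the tradeoff.

```python
class ReadReplica:
    """Hypothetical read replica that is either blocking or non-blocking."""

    def __init__(self, block_reads_while_syncing):
        self.data = {"x": 1}  # value before the update arrives
        self.syncing = False
        self.block_reads_while_syncing = block_reads_while_syncing

    def begin_sync(self):
        self.syncing = True

    def finish_sync(self, key, value):
        self.data[key] = value
        self.syncing = False

    def read(self, key):
        if self.syncing and self.block_reads_while_syncing:
            # Consistency first: refuse to serve possibly stale data.
            raise RuntimeError("replica is syncing; try again later")
        # Availability first: serve whatever we currently have.
        return self.data[key]


def try_read(replica, key):
    try:
        return replica.read(key)
    except RuntimeError:
        return "error"


available = ReadReplica(block_reads_while_syncing=False)
consistent = ReadReplica(block_reads_while_syncing=True)

# The master has already set x = 0; the update is still in flight.
for replica in (available, consistent):
    replica.begin_sync()
during_sync = (try_read(available, "x"), try_read(consistent, "x"))

for replica in (available, consistent):
    replica.finish_sync("x", 0)
after_sync = (try_read(available, "x"), try_read(consistent, "x"))
```

During the sync window the first replica answers with the stale value 1, while the second returns an error. Once the update lands, both return 0.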

The tradeoffs depend on the requirements of your system and the nature of the application at hand.

Current approaches to database replication rely on communication between replicas. Inconsistencies are resolved using distributed consensus mechanisms such as Paxos or Raft.

Conflict-free replicated data types (CRDTs) are used in some schemes. Using CRDTs, data is replicated across multiple data centres. In many CRDT designs, each data centre updates only its own portion of the state (for example, its own slot in a counter), so concurrent updates never overlap. This way, updates can be made at multiple locations, since merging them is guaranteed to cause no conflicts.
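
One of the simplest CRDTs, a grow-only counter, shows the idea. This is a minimal sketch under stated assumptions: the `GCounter` class and the data centre names are made up for the example, and real CRDT libraries offer far richer types.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> increments seen from that node

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the per-node maximum is commutative, associative and
        # idempotent, so replicas converge regardless of merge order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


dc_a = GCounter("dc-a")
dc_b = GCounter("dc-b")

dc_a.increment(2)  # updates made independently at each data centre
dc_b.increment(3)

dc_a.merge(dc_b)
dc_b.merge(dc_a)
dc_a.merge(dc_b)   # merging again changes nothing
```

After the merges, both counters report the same total even though neither data centre coordinated with the other while applying its own increments.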

Quantum entanglement

Quantum physics is the subfield of physics that describes the behaviour of subatomic particles. The field emerged after it was discovered that classical mechanics cannot describe the behaviour of subatomic particles.

Subatomic particles like electrons and photons exhibit what is called wave-particle duality: they behave as waves or as particles depending on how they are observed or measured.

According to quantum mechanics, the complete state of a subatomic particle such as an electron or photon is described by its wave function, which encodes information such as position, momentum and spin. For a single particle, the wave function can be spread out in space, representing a combination of many possible states. This is known as superposition: the particle doesn’t have a definite state until it is measured.

Quantum entanglement is a phenomenon in quantum mechanics where two or more particles become correlated in such a way that the quantum state of one particle cannot be described independently of the quantum state of the other(s), no matter how far apart they are.

This means that if 2 particles are entangled and we measure the state of one particle, the measured state of the second particle is guaranteed to be correlated with it (identical or opposite, depending on how the pair was prepared), irrespective of the distance between the 2 particles.

This has serious implications in physics, since it appears to violate the principle of locality, which states that information cannot travel faster than light. However, this is not the case, since there is no transfer of information. It is not as if the first particle, when measured, instantaneously transmits its state to its entangled counterpart so that the counterpart can adjust its own state. Instead, the 2 particles share a correlated wave function. Measuring the state of one instantaneously determines the state of the other.
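
A correlated wave function can be made concrete with a short calculation. The sketch below assumes one particular entangled state, (|00> + |11>) / sqrt(2), in which both particles yield the same outcome. The helper names `marginal` and `conditional` are introduced only for this example.

```python
import math

# Amplitudes of the two-particle state (|00> + |11>) / sqrt(2),
# indexed by the joint outcome (first particle, second particle).
amplitudes = {
    (0, 0): 1 / math.sqrt(2),
    (0, 1): 0.0,
    (1, 0): 0.0,
    (1, 1): 1 / math.sqrt(2),
}

# Born rule: the probability of an outcome is |amplitude| squared.
joint = {outcome: abs(a) ** 2 for outcome, a in amplitudes.items()}


def marginal(particle, value):
    """Probability that one particle (index 0 or 1) yields `value`."""
    return sum(p for outcome, p in joint.items() if outcome[particle] == value)


def conditional(second_value, first_value):
    """Probability of the second particle's outcome given the first's."""
    return joint[(first_value, second_value)] / marginal(0, first_value)


p_first_is_0 = marginal(0, 0)            # each particle alone looks random
p_second_is_0 = marginal(1, 0)
p_same_given_first_0 = conditional(0, 0)  # but the pair is perfectly correlated
p_diff_given_first_0 = conditional(1, 0)
```

Taken individually, each particle gives 0 or 1 with equal probability. Once the first outcome is known, the second is fixed, which is the correlation described above.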

Quantum entanglement is well established experimentally. John Bell formulated a testable criterion for it in 1964, and experiments from the 1970s onward have confirmed the predicted correlations. Now if we take this fact at face value, we can build some really cool applications out of it.

Data syncing using Quantum entanglement

In computers, information is represented in bits. A bit can hold one of 2 values, either 0 or 1. Computers are agnostic to the underlying mechanism in which bits are stored. For example, modern computers use Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs) to represent information. These transistors act like switches, where the presence or absence of an electrical signal determines the value of the bit.

An electron’s spin can take 2 values when measured: “up” or “down”. Spin is the electron’s intrinsic angular momentum.

Assume that we have the technology to use an electron’s spin as a bit of computer memory, and consider the following thought experiment:

We have 2 electrons in quantum entanglement. These 2 electrons act as computer memory. Suppose these 2 electrons represent the same variable.

bool x = 1;

Suppose the variable x is stored as the “spin up” state in electron1 and electron2. Now suppose we separate these electrons to 2 locations far apart: electron1 is in India and electron2 is in Russia. Now we update the variable x as follows:

x = 0;

Please note that the update is made only at electron1, located in India. However, because the 2 particles are entangled, the second particle will automatically be found in the flipped state upon measurement. Hence, we have updated the variable to x=0 at both locations instantaneously, even though we never made an update at the second location.

Hence, we synced data copies across 2 locations instantaneously.

Conclusion

Through the thought experiment above, we’ve shown how quantum entanglement could be used to sync data instantaneously. Most of the technology required to implement such a system is already present. For example, quantum computers can manipulate qubits at will using magnetic fields. Also, quantum computing technology has enabled us to use subatomic particles to store information.

This kind of setup achieves a replication lag of 0. This has real implications for systems that require strong consistency and high availability. Interestingly, it also appears to sidestep the CAP theorem.