The CAP theorem states that in a distributed data store, it is impossible to simultaneously provide more than two out of the following three guarantees:
Because network failures are a given in any distributed system, Partition Tolerance (P) is a necessity. Therefore, the theorem effectively states that in the event of a network partition, a distributed system must make a trade-off between Consistency (C) and Availability (A).
To fully understand the theorem, let's break down what each guarantee means in the context of a distributed system (like a database cluster with multiple nodes).
This guarantee means that every read operation receives the most recently written data or an error. When data is written to one node, all subsequent reads from any node in the cluster will return that new value. All clients see the same view of the data at the same time.
This guarantee means that every request receives a (non-error) response, without the guarantee that it contains the most recent write. The system is always operational and responsive. No request is ever dropped or ignored.
This guarantee means that the system continues to operate even if there is a network partition—a communication break or delay between nodes. In a distributed system, nodes communicate over a network, and networks are unreliable. Messages can be lost or delayed. A "partition" is when the cluster splits into two or more groups of nodes that cannot communicate with each other.
The CAP theorem's critical insight comes into play when a network partition occurs.
Imagine a simple two-node database cluster, Node 1 and Node 2. A network failure happens, and they can no longer communicate. A user is connected to Node 1 and updates a value (e.g., x = 10).
Now, another user connects to Node 2 and tries to read the value of x. The system is now faced with a choice:
Choose Consistency over Availability (A CP System)
To guarantee consistency, Node 2 must provide the absolute latest data. Since it cannot communicate with Node 1 to find out about the new write (x = 10), it must refuse to answer the read request.
It will return an error or time out, thus making itself unavailable.
* Result: The system is consistent but not fully available.
Choose Availability over Consistency (An AP System)
To guarantee availability, Node 2 must respond to the read request. Since it can't get the latest data from Node 1, it will respond with the last version of the data it has (e.g., the old value, x = 5).
This data is "stale," so the system is no longer consistent across all nodes.
* Result: The system is available but not consistent. When the partition heals, the system will need a strategy to reconcile the conflicting data.
This trade-off forces architects of distributed systems to make a fundamental design choice.
These systems sacrifice availability during a partition to ensure that data is never incorrect or stale.
These systems sacrifice consistency during a partition to ensure that the system always remains online and responsive. They often achieve "eventual consistency," where data is synchronized and reconciled after the network partition is resolved.
A CA system is one that is both consistent and available but cannot be partition-tolerant. This model only works for systems that run on a single node. Traditional relational databases like PostgreSQL or MySQL, when run on a single server, are CA systems. They are consistent and available, but since there is no network between distributed nodes, there is nothing to "partition."
In the context of distributed systems, CA is not a practical choice, as you cannot avoid the risk of network partitions.