I don't know why, but I've always been interested in storage and distributed systems, particularly systems that are the combination of both, such as Ceph. (Who's to say why people like the things they like?)
I read a lot about distributed storage systems, and in doing so, I often come across the somewhat mysterious CAP theorem (also known as Brewer's theorem), a principle that I believe deserves consideration when selecting storage systems, especially when distributed systems are a potential solution.
The CAP theorem, as far as I can decipher, addresses the common desire for distributed systems that offer consistency, availability, and tolerance of network failure (otherwise known as partitioning, the 'P' in CAP).
The theorem argues that you cannot have all three at the same time, and sometimes you can't even get two of the three.
This is not to say that there is a problem with using distributed computing technologies in general; rather, the designers of each system have to choose which parts of the CAP theorem to focus on, and those decisions affect how the system operates in production - and by that I mean how the system behaves when a component, such as the network, fails.
CAP theorem background
The CAP theorem was first presented in 2000 by Dr. Eric Brewer. Interest in CAP, at least in terms of blog posts, came to a head in 2010, and then seemed to taper off. However, this conversation has recently been reignited by a series of fascinating blog posts on network partitions by Kyle Kingsbury. I strongly suggest reading Kyle Kingsbury and Peter Bailis's recent detailed post, The Network is Reliable, which brings the issue of network partitioning, as well as client-side failures, to the forefront.
Here are a few links that I have been reading to learn more about distributed systems and CAP theorem, at least from my (layman) perspective:
- Dr. Eric Brewer's original presentation
- Seth Gilbert and Professor Nancy Lynch's paper formalizing and proving CAP
- FAQ on the CAP theorem
- Mike Stonebraker - Clarifications on the CAP Theorem and Data-Related Errors
- James Hamilton - Stonebraker on CAP Theorem and Databases
- Daniel Abadi - Problems with CAP, and Yahoo's little known NoSQL system
- Kyle Kingsbury's Jepsen series
- Peter Deutsch's The Eight Fallacies of Distributed Computing and Arnon Rotem-Gal-Oz's Fallacies of Distributed Computing Explained
- Kyle Kingsbury - Jepsen: Testing the Partition Tolerance of PostgreSQL, Redis, MongoDB and Riak
- Werner Vogels - Eventually Consistent - Revisited
- Eric Windisch and Randy Bias - Scale-out Block Storage: There's a Reason AWS Elastic Block Storage Looks the Way it Does (slides)
Things I've learned
Rather than get into a long blog post trying (and failing) to summarize concepts that others have already skillfully done, I'd like to mention what I've learned while reading about the CAP theorem:
- Given the CAP theorem's central argument, and the fact that no network is 100% reliable, a distributed system that provides both consistency and availability at all times is impossible: when a partition occurs, the system must sacrifice one or the other.
- Because consistency is most important in block and file system storage, availability may not be possible with distributed storage. This means that in some cases, choosing a simpler storage system for infrastructure as a service, like what CloudScaling has done, may make more sense than a distributed storage system.
- Object storage and NoSQL systems often accept eventual consistency in exchange for availability: they relax strong consistency so the system can keep serving requests during partitions.
- Clients are an important part of a distributed system (it's easy to forget that). An example of this, discussed in Kingsbury's posts, is a client reporting a failed write even though the write actually succeeded.
- It's difficult to tell the difference between a failed host, a failed link, and delayed packets.
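The first point above can be sketched with a toy two-replica register. Everything here is hypothetical (the `Replica` class, the `mode` flag, the `write` helper); it only illustrates the choice a system is forced to make when a partition cuts a client off from some replicas:

```python
# A toy two-replica register illustrating the CAP trade-off during a
# network partition. This is a sketch of the concept, not any real
# system's API.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None

def write(replicas, value, reachable, mode):
    """Attempt a write when only `reachable` replicas can be contacted.

    mode="CP": refuse the write unless every replica is reachable,
               preserving consistency at the cost of availability.
    mode="AP": apply the write to whatever is reachable, preserving
               availability at the cost of replicas diverging.
    """
    if mode == "CP" and len(reachable) < len(replicas):
        return "write rejected (unavailable during partition)"
    for r in reachable:
        r.value = value
    return "write accepted"

a, b = Replica("a"), Replica("b")

# Network partition: the client can only reach replica a.
print(write([a, b], "x=1", reachable=[a], mode="CP"))  # write rejected (unavailable during partition)
print(write([a, b], "x=1", reachable=[a], mode="AP"))  # write accepted
print(a.value, b.value)  # x=1 None -- the replicas now disagree
```

Neither branch is "wrong"; they are simply different answers to the same failure, which is exactly the design decision the theorem forces.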
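The eventual-consistency point can also be sketched. In this hypothetical example each replica keeps accepting writes during a partition, and when the partition heals the replicas converge by keeping, per key, the entry with the latest timestamp (last-write-wins). The `merge` function and the data are illustrative, not how any particular object store implements convergence:

```python
# A sketch of eventual consistency via last-write-wins convergence.
# Each replica maps key -> (timestamp, value).

def merge(replica_a, replica_b):
    """Converge two replicas: for each key, keep the newest entry."""
    merged = {}
    for key in replica_a.keys() | replica_b.keys():
        candidates = [r[key] for r in (replica_a, replica_b) if key in r]
        # (timestamp, value) tuples compare by timestamp first;
        # on a timestamp tie, the value breaks the tie deterministically.
        merged[key] = max(candidates)
    return merged

# Writes accepted independently on each side of a partition:
rep_a = {"user:1": (10, "alice@old.example")}
rep_b = {"user:1": (12, "alice@new.example"), "user:2": (11, "bob")}

# Partition heals; both sides converge to the same state.
print(sorted(merge(rep_a, rep_b).items()))
# [('user:1', (12, 'alice@new.example')), ('user:2', (11, 'bob'))]
```

The system stayed available on both sides of the partition; the price is the window during which the two replicas disagreed.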
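The last two points are really one problem seen from the client's side: when an acknowledgement never arrives, the client cannot distinguish a dead server, a broken link, or a slow network, so it may report a failure for a write that actually succeeded. A minimal sketch, with hypothetical names and made-up delay numbers:

```python
# A sketch of the ambiguous-write problem: the server applies the
# write, but the acknowledgement is delayed past the client's timeout.

store = {}

def server_write(key, value, ack_delay):
    store[key] = value   # the write is durably applied...
    return ack_delay     # ...but the ack takes this long to arrive

def client_write(key, value, timeout, ack_delay):
    delay = server_write(key, value, ack_delay)
    if delay > timeout:
        # From here the client cannot tell a failed host, a failed
        # link, and delayed packets apart -- all it observed was
        # silence until its timeout fired.
        return "error: write timed out"
    return "ok"

result = client_write("k", 42, timeout=1.0, ack_delay=5.0)
print(result)        # error: write timed out
print(store["k"])    # 42 -- the write succeeded anyway
```

This is why distributed systems (and their clients) have to be designed for ambiguity, not just for clean success and clean failure.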
In summary, when choosing distributed systems, at the very least it's important to decide whether consistency or availability matters most, and base your selection on that criterion. Going further, considering how a distributed system behaves when components fail should also help reduce downtime.