Candidate: Hua Fan
Title: Building Scalable and Consistent Distributed Databases under Conflicts
Date: March 14, 2018
Time: 1:30 PM
Place: EIT 3142
Supervisor(s): Golab, Wojciech
Distributed databases, which relay on redundant and distributed storage across multiple servers, are able to provide mission-critical data management services at large scale. Parallelism is the key to the scalability of distributed databases, but concurrent queries having conflicts may block or abort each other when strong consistency is enforced using rigorous concurrency control protocols. This work studies the techniques of building scalable distributed databases under strong consistency guarantees even in the face of high contention workloads. The techniques proposed in this work share a common idea, conflict avoidance, meaning proactively avoiding potential conflicts in the concurrency control in the first place instead of resolving contending conflicts. Using this idea, concurrent queries under conflicts can be executed with high parallelism. This work explores this idea on both databases that support serializable ACID (atomic, consistency, isolation, durability) transactions, and eventually consistent NoSQL systems.
First, the epoch-based concurrency control (ECC) technique is proposed in ALOHA-KV, a new distributed key-value store that supports high performance read-only and write- only distributed transactions. ECC demonstrates that concurrent serializable distributed transactions can be processed in parallel with low overhead even under high contention. With ECC, a new atomic commitment protocol is developed that only requires amortized one round trip for a distributed write-only transaction to commit in the absence of failures. Thus, ALOHA-KV can execute distributed transactions efficiently under conflicts.
Second, a novel paradigm of serializable distributed transaction processing is developed to extend ECC with read-write transaction processing support. This paradigm uses a newly proposed database operator, functors, which is a placeholder for the value of a key, which can be computed asynchronously in parallel with other functor computations of the same or other transactions. Functor-enabled ECC achieves finer-grain concurrency control than transaction level concurrency control, while ECC already guarantees transaction atomicity and resolves transaction orders. This combination of techniques never aborts transactions due to read-write or write-write conflicts but allows transactions to fail due to logic errors or constraint violations while guaranteeing serializability. ALOHA-DB, a scalable distributed transaction processing system, is implemented atop ALOHA-KV to support functor-enabled ECC.
Lastly, this work explores consistency in the eventually consistent system, Apache Cassandra, for an investigation of the consistency violation anomalies, referred to as “consistency spikes”. A read-write conflict, if which is not resolved appropriately, may result in a consistency violation. This investigation shows that the consistency spikes exhibited by Cassandra are strongly correlated with garbage collection, particularly the “stop-the-world” phase in the Java virtual machine. Avoiding potential read-write conflicts by delaying read operations artificially at servers immediately after a garbage collection pause, can virtually eliminate these spikes.
All together, these techniques allow distributed databases provide scalable and consistent storage service.