Master’s Thesis Presentation • Systems and Networking • Fast Kubernetes Orchestration for Dynamic and Ephemeral Workloads

Friday, May 22, 2026 1:00 pm - 2:00 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will take place in DC 1304.

Ali Abbasi Alaei, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Martin Karsten

In recent years, Kubernetes has become the primary choice for orchestrating software deployments on distributed clusters. It abstracts the complexities of the underlying infrastructure, enabling developers to focus on application logic rather than low-level resource management. In addition to traditional long-running services, Kubernetes is increasingly used to manage short-lived and latency-sensitive applications, such as serverless functions, which demand fast startup and generate high deployment churn. For these use cases, however, Kubernetes sacrifices performance for high reliability and strong consistency, making it less suitable for performance-critical and dynamically scaled environments.

This work identifies etcd, the central data store in Kubernetes, as a critical factor influencing performance due to its persistent and strongly consistent operation. It proposes and implements a design change for the Kubernetes control plane that employs a complementary metadata store, called Sharded Transient Etcd (STE), alongside the persistent etcd in the Kubernetes cluster. STE stores short-lived entities, such as Pod objects, which are critical yet ephemeral resources, in fast, in-memory storage, while retaining persistent etcd for durability-critical data. This design preserves reliability and failure recovery while significantly improving orchestration performance and scalability. An experimental evaluation shows that the prototype doubles deployment throughput, reduces Pod creation latency by 80%, and improves resource efficiency of Kubernetes components by up to 20% without compromising stability or failure recovery. In addition, the results show that the cold start latency of a widely used Kubernetes-based serverless orchestrator is reduced by more than 60%. While this work is focused on Kubernetes, the resulting insight is applicable to orchestration systems in general.
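
The split between a transient and a persistent store described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class names, the set of transient resource kinds, and the CRC32-based shard selection are all assumptions made for illustration, and plain in-memory dictionaries stand in for both etcd and the STE shards.

```python
import zlib

# Assumption: only Pod objects are routed to the transient store.
# The abstract names Pods as one example; the real set may differ.
TRANSIENT_KINDS = {"pods"}


class DictStore:
    """Hypothetical stand-in for a key-value store (etcd or one STE shard)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class StoreRouter:
    """Routes each object to a transient shard or to the persistent store."""

    def __init__(self, persistent, shards):
        if not shards:
            raise ValueError("at least one transient shard is required")
        self.persistent = persistent
        self.shards = shards

    def _pick(self, kind, key):
        if kind in TRANSIENT_KINDS:
            # Assumption: shards are chosen by a stable hash of the key.
            index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
            return self.shards[index]
        return self.persistent

    def put(self, kind, key, value):
        self._pick(kind, key).put(key, value)

    def get(self, kind, key):
        return self._pick(kind, key).get(key)


persistent = DictStore()
shards = [DictStore() for _ in range(4)]
router = StoreRouter(persistent, shards)

router.put("pods", "/registry/pods/default/web-1", {"phase": "Pending"})
router.put("deployments", "/registry/deployments/default/web", {"replicas": 3})
```

In this sketch the Pod object lands in exactly one in-memory shard, while the Deployment object goes to the persistent store. Durability, replication, watches, and failure recovery, which the thesis addresses, are outside the scope of the example.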