Q&A with Professor Xiao Hu, who studies fundamental problems in database theory

Tuesday, October 29, 2024

Xiao Hu joined the Cheriton School of Computer Science as an Assistant Professor in July 2024. She is broadly interested in database theory and its applications to practical database systems, such as massively parallel query processing, dynamic query processing, oblivious query processing, and machine learning in query optimization.  Her research awards include the PODS 2024 Distinguished Paper Award, the SIGMOD 2024 Best Paper Award Honorable Mention, and the ICDT 2024 Best Paper Award.

Before joining the Cheriton School of Computer Science, she was a Research Fellow at the Simons Institute for the Theory of Computing, a Visiting Faculty Scholar in the Discrete Algorithm Group at Google Research, and a Postdoctoral Associate in the Department of Computer Science at Duke University. She received her PhD from HKUST in 2019 and BE from Tsinghua University in 2014.

What follows is a lightly edited transcript of a conversation with Professor Hu, where she discusses her research, advice for aspiring computer scientists, and what excites her about joining the Cheriton School of Computer Science.

Professor Xiao Hu on a bench in Waterloo's rock garden

What challenges in database theory do you find most exciting to tackle?

First, let me start by explaining what database theory is. Database theory is a crucial bridge between theoretical computer science and practical data management. It focuses on the principles, algorithms and models used to design, optimize, and query databases.

A key area within this field is query processing, which provides the mathematical and algorithmic foundations for executing database queries efficiently while addressing challenges posed by emerging data characteristics (such as volume, privacy, velocity, and noise) and evolving computing architectures (such as cloud computing, distributed systems, and specialized hardware like GPUs and TPUs).

As data becomes increasingly ubiquitous and complex, query processing has expanded far beyond traditional applications, influencing a wide array of disciplines within computer science, including data analytics, machine learning, knowledge graphs, mathematical optimization, operations research, bioinformatics, and human-computer interaction.

Each of these fields relies on cutting-edge research in query processing to enhance performance. Thus, understanding the capabilities, limitations, and applicability of various query-processing techniques is one of the most pressing issues in database research today.

Tell us a bit about your research.

My research interests are in both theoretical and practical aspects of data management. A unifying theme of my research has been developing simple algorithms with optimal performance guarantees that also work well in practice. I work extensively on query evaluation, a core problem in database research, which studies how to efficiently compute answers to questions specified in high-level, declarative language. Finding the optimal way to evaluate queries has been a long-standing focus since different evaluation strategies could have vastly different efficiency, even on the same database.

In the last twenty years, the need to process and analyze big data has invigorated this long-running research area with new challenges. Let me give three illustrative examples.

First, data has grown tremendously in size and complexity, such that a single machine cannot store and process such a large volume of data, resulting in massively parallel systems for data processing. How do we design algorithms for query processing that can scale to thousands of machines in parallel while keeping the communication overhead low?

Second, data is generated rapidly and can be inserted or deleted dynamically. How do we design data structures that can be adapted efficiently while the query answers can be delivered to users with low latency whenever needed?

Third, data is typically integrated from multiple sources and contains sensitive information, which has given rise to increasing privacy concerns. The query answers can leak sensitive information about the underlying data. Evaluating a query, even on encrypted data, can also leak information. How do we design algorithms for query processing whose behaviour is oblivious to the data and whose output can guarantee privacy?

Hence, my research aims to equip modern data systems with efficient query processing algorithms while offering different guarantees — such as scalability, low latency, and privacy — in emerging data applications and computing architectures.

What do you see as your most significant contribution?

I think my research has breathed new life into this decades-old research area. We have a very exciting result that solved a long-standing open problem: how efficiently can a join-aggregate query be processed in the conventional RAM model?

Join-aggregate queries defined over commutative semirings subsume a wide variety of common algorithmic problems, such as graph pattern matching, graph colourability, matrix multiplication, and constraint satisfaction problems. Arguably, the most celebrated result of classic database query evaluation is the famous Yannakakis algorithm going back to 1981. Despite the popularity of this textbook algorithm, no improvement in its complexity has been discovered in the last 40 years, and it was assumed to be optimal.

But in our very recent paper, which is under revision for presentation at PODS 2025, we have offered the first improvement over this algorithm and proved that this algorithm is output-optimal among all combinatorial algorithms. Beyond the combinatorial algorithms, we recently explored algebraic techniques such as fast matrix multiplication, to further speed up processing general conjunctive queries in a paper that was presented at PODS 2024. Those results are very exciting because they hold the potential to further improve the performance of practical database systems, especially if we can combine them with the latest advancements in computer architecture, such as graphic processing units and tensor processing units.
What advice would you give to students interested in pursuing research in your area?

Broadly speaking, curiosity is the most important characteristic a student needs. With inquisitiveness, a student will want to further investigate and explore ideas, think about questions to pursue in experiments, or develop in theory.

From a technical perspective, to prepare for research in my area it’s helpful to have taken courses in mathematics, algorithms, and data structures, which help provide the foundation to design and analyze algorithms. My research combines theory with practice, so students with experience in coding and implementation bring valuable skills. Let’s see you put your theory into practice.

Do you see opportunities for collaborative research at the Cheriton School of Computer Science?

There are plenty of collaboration opportunities here at Waterloo, especially given the strength of our Data Systems Group (DSG), which boasts a diverse set of expertise. I look forward to working with my DSG colleagues on topics such as graph processing, Datalog, differential privacy, and oblivious query processing. We also plan to co-supervise some students, which will provide valuable learning opportunities for them and foster deeper collaboration within the group.

Beyond DSG, I’m connecting with researchers in the Algorithms & Complexity Group to explore more theoretical problems in query processing, as well as with members of the Natural Language Processing group to investigate query processing over structured and unstructured data.
What aspect of joining the Cheriton School of Computer Science excites you the most?

I want to emphasize two key points. First, we have a vibrant research community here with outstanding researchers across diverse fields. It’s inspiring to be in an environment where you can both learn from and collaborate with experts. Second, Waterloo attracts exceptional students at both the undergraduate and graduate levels.

In my group, I currently supervise a PhD student, a master’s student, and an undergraduate student, all of whom are talented, curious, and dedicated to their work. For instance, my undergraduate student is particularly enthusiastic about implementing fast matrix multiplication for query processing. I look forward to expanding my research team with more students as passionate as my current team.

Who has inspired you most in your career?

Professor Ke Yi, my PhD advisor at the Hong Kong University of Science and Technology, has been an inspiring mentor whose talent, intellect, and passion for research have profoundly influenced me. His relentless work ethic, paired with a positive outlook on life and a commitment to maintaining a healthy work-life balance, serves as a model that I continue to aspire to emulate.

Especially in the last two years, Professor Yi provided me with invaluable encouragement. At a time when I was uncertain about my career path and considering options beyond academia, his unwavering support helped me see the value and potential of an academic research career. I’m incredibly grateful to have had such a dedicated mentor, not only in research but in life as well. Professor Yi is also known for his kindness and openness towards his students, creating an environment where everyone feels welcome and valued.

My goal as an advisor is to offer my students the same guidance, support, and encouragement that he gave me. He exemplifies how a mentor’s genuine care can transform the lives of young researchers, and I hope to carry forward his legacy of mentorship in my career.

What do you do in your spare time?

I enjoy exercising, especially yoga and Pilates. Yoga, in particular, provides a space for both relaxation and mental clarity; I often find myself thinking through research problems during my practice. The meditative aspect helps me clear my mind while also sharpening my focus.

I also love hiking and spending time outdoors. Now that I’m in Canada, I’d like to explore some new activities — skiing, for instance. I’ve never tried it before, but it seems like a fun challenge, and I’m excited to experience more of what the Canadian winter has to offer.

  1. 2024 (5)
    1. October (1)
    2. September (1)
    3. August (1)
    4. June (1)
    5. March (1)
  2. 2023 (4)
    1. November (1)
    2. May (1)
    3. January (2)
  3. 2022 (4)
    1. December (1)
    2. November (1)
    3. October (1)
    4. June (1)
  4. 2021 (5)
  5. 2020 (9)
  6. 2019 (3)
  7. 2018 (16)
  8. 2017 (11)
  9. 2016 (2)