The Data Systems Seminar Series provides a forum for exploring and discussing important topics in data systems, from current challenges to emerging trends. It complements our internal meetings by bringing fresh perspectives from invited external speakers.
The schedule for the 2025–26 academic year is outlined below and will be updated as speakers are confirmed.
Seminars are typically held monthly on Mondays at 10:30 a.m. in DC 1302, unless otherwise noted. Some sessions may be held virtually on Zoom; these will be clearly marked.
All talks are open to the public.
We will record and upload videos of presentations whenever possible. Past DSG Seminar Series videos are on the DSG YouTube channel.
The Data Systems Seminar Series is supported by

| Hazar Harmouch |
| Zhuoyue Zhao |
Monday, September 22, 2025 at 10:30 a.m.
| Title | Beyond Accuracy: Data Quality as the Backbone of Trustworthy AI (PowerPoint presentation, 11.8 MB PDF) |
| Speaker | Hazar Harmouch, Assistant Professor, Intelligent Data Engineering Lab, University of Amsterdam |
| Abstract | In the era of artificial intelligence, the quality of data has become a central determinant of system reliability, fairness, and trust. While advances in AI promise transformative applications across domains, the benefits are critically dependent on the quality of the underlying data. This talk explores how data quality shapes AI systems from multiple perspectives: performance, fairness, and compliance. I will discuss our work on assessing and improving data quality in machine learning pipelines, including step-by-step cleaning recommendations, quantifying diversity in datasets, and benchmarks for AI robustness under labeling noise and other data quality issues. I will also highlight our contributions to bridging the gap between technical quality assessment and human perspectives, including the development of a cross-disciplinary data quality glossary and surveying practitioners in light of the AI Act. Together, these insights point toward a more holistic view of data quality, one that incorporates not only statistical measures but also ethical, legal, and user-centered dimensions to build AI systems that are both effective and trustworthy. |
| Bio | Professor Harmouch is a member of the Intelligent Data Engineering Lab at the Informatics Institute of the University of Amsterdam, Netherlands. Her research focuses on data quality for and with machine learning. Previously, she was a postdoctoral researcher in the Information Systems Group at the Hasso Plattner Institute, University of Potsdam. During her doctoral research in the same group, she worked on data profiling, developing algorithms for efficiently processing and analyzing large volumes of data. Beyond research, Professor Harmouch is interested in keeping up with the latest developments in machine learning, data management, and integration. She is also constantly looking for new collaboration opportunities! Outside of work, she enjoys travelling, reading, cooking, and training. |
Wednesday, December 10, 2025 at 10:30 a.m. (Note the unusual day)
| Title | Enabling Fast and Correct Approximate Query Processing in HTAP Systems |
| Speaker | Zhuoyue Zhao, Department of Computer Science and Engineering, University at Buffalo |
| Abstract | Approximate Query Processing (AQP) enables users to trade a slight loss of accuracy for very low query latencies. For today’s Hybrid Transactional/Analytical Processing (HTAP) workloads, AQP could replace some expensive analytical queries when approximation is acceptable. However, traditional AQP systems rely on scan-based random sampling and thus still incur high latencies. Meanwhile, many AQP algorithms rely on specialized sampling indexes to perform random sampling without excessive scanning, but these indexes are often not concurrency-safe or updatable. In this talk, I will present our recent work on a fast, concurrency-safe, and updatable sampling index for independent range sampling. It can sustain a high rate of ingestion and sampling under snapshot isolation. It is fully integrated into PostgreSQL, and we also built a new AQP extension around it. I will also discuss several challenges and promising directions for AQP in modern in-memory HTAP systems. |
| Bio | Zhuoyue Zhao is currently an assistant professor at the University at Buffalo. He holds a PhD from the University of Utah, where he was advised by Prof. Feifei Li and Prof. Jeff Phillips. His research interests are in database systems, specifically query processing and optimization, transaction processing, and storage and indexing. He received an NSF CAREER award in 2024 and two SIGMOD best paper awards, in 2016 and 2025. |
Monday, January 12, 2026 at 10:30 a.m.
| Title | |
| Speaker | |
| Abstract | |
| Bio | |
Monday, April 13, 2026 at 10:30 a.m.
| Title | |
| Speaker | |
| Abstract | |
| Bio | |
Monday, May 11, 2026 at 10:30 a.m.
| Title | |
| Speaker | |
| Abstract | |
| Bio | |
Monday, June 15, 2026 at 10:30 a.m.
| Title | |
| Speaker | |
| Abstract | |
| Bio | |
Monday, July 20, 2026 at 10:30 a.m.
| Title | |
| Speaker | |
| Abstract | |
| Bio | |
Monday, August 17, 2026 at 10:30 a.m.
| Title | |
| Speaker | |
| Abstract | |
| Bio | |