All Years


Understanding this hidden structure could help us visualize data, remove noise, compare examples, and build machine-learning systems that are faster, more reliable, and easier to understand.

In this project, we will try to answer: When can we discover the hidden shape of data accurately and efficiently?

This is a difficult problem. In the most general setting, learning the full shape may require a very large amount of data and computation. Real data are also noisy, so observations may not lie exactly on a clean surface. Even deciding how many underlying dimensions the data have can be challenging.
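
To make the "how many underlying dimensions" question concrete, here is a minimal sketch of one possible estimator, assuming clean data and distinct points. It uses the ratio of each point's second-nearest to nearest neighbor distance (a two-nearest-neighbor style estimate). This is an illustrative choice, not necessarily the method the project will use, and the names `two_nn_dimension` and `circle` are hypothetical.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from nearest-neighbor distance ratios.

    For each point, take r1 and r2 (distances to its two nearest neighbors).
    The estimate is N / sum(log(r2 / r1)), the maximum-likelihood value when
    the ratios follow a Pareto law whose shape equals the dimension.
    """
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, which carry no scale information
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Assumed toy data: 500 noise-free points on a circle, a 1-D curve living in 2-D.
rng = random.Random(0)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(500)]
circle = [(math.cos(t), math.sin(t)) for t in angles]

estimate = two_nn_dimension(circle)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On this toy curve the estimate should land near 1 even though every point has two coordinates. With noisy observations the same estimator can drift upward, which is one way the difficulty described above shows up in practice.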

Tags: Python, Basic Programming, Linear Algebra, Calculus, Statistics, Machine Learning, Optimization, All Years

This project asks: Why does sparse regression often work well in practice, even when the usual theoretical assumptions do not clearly apply?

We will study this question using ideas from geometry, statistics, and optimization. Here, geometry means thinking about variables as directions or points in space. For example, two variables that contain almost the same information can be viewed as pointing in nearly the same direction. This viewpoint may help us understand when sparse regression makes reliable predictions, when it selects meaningful variables, and when its answer is unstable.
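
The "variables as directions" picture can be sketched in a few lines. The example below treats each variable as a vector of centered observations and measures the cosine of the angle between two such vectors, which for centered data equals their sample correlation. The data are synthetic and the names (`x1`, `x2`, `x3`, `cosine`) are assumptions made for illustration only.

```python
import math
import random


def center(values):
    """Subtract the mean so the vector describes variation around zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cosine(u, v):
    """Cosine of the angle between two variables viewed as vectors."""
    u, v = center(u), center(v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2 = [a + 0.05 * rng.gauss(0.0, 1.0) for a in x1]  # nearly a copy of x1
x3 = [rng.gauss(0.0, 1.0) for _ in range(200)]     # generated independently of x1

print("x1 vs x2:", round(cosine(x1, x2), 3))  # close to 1: nearly the same direction
print("x1 vs x3:", round(cosine(x1, x3), 3))  # much closer to 0 than the x1/x2 value
```

When two columns point in almost the same direction, as `x1` and `x2` do here, a sparse method has little basis for preferring one over the other, which is one intuition for why variable selection can become unstable.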

Tags: Python, Basic Programming, Linear Algebra, Statistics, Calculus, Optimization, Machine Learning, All Years

LLM-based agents are now used to write tests and fix bugs in real software. They sometimes succeed, but when they fail, we usually have no idea why. Every agent leaves a full step-by-step log of what it did, called a trajectory. Many of these logs are now public, but almost no one has studied them carefully. This project analyzes those logs to understand how agents actually go about generating tests, where they get stuck, and what makes some tasks harder than others. This matters because developers are starting to trust these tools with real work. If we understand how and when they fail, we can build better tools and know when their output needs a second look. Recent public benchmarks like SWT Bench and SWE Atlas, built on real open-source projects, release these trajectories openly, so the data is ready to use.

Tags: Basic Programming, Python, Artificial Intelligence, All Years

This project explores how AI can help understand bilingual doctor–patient conversations and automatically generate accurate medical documentation. It has the potential to improve healthcare accessibility and reduce documentation workload for clinicians serving multilingual populations. We have already built a 280-hour speech corpus containing code-switched Kazakh-Russian medical data. We are now collecting an additional 100 hours of simulated doctor and patient conversations to improve model performance.

Tags: Basic Programming, Python, Artificial Intelligence, Machine Learning, Natural Language Processing, Data Science, All Years

Modern AI and machine learning systems are increasingly trained and deployed on distributed infrastructures consisting of multiple servers working together. While distributed computing enables larger models and faster processing, it also introduces new security challenges. Communication between nodes, shared resources, and distributed coordination mechanisms can create vulnerabilities that may not exist in single-machine systems. The goal of this project is to understand and evaluate security risks that arise when training or running AI/ML models in distributed environments. By identifying and studying these vulnerabilities, we can help build more secure and trustworthy AI systems.

Tags: Networks, Operating Systems, Artificial Intelligence, Machine Learning, Security, Systems, All Years

In secure multiparty computation (MPC), the goal is for parties 1 to n to securely compute f(x1, …, xn), where xi is the private input of party i. The security condition is that the messages each party sends and receives during the computation of f reveal no more information than that party's own input and output. This allows the parties to collaboratively compute a function over their private inputs while maintaining privacy.
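
As a small illustration of the idea, the sketch below takes f to be the sum of the inputs and uses additive secret sharing, a standard textbook technique. It assumes parties that follow the protocol honestly and simulates all of them in one process. The names `share`, `secure_sum`, and `MODULUS` are chosen for this example and do not come from the project.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus; inputs and their sum must stay below it


def share(value, n_parties):
    """Split value into n random-looking shares that add up to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their private inputs."""
    n = len(private_inputs)
    # Party i splits its input and sends the j-th share to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums combine into the output f(x1, ..., xn).
    return sum(partial_sums) % MODULUS


print(secure_sum([3, 5, 10]))  # 18
```

Each individual share is uniformly random, so a single message says nothing about the sender's input. Only the final sum, which is the agreed output, is revealed.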

Traditionally, MPC algorithms have a fixed runtime that depends only on the input size rather than the specific input, since otherwise the runtime would leak information about the private input. In the non-private setting, however, there are practical algorithms whose runtime is random and low in expectation. One example that has been successfully adapted to the MPC setting is quicksort, whose random runtime is independent of the input list. Our goal in this project is to adapt another algorithm whose random runtime is independent of the specific input and to benchmark it against private deterministic versions of the same algorithm. A successful implementation could enable adaptation of richer algorithm classes to the private setting.
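
The quicksort property can be checked empirically in the clear. The sketch below is plain, non-private Python and assumes distinct keys and uniformly random pivots. It counts comparisons as a stand-in for runtime and compares the average count on an ascending list with the average on a descending list of the same length. The helper names are hypothetical.

```python
import random


def quicksort_with_count(items, rng):
    """Sort distinct keys with random pivots; return (sorted_list, comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    sorted_small, count_small = quicksort_with_count(smaller, rng)
    sorted_large, count_large = quicksort_with_count(larger, rng)
    comparisons = len(items) - 1  # the pivot is compared with every other key once
    return sorted_small + [pivot] + sorted_large, comparisons + count_small + count_large


def average_comparisons(items, trials=200, seed=0):
    """Average the comparison count over repeated randomized runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, count = quicksort_with_count(items, rng)
        total += count
    return total / trials


ascending = list(range(50))
descending = ascending[::-1]
avg_ascending = average_comparisons(ascending, seed=1)
avg_descending = average_comparisons(descending, seed=2)
print(avg_ascending, avg_descending)  # the two averages should be close to each other
```

Because the pivot is chosen uniformly at random, the distribution of the comparison count depends only on the list length, not on how the keys are arranged. That is why observing the runtime does not reveal the input order.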

Tags: Algorithms, Statistics, Security, All Years

Non-private implementations of algorithms often access data structures at indices determined at runtime. Because these indices depend on the input, revealing them would compromise privacy according to our security definition. While there are asymptotically efficient ways to adapt these algorithms to the MPC model, they rely on generic constructions whose constant factors make them impractical.
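
To illustrate why data-dependent indices are a problem, here is a minimal sketch of the simplest workaround: a linear scan that touches every position no matter which index is wanted. It is the naive baseline, not one of the asymptotically efficient constructions mentioned above, and plain Python gives no real security guarantee. The function name `oblivious_read` and the returned access trace are illustrative assumptions.

```python
def oblivious_read(array, secret_index):
    """Read array[secret_index] while touching every position in a fixed order.

    Returns (value, touched), where touched is the list of positions accessed.
    In a real MPC protocol the match bit and the value would be secret-shared.
    """
    result = 0
    touched = []
    for position, value in enumerate(array):
        touched.append(position)
        is_match = int(position == secret_index)  # 1 at the wanted slot, else 0
        result += is_match * value                # arithmetic select, no branching
    return result, touched


data = [10, 20, 30, 40]
value_a, trace_a = oblivious_read(data, 2)
value_b, trace_b = oblivious_read(data, 0)
print(value_a, value_b)    # 30 10
print(trace_a == trace_b)  # True: the access pattern does not depend on the index
```

The cost is one pass over the whole array per access, which is the kind of overhead that more specialized oblivious data structures try to reduce.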

Tags: Data Structures, Algorithms, Security, All Years

One primitive used to implement MPC algorithms is function secret sharing, which is a way to split a function f among multiple parties such that each party can evaluate f on a common input x and obtain shares of the output f(x). We investigate the use of function secret sharing to implement sorting algorithms in MPC since sorting is a common subroutine in many algorithms. We then benchmark these implementations against state-of-the-art private sorting algorithms.
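
The interface can be illustrated with a deliberately naive two-party construction: additively share the whole truth table of f over a small domain. This is only a sketch of what splitting a function means. Its keys grow linearly with the domain size, whereas practical function secret sharing schemes aim for much shorter keys. All names here (`share_function`, `evaluate`, `MODULUS`) are hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed output group: integers modulo 2**32


def share_function(f, domain_size):
    """Split f into two keys; each key alone is a table of uniformly random values."""
    table = [f(x) % MODULUS for x in range(domain_size)]
    key0 = [secrets.randbelow(MODULUS) for _ in range(domain_size)]
    key1 = [(t - k) % MODULUS for t, k in zip(table, key0)]
    return key0, key1


def evaluate(key, x):
    """Each party evaluates its key locally on the common input x."""
    return key[x]


def point_function(x):
    """Example f: outputs 7 at x == 3 and 0 everywhere else."""
    return 7 if x == 3 else 0


key0, key1 = share_function(point_function, domain_size=8)
for x in range(8):
    output_share0 = evaluate(key0, x)
    output_share1 = evaluate(key1, x)
    # The two output shares add up to f(x).
    assert (output_share0 + output_share1) % MODULUS == point_function(x)
```

Neither key on its own reveals where the function is nonzero, yet the parties obtain additive shares of f(x) for any common input x without communicating.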

Tags: Algorithms, Cryptography, C/C++, Security, All Years

Healthcare data can reveal important insights that improve patient care, but analyzing it is challenging. Analysts must explore complex datasets, generate and test hypotheses, and interpret results carefully. While Generative AI can assist by creating code, visualizations, and insights, it does not always understand users' goals and can sometimes produce unreliable results. This project explores how teams of AI agents can collaborate with humans to support healthcare data analysis. We will design new interaction techniques that help people communicate their intent, understand how AI-generated results were produced, and assess whether those results are trustworthy. By making human-AI collaboration more transparent and reliable, this research aims to help healthcare professionals gain insights from data more effectively and make better-informed decisions.

Tags: Web Development, Data Analysis, Human Computer Interaction (HCI), Artificial Intelligence, All Years

This project aims to enhance a research platform for creating and analyzing interactive, web-based data visualization studies by adding an eye-tracking analysis toolkit. Eye-tracking can help researchers understand where users focus, how they analyze problems, and how they make decisions while interacting with websites and data visualizations. However, analyzing gaze data often requires expensive commercial software. We address that challenge by developing an open and accessible toolkit for computing common gaze measures from recorded user studies. By simplifying gaze analysis, the toolkit could support the development of adaptive visualization systems that respond to users’ needs and difficulties.
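
As an example of what a common gaze measure might look like in code, the sketch below detects fixations with a simplified dispersion-threshold rule: a run of samples counts as a fixation if its spatial spread stays small for long enough. The sample format `(time_ms, x, y)`, the thresholds, and the function names are assumptions made for this illustration, not the toolkit's actual design.

```python
def dispersion(window):
    """Spread of a run of samples: (max x - min x) + (max y - min y)."""
    xs = [x for _, x, _ in window]
    ys = [y for _, _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion, min_duration):
    """Return fixations as (start_ms, end_ms, center_x, center_y) tuples.

    samples must be a time-ordered list of (time_ms, x, y) gaze points.
    """
    fixations = []
    start = 0
    while start < len(samples):
        end = start
        # Grow the window while the samples stay spatially close together.
        while (end + 1 < len(samples)
               and dispersion(samples[start:end + 2]) <= max_dispersion):
            end += 1
        duration = samples[end][0] - samples[start][0]
        if duration >= min_duration:
            window = samples[start:end + 1]
            center_x = sum(x for _, x, _ in window) / len(window)
            center_y = sum(y for _, _, y in window) / len(window)
            fixations.append((samples[start][0], samples[end][0], center_x, center_y))
            start = end + 1
        else:
            start += 1  # too short to be a fixation: treat as eye movement
    return fixations


# Synthetic recording: a dwell near (100, 100), a quick jump, a dwell near (300, 200).
samples = (
    [(i * 10, 100 + (i % 2), 100) for i in range(10)]
    + [(100, 180, 140), (110, 250, 170)]
    + [(120 + i * 10, 300, 200 + (i % 2)) for i in range(10)]
)
fixations = detect_fixations(samples, max_dispersion=5, min_duration=50)
total_fixation_time = sum(end - start for start, end, _, _ in fixations)
print(len(fixations), total_fixation_time)  # 2 180
```

Measures such as fixation count, total fixation time, or time spent inside a region of interest can then be derived from the returned list.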

Tags: Human Computer Interaction (HCI), Python, React, All Years


Project 19 - Finding the Hidden Shape of Complex Data

Project 18 - Why Sparse Regression Works?

Project 12 - How do AI coding agents write tests, and when do they fail?

Project 11 - Building AI Systems that Understand Doctor–Patient Conversations

Project 10 - Security analysis of distributed AI/ML systems

Project 9 - Secure Algorithms with Random Runtime

Project 8 - Oblivious Data Structures for Secure Computation

Project 7 - Secure Sorting Using Functional Secret Sharing

Project 6 - Multi-Agent AI for Healthcare Data Sensemaking

Project 5 - Building a Gaze Analysis Toolkit for Accessible Web-Based Eye-Tracking Studies in ReVISitBench