Basic Programming

Understanding this hidden structure could help us visualize data, remove noise, compare examples, and build machine-learning systems that are faster, more reliable, and easier to understand.

In this project, we will try to answer: When can we discover the hidden shape of data accurately and efficiently?

This is a difficult problem. In the most general setting, learning the full shape may require a very large amount of data and computation. Real data are also noisy, so observations may not lie exactly on a clean surface. Even deciding how many underlying dimensions the data have can be challenging.
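As a small illustration of the dimension question, here is a minimal sketch (not the project's method) using toy data: points in the plane generated from a single hidden coordinate plus noise. The data, the noise level, and the helper name `covariance_eigenvalues_2d` are assumptions made up for this example. If one eigenvalue of the covariance matrix dominates, the data are close to one-dimensional.

```python
import math
import random

def covariance_eigenvalues_2d(points):
    """Eigenvalues (largest first) of the 2x2 sample covariance of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed form for the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]
    mean = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return mean + radius, mean - radius

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noise = 0.05            # assumed noise level for this toy example
points = []
for _ in range(1000):
    t = rng.uniform(-1, 1)  # hidden one-dimensional coordinate
    points.append((t + noise * rng.gauss(0, 1), 2 * t + noise * rng.gauss(0, 1)))

large, small = covariance_eigenvalues_2d(points)
print(f"variance along the main direction: {large:.4f}")
print(f"variance across it:                {small:.4f}")
print(f"share explained by one direction:  {large / (large + small):.4f}")
```

With more noise or a curved shape, the gap between the two eigenvalues shrinks and the answer becomes less clear, which is part of what makes the general problem hard.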

Tags: Python, Basic Programming, Linear Algebra, Calculus, Statistics, Machine Learning, Optimization, All Years

This project asks: Why does sparse regression often work well in practice, even when the usual theoretical assumptions do not clearly apply?

We will study this question using ideas from geometry, statistics, and optimization. Here, geometry means thinking about variables as directions or points in space. For example, two variables that contain almost the same information can be viewed as pointing in nearly the same direction. This viewpoint may help us understand when sparse regression makes reliable predictions, when it selects meaningful variables, and when its answer is unstable.
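The "nearly the same direction" picture can be made concrete with a minimal sketch on synthetic data. The variables `x1`, `x2`, `x3` and the noise level are assumptions for illustration only. After centering, the cosine of the angle between two variables equals their sample correlation, so a cosine near 1 means the two variables carry almost the same information.

```python
import math
import random

def cosine(u, v):
    """Cosine of the angle between two vectors given as lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def center(x):
    """Subtract the mean so the cosine equals the sample correlation."""
    m = sum(x) / len(x)
    return [a - m for a in x]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is x1 plus a little noise: almost the same information as x1
x2 = [a + 0.1 * rng.gauss(0, 1) for a in x1]
# x3 is generated independently of x1
x3 = [rng.gauss(0, 1) for _ in range(n)]

cos_12 = cosine(center(x1), center(x2))
cos_13 = cosine(center(x1), center(x3))
print(f"cosine(x1, x2) = {cos_12:.3f}")  # close to 1: nearly the same direction
print(f"cosine(x1, x3) = {cos_13:.3f}")  # close to 0: nearly orthogonal
```

When two candidate variables are as closely aligned as `x1` and `x2` here, a sparse method has little basis for preferring one over the other, which is one way its selection can become unstable.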

Tags: Python, Basic Programming, Linear Algebra, Statistics, Calculus, Optimization, Machine Learning, All Years

In this project, we will explore how machine learning can help astronomers find and study interesting objects or events. For example, a model might be used to classify astronomical objects, identify unusual observations, detect rare events, study populations of galaxies or galaxy clusters, or uncover patterns in the shape and organization of these systems. It may also help researchers understand the different stages or components of events such as gamma-ray bursts. The exact scientific question will depend on the available datasets and discussions with collaborators in astronomy and cosmology. There are opportunities to collaborate with astrophysicists and cosmologists at institutes such as the Perimeter Institute and the Vera Rubin Observatory in the medium and/or longer term.


Tags: Python, Basic Programming, Data Structures, Algorithms, Statistics, Linear Algebra, Calculus, Machine Learning, Astronomy, All Years

This project asks: can we use visual design to let people navigate information at their own depth? The core idea is progressive disclosure through visual cues: using symbols, icons, and glyphs to signal that more detail exists, and revealing that detail only when someone expresses interest (by clicking, hovering, or zooming in). Think of it like a map: at a distance, you see city names; as you zoom in, streets appear; closer still, individual buildings. We want to apply that same logic to arbitrary information.


Tags: Basic Programming, Human Computer Interaction (HCI), Visualization, 3rd Year+

Attention-deficit/hyperactivity disorder (ADHD) affects an estimated 5–10% of children worldwide. Yet existing interventions — medication and clinic-based therapy — remain costly and difficult to access for many families. Neurofeedback training is a non-pharmacological approach with a growing evidence base, but it is currently available almost exclusively in clinical settings.

Our research asks: What should an at-home attention training system look like for families of children with ADHD? We are designing a system that combines an EEG headset, tangible interactive hardware, and gamified training experiences — one that children actually want to use, that parents can meaningfully participate in, and that makes training progress visible and trackable.

Tags: Basic Programming, Figma, Human Computer Interaction (HCI), Psychology, 2nd Year+

LLM-based agents are now used to write tests and fix bugs in real software. They sometimes succeed, but when they fail, we usually have no idea why. Every agent leaves a full step-by-step log of what it did, called a trajectory. Many of these logs are now public, but almost no one has sat down and studied them carefully. This project analyzes those logs to understand how agents actually go about generating tests, where they get stuck, and what makes some tasks harder than others. This matters because developers are starting to trust these tools with real work. If we understand how and when they fail, we can build better tools and know when their output needs a second look. Recent public benchmarks like SWT Bench and SWE Atlas, built on real open-source projects, release these trajectories openly, so the data is ready to use.


Tags: Basic Programming, Python, Artificial Intelligence, All Years

This project explores how AI can help understand bilingual doctor–patient conversations and automatically generate accurate medical documentation. It has the potential to improve healthcare accessibility and reduce documentation workload for clinicians serving multilingual populations. We have already built a 280-hour speech corpus containing code-switched Kazakh-Russian medical data. We are now collecting an additional 100 hours of simulated doctor and patient conversations to improve model performance.

Tags: Basic Programming, Python, Artificial Intelligence, Machine Learning, Natural Language Processing, Data Science, All Years