Basic Programming


Understanding how datasets in a massive data lake relate to one another is incredibly difficult because documentation is usually outdated. Instead of relying on static documentation files that are often broken or out of date, this research takes a radical approach: it discovers data relationships by analyzing how people actually use the data. Solving this would reduce manual data cleaning and help organizations automatically map how their data flows and changes in real time.

Tags: 1st years, Experienced 1st years, 2nd year +, 3rd year +, Data Analysis, Basic Programming
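As a rough illustration of "learning relationships from usage", the sketch below applies one simple heuristic: tables that people repeatedly use together in the same query are probably related. This heuristic, the hypothetical `QUERY_LOG`, and the helper names are assumptions for illustration, not the project's method; the regex is deliberately naive and would miss subqueries, quoted identifiers, and comma joins.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical query log; a real data lake would supply its own query history.
QUERY_LOG = [
    "SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id",
    "SELECT o.total FROM orders o JOIN customers c ON o.cust_id = c.id WHERE c.region = 'EU'",
    "SELECT p.name FROM products p JOIN orders o ON o.product_id = p.id",
    "SELECT * FROM customers",
]

# Naive pattern: a table name is the identifier right after FROM or JOIN.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)

def co_usage_edges(queries):
    """Count how often each pair of tables appears in the same query."""
    counts = Counter()
    for query in queries:
        tables = sorted(set(TABLE_PATTERN.findall(query)))
        for pair in combinations(tables, 2):
            counts[pair] += 1
    return counts

def candidate_relationships(queries, min_count=2):
    """Keep only pairs used together at least min_count times."""
    return {pair: n for pair, n in co_usage_edges(queries).items() if n >= min_count}

print(candidate_relationships(QUERY_LOG))
```

The `min_count` threshold trades recall for precision: a pair joined once may be an ad hoc query, while a pair joined repeatedly is more likely to reflect a real relationship.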

Data lakes contain massive amounts of structured tables and unstructured images. Currently, database systems cannot connect them automatically; a table of store products and a folder of product images remain completely isolated. This research bridges that gap using vision-language AI models to map text, data columns, and images into a shared space. Solving this unlocks "dark data," allowing organizations to fully search, analyze, and use all their visual and text data together for the first time.

Tags: 2nd year +, 3rd year +, Linear Algebra, Data Structures, Basic Programming
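Once text, columns, and images live in a shared space, linking them can reduce to a nearest-neighbour search. The sketch below assumes embeddings have already been produced by some vision-language model (for example a CLIP-style encoder); the three-dimensional toy vectors and the column and folder names are hypothetical stand-ins, and cosine similarity is one common choice of score, not necessarily the one the project uses.

```python
import numpy as np

# Toy embeddings standing in for a vision-language model's outputs.
# Real embeddings would have hundreds of dimensions.
column_embeddings = {
    "products.product_name": np.array([0.9, 0.1, 0.0]),
    "stores.address": np.array([0.0, 0.2, 0.95]),
}
image_folder_embeddings = {
    "images/product_photos": np.array([0.85, 0.2, 0.05]),
    "images/storefronts": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_matches(columns, images):
    """For each table column, pick the image folder with the most similar embedding."""
    return {
        col: max(images, key=lambda name: cosine(vec, images[name]))
        for col, vec in columns.items()
    }

print(best_matches(column_embeddings, image_folder_embeddings))
```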

Understanding this hidden structure could help us visualize data, remove noise, compare examples, and build machine-learning systems that are faster, more reliable, and easier to understand.

In this project, we will try to answer: When can we discover the hidden shape of data accurately and efficiently?

This is a difficult problem. In the most general setting, learning the full shape may require a very large amount of data and computation. Real data are also noisy, so observations may not lie exactly on a clean surface. Even deciding how many underlying dimensions the data have can be challenging.

Tags: Python, Basic Programming, Linear Algebra, Calculus, Statistics, Machine Learning, Optimization, All Years
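To make "how many underlying dimensions" concrete, here is a minimal baseline, assuming the hidden shape is flat: principal component analysis (PCA), counting how many components are needed to explain 95% of the variance. This is a swapped-in, purely linear technique chosen for illustration; curved surfaces generally need nonlinear estimators, and the 95% threshold and toy data are assumptions.

```python
import numpy as np

def estimate_dimension(X, variance_threshold=0.95):
    """Smallest number of principal components whose explained variance reaches the threshold."""
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    explained = s**2 / np.sum(s**2)              # fraction of variance per component
    k = int(np.searchsorted(np.cumsum(explained), variance_threshold)) + 1
    return min(k, len(s))

rng = np.random.default_rng(0)
# Toy data: 500 points on a flat 2-D plane inside 10-D space, plus small noise.
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))   # 10x2 matrix with orthonormal columns
latent = rng.normal(size=(500, 2))                  # hidden 2-D coordinates
X = latent @ basis.T + 0.01 * rng.normal(size=(500, 10))

print(estimate_dimension(X))  # prints 2
```

Raising the noise level or lowering the threshold changes the answer, which is one small instance of why the dimension question is hard on real, noisy data.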

This project wants to answer: Why does sparse regression often work well in practice, even when the usual theoretical assumptions do not clearly apply?

We will study this question using ideas from geometry, statistics, and optimization. Here, geometry means thinking about variables as directions or points in space. For example, two variables that contain almost the same information can be viewed as pointing in nearly the same direction. This viewpoint may help us understand when sparse regression makes reliable predictions, when it selects meaningful variables, and when its answer is unstable.

Tags: Python, Basic Programming, Linear Algebra, Statistics, Calculus, Optimization, Machine Learning, All Years
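The "variables as directions" viewpoint has a precise form: after centering, the cosine of the angle between two variables (as vectors of observations) equals their Pearson correlation. The sketch below, using made-up data, shows that a near-duplicate variable points in almost the same direction while an unrelated one is close to perpendicular. Nearly parallel variables are the situation in which a sparse method may pick either one somewhat arbitrarily.

```python
import numpy as np

def angle_degrees(x, y):
    """Angle between two variables viewed as centered vectors in R^n."""
    xc, yc = x - x.mean(), y - y.mean()
    cos = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # almost the same information as x1
x3 = rng.normal(size=n)               # an unrelated variable

print(f"x1 vs x2: {angle_degrees(x1, x2):.1f} degrees")  # a small angle
print(f"x1 vs x3: {angle_degrees(x1, x3):.1f} degrees")  # roughly 90 degrees
```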

This project asks: can we use visual design to let people navigate information at their own depth? The core idea is progressive disclosure through visual cues: using symbols, icons, and glyphs to signal that more detail exists, and revealing that detail only when someone expresses interest (by clicking, hovering, or zooming in). Think of it like a map: at a distance, you see city names; as you zoom in, streets appear; closer still, individual buildings. We want to apply that same logic to arbitrary information.

Tags: Basic Programming, Human Computer Interaction (HCI), Visualization, 3rd Year+
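The map analogy can be expressed as a tree whose depth plays the role of zoom level. The sketch below is a hypothetical model, not the project's design: `Node`, `render`, and the `[+]` cue are illustrative names, and a real interface would use icons or glyphs rather than text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)

def render(node, zoom, depth=0):
    """Return the lines visible at a given zoom level.

    Nodes deeper than `zoom` stay hidden; a visible node whose children are
    hidden gets a '[+]' cue signalling that more detail exists.
    """
    has_hidden = bool(node.children) and depth >= zoom
    lines = ["  " * depth + node.label + (" [+]" if has_hidden else "")]
    if depth < zoom:
        for child in node.children:
            lines.extend(render(child, zoom, depth + 1))
    return lines

city = Node("Springfield", [
    Node("Main Street", [Node("City Hall"), Node("Library")]),
    Node("Oak Avenue", [Node("School")]),
])

for zoom in range(3):
    print(f"zoom {zoom}:")
    print("\n".join(render(city, zoom)))
```

Expressing interest (a click, hover, or zoom) would simply increase `zoom` for that branch; keeping the cue separate from the hidden content is what lets the overview stay uncluttered.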

Attention-deficit/hyperactivity disorder (ADHD) affects an estimated 5–10% of children worldwide. Yet existing interventions — medication and clinic-based therapy — remain costly and difficult to access for many families. Neurofeedback training is a non-pharmacological approach with a growing evidence base, but it is currently available almost exclusively in clinical settings.

Our research asks: What should an at-home attention training system look like for families of children with ADHD? We are designing a system that combines an EEG headset, tangible interactive hardware, and gamified training experiences — one that children actually want to use, that parents can meaningfully participate in, and that makes training progress visible and trackable.

Tags: Basic Programming, Figma, Human Computer Interaction (HCI), Psychology, 2nd Year +

LLM-based agents are now used to write tests and fix bugs in real software. They sometimes succeed, but when they fail, we usually have no idea why. Every agent run leaves a full step-by-step log of what the agent did, called a trajectory. Many of these logs are now public, but almost no one has sat down and studied them carefully. This project analyzes those logs to understand how agents actually go about generating tests, where they get stuck, and what makes some tasks harder than others. This matters because developers are starting to trust these tools with real work. If we understand how and when they fail, we can build better tools and know when their output needs a second look. Recent public benchmarks such as SWT Bench and SWE Atlas, built on real open-source projects, release these trajectories openly, so the data is ready to use.

Tags: Basic Programming, Python, Artificial Intelligence, All Years
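A first pass at trajectory analysis can be as simple as counting what kinds of actions an agent takes and spotting loops. The sketch below assumes a hypothetical JSON format (a list of steps, each with an `"action"` string); real benchmark trajectories use their own schemas, which are not reproduced here. Treating a long run of identical consecutive actions as a sign of being stuck is likewise an assumed heuristic.

```python
import json
from collections import Counter

# Hypothetical trajectory: one JSON object per agent step.
TRAJECTORY_JSON = """
[
  {"action": "open tests/test_parser.py"},
  {"action": "run pytest tests/test_parser.py"},
  {"action": "edit tests/test_parser.py"},
  {"action": "run pytest tests/test_parser.py"},
  {"action": "run pytest tests/test_parser.py"},
  {"action": "run pytest tests/test_parser.py"}
]
"""

def action_kinds(steps):
    """Count steps by their first word (open, edit, run, ...)."""
    return Counter(step["action"].split()[0] for step in steps)

def longest_repeat(steps):
    """Length of the longest run of identical consecutive actions."""
    best = run = 0
    prev = None
    for step in steps:
        run = run + 1 if step["action"] == prev else 1
        prev = step["action"]
        best = max(best, run)
    return best

steps = json.loads(TRAJECTORY_JSON)
print(action_kinds(steps))
print("longest repeated action run:", longest_repeat(steps))
```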

This project explores how AI can help understand bilingual doctor–patient conversations and automatically generate accurate medical documentation. It has the potential to improve healthcare accessibility and reduce the documentation workload for clinicians serving multilingual populations. We have already built a 280-hour speech corpus containing code-switched Kazakh-Russian medical data. We are now collecting an additional 100 hours of simulated doctor–patient conversations to improve model performance.

Tags: Basic Programming, Python, Artificial Intelligence, Machine Learning, Natural Language Processing, Data Science, All Years


Project 21 - Behavioral Lineage Synthesis

Project 20 - Multi-Modal ERD (Tables + Images)

Project 19 - Finding the Hidden Shape of Complex Data

Project 18 - Why Does Sparse Regression Work?

Project 17 - Visual Progressive Disclosure for Information Overload

Project 16 - Designing Gamified Attention Training for Children with ADHD

Project 12 - How do AI coding agents write tests, and when do they fail?

Project 11 - Building AI Systems that Understand Doctor–Patient Conversations