Graduate mentor's supervisor: Prof. Marina Meila
Telescopes and satellites collect enormous amounts of data about black holes, galaxy clusters, gamma-ray bursts, and other objects and events in the universe. Hidden within these data may be rare objects, unexpected patterns, or signals that could lead to new scientific discoveries. However, it is often difficult for researchers to examine all of the data by hand.
In this project, we will explore how machine learning can help astronomers find and study interesting objects or events. For example, a model might be used to classify astronomical objects, identify unusual observations, detect rare events, study populations of galaxies or galaxy clusters, or uncover patterns in the shape and organization of these systems. It may also help researchers understand the different stages or components of events such as gamma-ray bursts. The exact scientific question will depend on the available datasets and discussions with collaborators in astronomy and cosmology. In the medium or longer term, there may be opportunities to collaborate with astrophysicists and cosmologists at institutes such as the Perimeter Institute and the Vera Rubin Observatory.
A major challenge is that real astronomical data are not clean or simple. Measurements may contain noise, missing values, unusual errors, or very few examples of the objects scientists care most about. As a result, a machine-learning model may appear to work well while still making unreliable predictions. We will investigate how to build methods that are not only accurate, but also trustworthy.
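To make the point about misleading performance concrete, here is a minimal sketch using made-up labels rather than real astronomical data. It assumes a hypothetical catalog of 1,000 objects in which only 10 are rare objects of interest, and a trivial "model" that labels everything as common. The function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of objects whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly rare objects (label `positive`) that the model found."""
    predictions_for_rare = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_rare:
        return float("nan")  # recall is undefined when there are no rare objects
    return sum(p == positive for p in predictions_for_rare) / len(predictions_for_rare)


# Toy catalog (assumed values): 10 rare objects (1) among 990 common ones (0).
y_true = [1] * 10 + [0] * 990

# A trivial model that never predicts the rare class.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall on rare objects = {rec:.2f}")
```

In this toy setting the accuracy is 0.99 even though the model finds none of the rare objects, which is one reason accuracy alone can be a poor measure of trustworthiness.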
Students will work in a team of 3–4 and may divide the project into complementary tasks, such as preparing data, implementing algorithms, running experiments, or interpreting results.
Short-term goals may include:
- learning about astronomical datasets;
- cleaning, organizing, and visualizing the data;
- building a simple and reproducible data-processing pipeline;
- implementing baseline machine-learning methods; and
- evaluating when the methods succeed or fail.
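As a rough illustration of what the short-term goals above might look like in code, the sketch below cleans a tiny table and fits a nearest-centroid baseline. The feature values, the labels "star" and "galaxy", and all function names are assumptions chosen for illustration; the actual datasets and methods will be decided during the project.

```python
import math
import statistics


def clean(rows):
    """Keep only rows whose features are all finite numbers (drops None and NaN)."""
    return [
        (features, label)
        for features, label in rows
        if all(isinstance(x, (int, float)) and math.isfinite(x) for x in features)
    ]


def fit_centroids(rows):
    """Compute the mean feature vector (centroid) of each class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: [statistics.fmean(column) for column in zip(*group)]
        for label, group in by_label.items()
    }


def predict(centroids, features):
    """Assign the label of the closest centroid in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Toy measurements (assumed values): two features per object, some missing.
raw = [
    ([1.0, 1.2], "star"),
    ([0.8, 1.0], "star"),
    ([float("nan"), 1.1], "star"),
    ([3.0, 3.2], "galaxy"),
    ([3.4, None], "galaxy"),
    ([3.2, 2.8], "galaxy"),
]

cleaned = clean(raw)
centroids = fit_centroids(cleaned)
prediction = predict(centroids, [1.1, 0.9])
print(len(cleaned), prediction)
```

Dropping incomplete rows is only one possible choice; with real data it may discard exactly the rare objects of interest, so part of the project is deciding how missing values should be handled.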
Medium-term goals may include:
- studying how noise, measurement error, or rare examples affect predictions;
- comparing different machine-learning or anomaly-detection methods;
- estimating how confident a model should be in its predictions; and
- improving the reliability of the data pipeline or learning algorithm.
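One small example of comparing anomaly-detection methods is sketched below. It contrasts an ordinary z-score with a robust score based on the median and the median absolute deviation (MAD). The flux values and thresholds are made up, and the function names are hypothetical; this is a toy comparison under those assumptions, not a recommended method.

```python
import statistics


def zscores(values):
    """Standard z-scores: distance from the mean in units of standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_zscores(values):
    """Robust scores: distance from the median in units of scaled MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust score is undefined for this data")
    # 0.6745 rescales the MAD so scores are comparable to z-scores for Gaussian data.
    return [0.6745 * (v - med) / mad for v in values]


def flag(scores, threshold):
    """Return the indices whose absolute score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Toy flux measurements (assumed values); the last two are unusually bright.
fluxes = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 30.0]

classic_flags = flag(zscores(fluxes), threshold=3.0)
robust_flags = flag(robust_zscores(fluxes), threshold=3.5)
print("classic:", classic_flags)
print("robust: ", robust_flags)
```

Here the two bright values inflate the mean and standard deviation enough that the ordinary z-score flags nothing, while the median-based score flags both. This kind of masking is one way that unusual observations can make a method look less sensitive than it should be.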
Longer-term goals may include:
- applying the methods to larger or newly collected datasets;
- developing new algorithms for more difficult astronomical data;
- working with scientists to examine promising objects or events; and
- contributing software, experiments, or results to a broader research project.
Students will gain practical experience in machine learning, data analysis, scientific programming, teamwork, and communicating across different research areas. No prior astronomy research experience is expected.
The project is intended to be accessible to undergraduate students with a basic background in programming and mathematics. Students should ideally have:
- experience writing programs in Python or another programming language, with or without the help of LLMs;
- familiarity with basic data structures and algorithms;
- introductory knowledge of probability, linear algebra, and calculus; and
- an interest in machine learning, astronomy, or working with scientific data.
Previous coursework in machine learning, statistics, or astronomy would be useful but is not strictly required. Students will be introduced to the necessary machine-learning and astronomy concepts during the project.