Candidate: Spencer Delcore
Date: April 24, 2026
Time: 11:00 AM
Location: Online
Supervisor: Krzysztof Czarnecki
All are welcome!
Abstract:
End-to-end autonomous driving systems demonstrate impressive capabilities under normal conditions but exhibit unpredictable safety degradation when perception is compromised by adverse weather, image distortions, or sensor failures. Existing safety assessment methods fail to capture how perception degradation propagates through these black-box architectures and impacts driving decisions, creating a fundamental gap in deployment readiness evaluation. This thesis presents a three-activity framework for quantifying and predicting safety risk in end-to-end autonomous driving systems under conditions where poor perception significantly affects safety. Activity 1 performs data augmentation and model execution through four sub-activities: dataset creation, model evaluation, safety assessment via the Total Risk metric, and feature extraction. The Total Risk metric aggregates incident severity through three components: time-to-incident severity, distance severity, and impact severity. Activity 2 trains a machine learning predictor on spatial and temporal features extracted from the model's auxiliary task outputs to predict how far Total Risk is expected to deviate from its value under nominal conditions. Activity 3 deploys the trained predictor for real-time safety degradation detection, enabling proactive intervention strategies. Experiments across two architectures, UniAD and ST-P3, under 24 different perception-relevant conditions spanning weather, lighting, sensor failures, and image distortions, reveal several key insights for system designers. Analysis across four test suites uncovers asymmetric generalization: predictors trained on severe conditions generalize reliably to less severe conditions, while the reverse fails catastrophically.
Sensor failure conditions, such as camera crashes, create fundamentally different system failure modes depending on model architecture (UniAD produces no detections, while ST-P3 generates noisy hallucinations), requiring architecture-specific safety measures. Feature importance analysis demonstrates that early-stage auxiliary tasks provide the most informative degradation signals. This framework provides system designers with quantitative tools for architecture design, training data prioritization, and identifying deployment limitations, advancing the safety assurance of end-to-end autonomous driving systems.