Please note: This master’s research paper presentation will take place in DC 3102 and online.
Yuchen Pan, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Meng Xu
Machine Learning (ML)-based vulnerability detection has become increasingly important as software systems grow in complexity. However, existing function-level approaches are often hindered by the substantial noise present in publicly available datasets, which arises from automated data collection methods. This paper addresses this challenge by proposing the Uniform Positive Loss Adjustment (UPLA) method, which adjusts the loss for positively labeled data during training to mitigate the influence of mislabeled samples. Additionally, we explore Per-CWE training, in which separate models are trained for distinct categories of vulnerabilities based on the Common Weakness Enumeration (CWE) system.
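The abstract does not give UPLA's exact formulation; the sketch below illustrates one way a uniform adjustment of the positive-class loss could look in PyTorch. The function name upla_bce_loss, the binary cross-entropy base loss, and the down-weighting factor alpha are illustrative assumptions, not the paper's definition.

```python
# Hypothetical sketch of a uniform positive-loss adjustment (UPLA-style) term.
# Assumption (not from the abstract): the adjustment uniformly scales the loss
# of positively labeled samples by a constant factor `alpha`, softening the
# influence of potentially mislabeled positives harvested from fix commits.
import torch
import torch.nn.functional as F

def upla_bce_loss(logits: torch.Tensor, labels: torch.Tensor, alpha: float = 0.7) -> torch.Tensor:
    """Binary cross-entropy where positive-labeled samples are uniformly
    down-weighted by `alpha` (an assumed hyperparameter)."""
    per_sample = F.binary_cross_entropy_with_logits(logits, labels.float(), reduction="none")
    weights = torch.where(labels == 1,
                          torch.full_like(per_sample, alpha),
                          torch.ones_like(per_sample))
    return (weights * per_sample).mean()

# Example: logits for a batch of four functions and their (noisy) labels.
logits = torch.tensor([2.0, -1.5, 0.3, -0.2])
labels = torch.tensor([1, 0, 1, 0])
print(upla_bce_loss(logits, labels))
```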
We evaluate the effectiveness of UPLA and Per-CWE training under various data compositions on the BigVul dataset. Results show that UPLA consistently improves performance metrics such as F1 score and Matthews correlation coefficient (MCC) compared to baseline methods. Per-CWE training does not outperform general-purpose models in our experiments, and we observe that its performance deteriorates when data is scarce. Moreover, we emphasize the importance of including the post-fix versions of functions modified in vulnerability-fix commits (F2 functions) in datasets to avoid overestimating model performance. These findings provide insights into mitigating data quality issues and improving the training of machine learning models for function-level vulnerability detection.
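For reference, the Matthews correlation coefficient used as an evaluation metric above is computed from the binary confusion matrix; the snippet below is the standard definition, not code from the paper.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (the conventional fallback)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Example with made-up counts: 70 TP, 80 TN, 20 FP, 30 FN.
print(round(mcc(70, 80, 20, 30), 3))
```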
To attend this master’s research paper presentation in person, please go to DC 3102. You can also attend virtually on Zoom.