PhD Seminar • Bioinformatics • Enhancing Peptide Identification Rate using Machine Learning: Training with Retained NEXT-Ranked PSMs

Friday, July 28, 2023 1:00 pm - 2:00 pm EDT (GMT -04:00)

Please note: This PhD seminar will take place online.

Johra Muhammad Moosa, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Bin Ma

To improve peptide identification rates in the database search analysis of bottom-up proteomics data, many implementations of machine learning algorithms have been proposed. These machine learning-based methods train a new scoring function after the initial search to rescore and rerank the peptide-spectrum matches (PSMs). Generally, the retraining uses selected PSMs from the target and decoy databases as positive and negative training examples, respectively. However, this exposes the target-decoy information to the scoring function, potentially invalidating the false discovery rate (FDR) estimation.
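For readers unfamiliar with target-decoy FDR estimation, the following is a minimal sketch of one common form of the estimate: the number of decoy PSMs accepted at a score threshold divided by the number of target PSMs accepted. The function name `estimate_fdr`, the dictionary keys `score` and `is_decoy`, and the example data are assumptions made for illustration; they are not taken from the seminar. The estimate relies on decoy matches behaving like incorrect target matches, which is why a scoring function trained on target-decoy labels can bias it.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR at a score threshold as (#decoys accepted) / (#targets accepted).

    `psms` is a list of dicts with hypothetical keys:
      "score"    -> float, higher is better
      "is_decoy" -> bool, True if the peptide came from the decoy database
    """
    accepted = [p for p in psms if p["score"] >= threshold]
    decoys = sum(1 for p in accepted if p["is_decoy"])
    targets = len(accepted) - decoys
    if targets == 0:
        # No target PSMs accepted: nothing is reported, so report 0.0.
        return 0.0
    return decoys / targets


# Illustrative data only (not real search results).
fdr_example_psms = [
    {"score": 9.0, "is_decoy": False},
    {"score": 8.0, "is_decoy": False},
    {"score": 7.0, "is_decoy": True},
    {"score": 6.0, "is_decoy": False},
    {"score": 5.0, "is_decoy": False},
    {"score": 4.0, "is_decoy": True},
]

fdr_at_5 = estimate_fdr(fdr_example_psms, 5.0)  # 1 decoy / 4 targets
fdr_at_8 = estimate_fdr(fdr_example_psms, 8.0)  # 0 decoys / 2 targets
```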

We propose a novel method for retraining without revealing the target-decoy information. Our approach considers the top-ranked and the next-ranked peptides for the same spectrum as positive and negative examples, respectively. We demonstrate that this leads to a much-improved identification rate while maintaining accurate FDR estimation.
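The labeling idea described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the speaker's implementation: the function name `build_training_examples`, the dictionary keys, and the example data are all hypothetical. For each spectrum, the highest-scoring candidate peptide becomes a positive example and the second-highest becomes a negative example. The `is_decoy` field is present in the data but is never read when assigning labels.

```python
from collections import defaultdict


def build_training_examples(psms):
    """Label the top-ranked PSM per spectrum as 1 and the next-ranked PSM as 0.

    `psms` is a list of dicts with hypothetical keys:
      "spectrum_id" -> identifier of the spectrum
      "score"       -> initial search score, higher is better
      "features"    -> list of numeric features for rescoring
      "is_decoy"    -> bool; deliberately NOT used for labeling
    """
    by_spectrum = defaultdict(list)
    for psm in psms:
        by_spectrum[psm["spectrum_id"]].append(psm)

    features, labels = [], []
    for candidates in by_spectrum.values():
        ranked = sorted(candidates, key=lambda p: p["score"], reverse=True)
        if len(ranked) < 2:
            # No next-ranked candidate was retained for this spectrum.
            continue
        top, next_ranked = ranked[0], ranked[1]
        features.append(top["features"])
        labels.append(1)
        features.append(next_ranked["features"])
        labels.append(0)
    return features, labels


# Illustrative data only (not real search results).
ranked_example_psms = [
    {"spectrum_id": "s1", "score": 9.1, "features": [9.1, 0.8], "is_decoy": False},
    {"spectrum_id": "s1", "score": 4.2, "features": [4.2, 0.1], "is_decoy": True},
    {"spectrum_id": "s2", "score": 3.0, "features": [3.0, 0.2], "is_decoy": False},
    {"spectrum_id": "s2", "score": 7.5, "features": [7.5, 0.6], "is_decoy": False},
    {"spectrum_id": "s3", "score": 6.0, "features": [6.0, 0.5], "is_decoy": False},
]

train_features, train_labels = build_training_examples(ranked_example_psms)
```

The resulting feature and label lists could then be passed to any binary classifier to learn a new scoring function. Spectrum `s3` contributes nothing here because only one candidate was retained for it.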