Please note: This PhD seminar will take place online, NOT in person as advertised earlier.
Pablo Millán Arias, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Lila Kari
This talk discusses two deep-learning-based approaches to using unlabelled genomic data for taxonomic categorization. First, we introduce an entropy-based clustering method for DNA sequences, which employs a discriminative classifier to identify taxonomic clusters without supervision. We then expand upon these ideas and leverage self-supervised representation learning for enhanced non-parametric DNA sequence clustering, achieving performance comparable to traditional alignment-based methods on synthetic datasets. Our work demonstrates the integration of deep unsupervised learning with taxonomic identification, offering novel approaches for biodiversity studies.