PhD Seminar • Bioinformatics • BarcodeBERT: Transformers for Biodiversity Analysis

Monday, December 4, 2023 — 3:00 PM to 4:00 PM EST

Please note: This PhD seminar will take place in DC 2310 and online.

Pablo Millán Arias, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Lila Kari

Understanding biodiversity is a global challenge in which DNA barcodes, short snippets of DNA that cluster by species, play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic challenges. We explore machine learning approaches for this setting, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across datasets of varying complexity. While simpler datasets and tasks favor supervised CNNs or fine-tuned transformers, challenging species-level identification calls for a shift towards self-supervised pretraining.

We propose BarcodeBERT, the first self-supervised method for general biodiversity analysis, leveraging a reference library of 1.5M invertebrate DNA barcodes. This work highlights how dataset characteristics and coverage affect model selection, and underscores the role of self-supervised pretraining in achieving high-accuracy DNA barcode-based identification at the species and genus levels. Notably, even without fine-tuning, BarcodeBERT pretrained on a large DNA barcode dataset outperforms DNABERT and DNABERT-2 on multiple downstream classification tasks.
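The abstract mentions self-supervised pretraining with a barcode-specific masking strategy. As a rough illustration only, the sketch below shows how BERT-style masked-token training inputs might be prepared from a DNA barcode, assuming non-overlapping k-mer tokenization. The k-mer length of 4, the 50% mask ratio, the helper names `kmer_tokenize` and `mask_tokens`, and the example sequence are all hypothetical choices for this sketch. They are not the settings or code used in the paper.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token name


def kmer_tokenize(seq: str, k: int = 4) -> List[str]:
    """Split a DNA sequence into non-overlapping k-mers.

    Assumption (illustrative only): any trailing fragment shorter than k
    is dropped rather than padded.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def mask_tokens(tokens: List[str], mask_ratio: float = 0.5,
                seed: Optional[int] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Hide a random fraction of tokens, BERT-style.

    Returns (masked, labels): labels[i] holds the original token at masked
    positions and None elsewhere, so a training loss would only be computed
    on the hidden positions. The mask ratio is an assumed value.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio)) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, labels


if __name__ == "__main__":
    # Made-up barcode fragment; 30 bases give 7 four-mers, trailing "GC" dropped.
    barcode = "AACATTATATTTTATTTTTGGAATTTGAGC"
    tokens = kmer_tokenize(barcode, k=4)
    masked, labels = mask_tokens(tokens, mask_ratio=0.5, seed=0)
    print(tokens)
    print(masked)
```

A model pretrained this way learns to reconstruct the hidden k-mers from their unmasked context, which requires no taxonomic labels; the learned representations can then be used for downstream species or genus classification.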

Full paper available at https://arxiv.org/abs/2311.02401.


To attend this PhD seminar in person, please go to DC 2310. You can also attend virtually using Zoom at https://uwaterloo.zoom.us/j/92398875467.

Location 
DC - William G. Davis Computer Research Centre
Hybrid: DC 2310 | Online PhD seminar
200 University Avenue West

Waterloo, ON N2L 3G1
Canada