Please note: This seminar will be presented in person in DC 1304 as well as streamed online.
Broňa Brejová, Department of Computer Science
Comenius University in Bratislava, Slovakia
Many successful tools in bioinformatics are based on working with k-mers, substrings of length k of the input sequences. In this talk, we will discuss two lesser-known areas where k-mers can be used.
The first is the estimation of genome properties based on probabilistic models of k-mer abundance in sequencing reads. These models capture the dependence of k-mer abundance on various phenomena, such as the size and repeat content of the genome, heterozygosity levels, and the sequencing error rate. This in turn allows us to estimate these properties from k-mer abundance histograms observed in real data. The second area is the comparison of k-mer abundance between related sequencing samples and a meaningful summary of the results, in order to discover which parts of the genome were depleted or enriched in one of the samples.
In both cases, approaches based on k-mers allow us to study genomic regions that are difficult for genome assembly and read mapping due to high repeat content or other unusual properties.
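For readers unfamiliar with the k-mer abundance histograms mentioned above, the following is a minimal Python sketch of how such a histogram could be computed from a set of reads. It is an illustration only, not the method presented in the talk: the function names `kmer_counts` and `abundance_histogram` and the toy reads are hypothetical, and the sketch ignores reverse complements, which practical k-mer counters usually account for.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every substring of length k across all reads (illustrative helper)."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map each abundance a to the number of distinct k-mers seen exactly a times."""
    return Counter(counts.values())


# Toy reads, made up for illustration; real input would be sequencing reads.
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
counts = kmer_counts(reads, k=4)
hist = abundance_histogram(counts)

for abundance in sorted(hist):
    print(abundance, hist[abundance])
```

In this toy input, k-mers that occur in several reads land at higher abundances, while k-mers seen only once form the low-abundance end of the histogram. In real data, the shape of this histogram is what the probabilistic models are fitted to.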
Joint work with Askar Gafurov and Tomáš Vinař.