Caroline Colijn | Department of Mathematics, Simon Fraser University
Comparing trees and using them for prediction in pathogen sequence data
With the development of rapid, low-cost and readily available sequencing technologies, there is a need for quantitative methods to help interpret sequence datasets and relate them to the dynamics of biological systems. Trees (in the sense of graphs with no cycles) are a mainstay of how we represent and understand sequence data. I will introduce several flavours of trees with their motivating applications, and will describe new metrics (true distance functions) on them. In particular, we have a new metric derived from polynomials on unlabelled trees. In the second part of the talk, I will focus on applying tree comparisons in the context of infectious disease: can we use trees to predict which sub-populations of a circulating pathogen will succeed in the near future? I will use tree features and comparisons, together with machine learning tools, to make these predictions.