Cheriton School of Computer Science Professor and University Research Chair Lila Kari has spearheaded the development of a software tool that can provide conclusive answers to some of the most fascinating questions in evolutionary biology.
The tool, known as ML-DSP, combines supervised machine learning with digital signal processing. It could for the first time make it possible to definitively answer questions such as: How many species exist on land and in the oceans? How are existing, newly discovered, and extinct species related to each other? What are the bacterial origins of mitochondrial DNA? Do the DNA of a parasite and the DNA of its host have similar genomic signatures?
The tool also has the potential to improve the efficacy of personalized medicine by identifying specific strains of a virus, thus allowing precise drugs to be developed and administered to treat the viral infection.
ML-DSP is an alignment-free software tool that works by transforming each DNA sequence into a discrete numerical signal and then applying digital signal processing methods to distinguish these signals from one another.
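As a rough illustration of that idea, the Python sketch below turns DNA strings into numerical signals, computes the magnitude of each signal's discrete Fourier transform, and compares two spectra with a correlation-based distance. The purine/pyrimidine mapping, the naive Fourier transform, the Pearson-based distance, and all function names are assumptions chosen for illustration. The article does not say which numerical representation or distance measure ML-DSP itself uses, so this should not be read as the tool's implementation.

```python
import cmath
import math

# Hypothetical mapping for illustration only:
# purines (A, G) become -1, pyrimidines (C, T) become +1.
NUCLEOTIDE_VALUES = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def to_signal(sequence):
    """Turn a DNA string into a list of numbers, skipping unknown symbols."""
    return [NUCLEOTIDE_VALUES[base] for base in sequence.upper()
            if base in NUCLEOTIDE_VALUES]


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson_distance(u, v):
    """Return (1 - r) / 2, where r is the Pearson correlation of u and v.

    Assumes u and v have the same length. The result is 0 when the two
    spectra are perfectly correlated and 1 when they are perfectly
    anti-correlated.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return (1 - cov / (norm_u * norm_v)) / 2


if __name__ == "__main__":
    # Two made-up sequences of equal length, not real genomic data.
    spectrum_a = magnitude_spectrum(to_signal("ACGTACGTAC"))
    spectrum_b = magnitude_spectrum(to_signal("AAGGCCTTAA"))
    print(pearson_distance(spectrum_a, spectrum_b))
```

No sequence alignment is needed at any step, which is what "alignment-free" refers to. Pairwise distances of this kind could then serve as input features for a supervised classifier. Real genomes differ in length, so a practical pipeline would also need to normalize sequence lengths and would use a fast Fourier transform rather than the naive version shown here.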
“With this method even if we only have small fragments of DNA we can still classify DNA sequences, regardless of their origin, or whether they are natural, synthetic, or computer-generated,” said Professor Kari. “Another important potential application of this tool is in the healthcare sector, as in this era of personalized medicine we can classify viruses and customize the treatment of a particular patient depending on the specific strain of the virus that affects them.”
In the study, the researchers performed a quantitative comparison with other state-of-the-art classification software tools on two small benchmark datasets and one large dataset of 4,322 vertebrate mitochondrial DNA sequences.
“Our results show that ML-DSP overwhelmingly outperforms alignment-based software in terms of processing time, while having classification accuracies that are comparable in the case of small datasets and superior in the case of large datasets,” said Kari. “Compared with other alignment-free software, ML-DSP has significantly better classification accuracy and is overall faster.”
The authors also conducted preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4,271 complete dengue virus genomes into subtypes with 100 per cent accuracy, and 4,710 bacterial genomes into divisions with 95.5 per cent accuracy.
The paper detailing the new software tool, titled “ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels,” was authored by Professor Kari together with Western University PhD candidate Gurjit Randhawa and Associate Professor Kathleen Hill of the Department of Biology at Western University. The paper was published recently in the journal BMC Genomics.