New deep knowledge AI system interprets biosequence data in seconds

As genomic sequencing costs continue to fall, biosequence data has grown explosively: it currently amounts to approximately 15 peta-bases and is estimated to double every seven months. However, a lack of sophisticated analytical software for extracting meaningful and useful knowledge from this data has become a bottleneck in informing new omics research initiatives. Several novel algorithms developed by Professor Andrew Wong of systems design engineering at the University of Waterloo, who is also Founding Director of the Centre for Pattern Analysis and Machine Intelligence (CPAMI), along with research fellows Antonio Sze-To, Peiyuan Zhou, and Jieming Li, now make it possible to unlock this backlog of data by predicting the binding of biosequences in seconds.
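
To give a sense of what the growth estimate quoted above implies, here is a minimal sketch that assumes the data volume follows simple exponential doubling from the figures given (about 15 peta-bases, doubling every seven months). The function name and parameters are illustrative only and are not part of P2K. Under that assumption the volume would grow roughly 3.3-fold per year.

```python
def projected_petabases(months_ahead, start=15.0, doubling_months=7.0):
    """Estimate data volume (in peta-bases) after `months_ahead` months.

    Assumes steady exponential doubling: `start` peta-bases today,
    doubling every `doubling_months` months. Both defaults come from
    the estimates quoted in the article; real growth may differ.
    """
    return start * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Print the projected volume at a few illustrative horizons.
    for months in (0, 7, 12, 24):
        print(f"{months:>2} months: {projected_petabases(months):.1f} peta-bases")
```

The projection is only as good as the doubling estimate it starts from, so it is best read as an order-of-magnitude illustration of why analysis speed matters.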

The novel algorithms, collectively named the Pattern to Knowledge (P2K) method, disentangle multiple hidden associations in the data to identify and predict the amino acid bindings that govern protein interactions. Because P2K is much faster than existing biosequence analysis software, with almost 30 per cent better prediction accuracy, it could significantly speed up the discovery of new drugs. By drawing upon pre-existing information from databases in the Cloud, P2K could, for instance, discover biosequences that might bind to a given sequence, or reveal that sequence's functional/binding regions. Such findings will deepen our understanding of biological systems and could lead to potential cancer treatments while minimizing costly and tedious laboratory tests.

“P2K is a game changer given its ability to disentangle protein associations and powerfully predict interactions based only on sequence data that cannot be extracted using traditional machine learning methods. P2K is a ‘deep knowledge’ engine that is able to disentangle and decipher relationships within pre-existing data sets which can then be coupled with the power of predictive machine learning techniques to surface deeper information about a system’s interactions. The ability to access this deep knowledge from proven scientific results will shift biological research going forward,” says Prof. Wong.

Khosrow Modarressi, technology transfer manager in the Waterloo Commercialization Office (WatCo), is working with Prof. Wong and his team to advance the commercial readiness of the P2K breakthrough methodology. Patent applications have been filed in the U.S. and Canada to protect the commercial opportunity for P2K. WatCo secured NSERC Idea to Innovation (I2I) program funding to undertake an independent professional market assessment study to identify the most effective startup commercialization strategy.

Antonio Sze-To, P2K co-inventor, has also received support to participate in Explore, the Waterloo Accelerator Centre’s commercialization and entrepreneurship program, to further explore startup launch plans. The team has launched a web server prototype that allows biomedical researchers to plug in their own data and generate real-time biosequence predictions, which will serve to validate the potential commercial usefulness of P2K.

“WatCo has helped put P2K in the hands of biomedical researchers, bolstering P2K’s scientific credibility and fostering P2K’s commercial readiness,” says Prof. Wong.

Because P2K can also analyze other forms of sequential data, it is not limited to biomedical research. P2K could benefit the financial industry by making useful associations and predictions for smart trading, or the cybersecurity sector by predicting the likelihood of potential cyberattacks. As a result, WatCo is engaging further resources to advance the commercialization of P2K.

A technical paper detailing P2K’s technology was recently published in Nature Scientific Reports.