University of Waterloo
Engineering 5 (E5), 6th Floor
Phone: 519-888-4567 ext. 32600
Developed by researchers at the University of Waterloo and AI company DarwinAI, the technology enables low-cost, low-power, self-contained speech recognition software tailored to specific tasks.
Unlike existing voice assistant systems such as Amazon Echo and Google Home, the deep-learning AI software could run everything from televisions to thermostats without connections to cloud computing.
“Cost and efficiency are two of the biggest bottlenecks to the widespread adoption of machine-learning AI,” said Alexander Wong, a professor of systems design engineering at Waterloo. “This technology significantly addresses those issues and enables a new class of voice assistants for everyday devices with energy-efficiency needs.”
The breakthrough involves the use of AI algorithms to create AI speech recognition software so compact it can fit on chips that are smaller than postage stamps and cost only a few dollars to make.
That efficiency is achieved by giving AI algorithms data and specific requirements – the ability to understand the words “yes,” “no,” “on” and “off,” for instance – and instructing them to find the least complex way to meet them.
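To make this concrete, the sketch below shows how little capacity a recognizer needs when its job is limited to four words. All names and shapes here are illustrative assumptions, not details from the researchers' system: a small dense network over MFCC audio features already fits in roughly ten thousand parameters, i.e. kilobytes of memory.

```python
import numpy as np

# Hypothetical sketch: a task-specific recognizer only has to separate
# four target words ("yes", "no", "on", "off"), so a tiny network
# suffices. Shapes and sizes are illustrative, not from the paper.
rng = np.random.default_rng(0)

N_MFCC, N_FRAMES, N_HIDDEN, N_WORDS = 13, 49, 16, 4
W1 = rng.standard_normal((N_MFCC * N_FRAMES, N_HIDDEN)) * 0.01
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_HIDDEN, N_WORDS)) * 0.01
b2 = np.zeros(N_WORDS)

def predict(mfcc):
    """Forward pass: flattened MFCC spectrogram -> word probabilities."""
    h = np.maximum(0.0, mfcc.reshape(-1) @ W1 + b1)  # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                               # softmax over 4 words

params = W1.size + b1.size + W2.size + b2.size
print(f"parameters: {params}")                       # ~10k parameters
probs = predict(rng.standard_normal((N_MFCC, N_FRAMES)))
print(probs.shape, round(float(probs.sum()), 6))
```

A model this small could plausibly run on a microcontroller-class chip; the actual TinySpeech networks are built by machine-driven design, not by hand-sizing as sketched here.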
“The resulting software is just big enough to do a particular job well,” said Wong, director of the Vision and Image Processing (VIP) Research Group, and a co-founder of DarwinAI. “That is the goal, the essence of our approach.”
The efficiency and performance of the speech recognition AI are increased by new deep-learning building blocks introduced by Wong’s team. Known as attention condensers, they focus the software on the most relevant information in sound waves.
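The sketch below illustrates the general shape of that idea under simplifying assumptions: condense the activations to a cheaper resolution, compute an attention map there, expand it back, and use it to reweight the input. The learned embedding and expansion layers of the actual attention condensers are replaced with fixed operations here, so this is an outline of the mechanism, not the paper's design.

```python
import numpy as np

def attention_condenser(V, scale=1.0):
    """Hedged sketch of an attention-condenser-style block.

    Condenses the activation V along time, forms an attention map in
    that reduced space, expands it back, and reweights V so later
    layers focus on the salient regions of the signal. The learned
    projections of the real block are simplified to fixed ops.
    """
    C, T = V.shape
    # 1. Condense: 2x downsample along time via average pooling.
    Vc = V[:, : T - T % 2].reshape(C, -1, 2).mean(axis=2)
    # 2. Attend: sigmoid gate in the condensed space (stand-in for
    #    the learned embedding layer).
    A = 1.0 / (1.0 + np.exp(-Vc))
    # 3. Expand: upsample the attention map to the input resolution.
    A_full = np.repeat(A, 2, axis=1)[:, :T]
    # 4. Selectively attend: reweight the input, keep a residual path.
    return V * A_full * scale + V

rng = np.random.default_rng(1)
V = rng.standard_normal((8, 50))   # (channels, time frames)
out = attention_condenser(V)
print(out.shape)                   # same shape as the input
```

Because the attention is computed in the condensed space, the block adds little compute, which is what makes it attractive for edge devices.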
In addition to home appliances, the compact AI could affordably operate systems in vehicles and devices for people with disabilities, while also helping address privacy concerns associated with cloud-based voice assistants.
Researchers are now working to apply the core technology and their new AI building blocks to the creation of compact, stand-alone AI for visual perception and text interpretation.
A paper on their work, TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices, was presented at a recent Neural Information Processing Systems workshop.
Mahmoud Famouri, a researcher at DarwinAI, Maya Pavlova, a Waterloo engineering student, and Siddharth Surana, a Waterloo computer science student, contributed to the research.