Tuesday, July 25, 2023, 8:00 am - 9:00 am EDT (GMT -04:00)
Abstract
Interaction between humanoid robots and humans is a growing area of research, as frameworks and models are continuously being developed to improve the ways in which humanoids may integrate into society. These humanoids often require intelligence beyond what they are originally endowed with in order to handle more complex human-robot interaction scenarios. This intelligence can come from additional sensors, including microphones and cameras, which allow the robot to better perceive its environment. This thesis explores scenarios involving moving conversational partners and the ways in which the REEM-C Humanoid Robot may interact with them. The additional intelligence developed focuses on external microphones deployed on the robot, with consideration for computer vision algorithms built using the camera in the REEM-C's head.
The first topic of this thesis explores how binaural acoustic intelligence can be used to estimate the direction of arrival of human speech on the REEM-C Humanoid. This includes the development of audio signal processing techniques, their optimization, and their deployment for real-time use on the REEM-C.
The second topic highlights computer vision approaches that a robotic system can use to improve human-robot interaction. This section describes the relevant algorithms and their development, with a focus on efficiency and accuracy for real-time use on the robot.
The third topic explores the natural behaviours of humans in conversation with moving interlocutors. These behaviours are measured via a motion capture study and modeled with mathematical formulations, which are then used on the REEM-C Humanoid Robot. The REEM-C uses this tracking model to follow detected human speakers using the intelligence outlined in previous sections.
The final topic focuses on how the acoustic intelligence, vision algorithms, and tracking model can be used in tandem for human-robot interaction with potentially multiple human subjects. This includes sensor fusion approaches that help correct for limitations in the audio and video algorithms, synchronization, and an evaluation of behaviour in the form of a short user study. Applications of this framework are discussed, and relevant quantitative and qualitative results are presented.
A chapter introducing the work done to establish a conversational chatbot system is also included.
The final thesis work is an amalgamation of the above topics, and presents a complete and robust human-robot interaction framework with the REEM-C based on tracking moving conversational partners with audio and video intelligence.
Presenter
Pranav Barot, MASc candidate in Systems Design Engineering