PhD Seminar: Improving User Specifications for Robot Behaviour through Active Preference Learning: Framework and Evaluation

Tuesday, April 16, 2019 10:30 am - 10:30 am EDT (GMT -04:00)

Candidate: Nils Wilde

Title: Improving User Specifications for Robot Behaviour through Active Preference Learning: Framework and Evaluation

Date: April 16, 2019

Time: 10:30 AM

Place: EIT 3142

Supervisor(s): Kulic, Dana - Smith, Stephen L.

Abstract:

An important challenge in human-robot interaction (HRI) is enabling non-expert users to specify complex tasks for autonomous robots. Recently, active preference learning has been applied in HRI to interactively shape a robot's behaviour. In this PhD research, we study a framework in which users specify constraints on allowable robot movements through a graphical interface, yielding a robot task specification. However, users may not be able to accurately assess the impact of such constraints on the robot's performance. We therefore revise the specification by iteratively presenting users with alternative solutions in which some constraints may be violated, and we learn the importance of each constraint from the users' choices between these alternatives.

We propose a linear deterministic model of user preferences together with a complete learning algorithm based on iteratively querying the user. In a user study, we demonstrate our framework on a material transport task in an industrial facility. We show that nearly all users accept alternative solutions and thus obtain, through the learning process, a revised specification that leads to a substantial improvement in robot performance. Further, the learning process reduces the variance between the specifications of different users, making them more similar. As a result, the users whose initial specifications had the largest impact on performance benefit the most from the interactive learning.
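
To illustrate the idea of a linear deterministic preference model, here is a minimal sketch, not the algorithm presented in the seminar. It assumes a path's cost is its travel time plus a weighted sum of constraint violations, and that a user always picks the path that is cheaper under their own (unknown) weights, so each answer rules out some candidate weights. The function names, the candidate weight grid, and all path values are hypothetical.

```python
from itertools import product

def path_cost(time, violations, weights):
    """Assumed linear cost: travel time plus weighted constraint violations."""
    return time + sum(w * v for w, v in zip(weights, violations))

def consistent_weights(candidates, preferred, rejected):
    """Keep weight vectors under which the preferred path is no more costly."""
    return [w for w in candidates
            if path_cost(*preferred, w) <= path_cost(*rejected, w)]

# Hypothetical candidate weights for two constraints, each in {0, 1, 2, 4}.
candidates = list(product([0, 1, 2, 4], repeat=2))

# Each path is (travel time, violation per constraint); the values are made up.
original = (20.0, (0, 0))     # respects both constraints
alternative = (14.0, (3, 0))  # shorter, but violates constraint 0 by 3 units

# The user accepts the alternative: 14 + 3*w0 <= 20, so w0 <= 2.
remaining = consistent_weights(candidates, alternative, original)
```

In this sketch a single answer removes every candidate with a weight above 2 on the first constraint, while leaving the second constraint's weight undetermined until a query involves it.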

Finally, we extend the user model to a discrete Bayesian learning model and introduce a greedy algorithm for proposing alternatives that operates on the notion of equivalence regions of user weights. We prove that this algorithm converges to the user-optimal path for users who follow our cost function non-deterministically. In simulations on realistic industrial environments, we demonstrate the convergence and robustness of our approach.
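
As a rough illustration of what a discrete Bayesian user model can look like, the following sketch updates a belief over a finite set of weight hypotheses after one user choice. It is an assumption-laden toy, not the seminar's algorithm: it assumes a linear cost, assumes the user picks the cheaper path with a fixed probability `p_correct`, and omits both the greedy query selection and the equivalence regions mentioned above. All names and numbers are hypothetical.

```python
def cost(path, w):
    """Assumed linear cost: travel time plus weighted constraint violations."""
    time, violations = path
    return time + sum(wi * vi for wi, vi in zip(w, violations))

def bayes_update(belief, chosen, other, p_correct=0.9):
    """Update a belief (weight tuple -> probability) after one user choice.

    Assumption: a user with weights w picks the cheaper path with
    probability p_correct and the other path otherwise.
    """
    unnormalised = {}
    for w, prior in belief.items():
        agrees = cost(chosen, w) <= cost(other, w)
        unnormalised[w] = prior * (p_correct if agrees else 1.0 - p_correct)
    total = sum(unnormalised.values())
    return {w: p / total for w, p in unnormalised.items()}

# Hypothetical hypotheses for a single constraint weight, with a uniform prior.
hypotheses = [(0,), (1,), (2,), (4,)]
belief = {w: 1.0 / len(hypotheses) for w in hypotheses}

# Paths are (travel time, violations); the values are made up.
original = (20.0, (0,))
alternative = (14.0, (3,))

# The user picks the alternative, which is unlikely if their weight were 4.
belief = bayes_update(belief, alternative, original)
```

Unlike the deterministic filter, a hypothesis that disagrees with the answer is only down-weighted rather than discarded, so an occasional inconsistent answer does not permanently eliminate the user's true weights.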