Learning Agent-Based MPC

This project introduces a novel approach to holistic vehicle control using a hybrid learning Model Predictive Control (MPC) scheme. The scheme leverages two prediction models: a physics-based nominal vehicle dynamics model and a data-driven learned vehicle dynamics model. Integrating the learned model significantly improves the accuracy of vehicle state predictions, enabling more precise and proactive control actions.

The optimality of agent-based model predictive control (AMPC) depends heavily on prediction accuracy, which in turn requires all agents or their contributions to be known, an assumption too idealistic for practical implementation. This project proposes a practical hybrid control scheme, learning agent-based MPC (LAMPC), which combines the model-based AMPC approach with data-based learning methods to handle unknown agents in a multi-agent system. The learning module predicts the unknown information from data, while the MPC provides a closed-form interface and safety boundaries.
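Concretely, the hybrid prediction can be pictured as a physics-based nominal step plus a GPR-learned residual that stands in for the unknown agent's contribution. The following Python sketch illustrates the idea; the linear lateral model, its parameter values, and all names are hypothetical placeholders rather than the project's actual implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def nominal_step(x, u, dt=0.01):
    """Physics-based one-step prediction (placeholder linear lateral model).
    x = [side-slip angle, yaw rate], u = front steering angle."""
    A = np.array([[-2.0, -0.9],
                  [ 1.5, -1.8]])   # illustrative linearized dynamics
    B = np.array([0.8, 4.0])       # illustrative steering gains
    return x + dt * (A @ x + B * u)

# One GPR per state dimension, trained on residuals
# r = x_next_measured - nominal_step(x, u); the residual captures the
# unknown (black-box) agent's contribution.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4)
gprs = [GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        for _ in range(2)]

def fit_residual_models(X, U, X_next):
    """Fit the residual GPRs on logged (state, input, next-state) data."""
    Z = np.hstack([X, U.reshape(-1, 1)])            # GPR input: (state, input)
    R = X_next - np.array([nominal_step(x, u) for x, u in zip(X, U)])
    for i, gpr in enumerate(gprs):
        gpr.fit(Z, R[:, i])

def hybrid_predict(x, u):
    """Hybrid one-step prediction: nominal mean plus learned residual,
    with the GPR's predictive variance as an uncertainty estimate."""
    mean, var = nominal_step(x, u), np.zeros(2)
    z = np.hstack([x, u]).reshape(1, -1)
    for i, gpr in enumerate(gprs):
        m, s = gpr.predict(z, return_std=True)
        mean[i] += m.item()
        var[i] = s.item() ** 2
    return mean, var
```

In use, `fit_residual_models` would be called on logged driving data, and `hybrid_predict` would replace the purely nominal prediction inside the MPC rollout.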

The proposed LAMPC scheme has the following advantages:

  • High tracking performance. Tracking performance improves significantly in well-learned scenarios, where the learning results are reliable.
  • Guaranteed safety. Stochastic chance constraints guarantee probabilistic constraint satisfaction. The feedback assumption in the uncertainty propagation and the soft constraints help preserve optimization feasibility (see the constraint-tightening sketch after this list).
  • Real-time efficiency. The scheme exploits the high flexibility of Gaussian process regression (GPR) learning, while data density control and subset selection ensure data quality and efficient GPR inference.
  • Flexibility. LAMPC retains the agent-configuration flexibility of AMPC and can be applied to systems containing any kind or number of black-box agents.
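For the safety point above, a common way to realize a stochastic chance constraint is to tighten the nominal bound by a back-off term derived from the predicted variance and to soften the result with a penalized slack. The sketch below assumes a Gaussian state prediction and a half-space constraint; the limit values, weights, and function names are illustrative, not taken from the project.

```python
import numpy as np
from scipy.stats import norm

def tightened_bound(b, h, Sigma, epsilon=0.05):
    """Deterministic surrogate of the chance constraint
    P(h @ x <= b) >= 1 - epsilon for x ~ N(mu, Sigma):
    enforce h @ mu <= b - Phi^{-1}(1 - epsilon) * sqrt(h @ Sigma @ h)."""
    backoff = norm.ppf(1.0 - epsilon) * np.sqrt(h @ Sigma @ h)
    return b - backoff

def soft_violation(mu, b_tight, h, weight=1e3):
    """Soft constraint: penalize (rather than forbid) violations so the
    optimization stays feasible even when the back-off is large."""
    slack = max(0.0, h @ mu - b_tight)
    return slack, weight * slack ** 2

# Example: a 0.5 rad/s yaw-rate limit on a state [side slip, yaw rate],
# with an illustrative GPR-predicted covariance.
h = np.array([0.0, 1.0])
Sigma = np.diag([1e-4, 4e-3])
b_tight = tightened_bound(0.5, h, Sigma)           # tightened yaw-rate bound
slack, penalty = soft_violation(np.array([0.01, 0.48]), b_tight, h)
```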

As shown in the overall structure of the proposed LAMPC scheme, the learning process consists of data preparation, data management, and GPR learning. The predicted mean and variance from the GPR are then used to reconstruct the model-based stochastic AMPC; a sketch of the data-management step follows the figure below.

Data-based Learning Process and Model-based Stochastic AMPC
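The data-management step can be realized in many ways; one simple possibility is grid-based density control combined with nearest-neighbor subset selection, sketched below. The cell size, subset size, and function names are assumptions for illustration, not the project's actual criteria.

```python
import numpy as np

def density_filter(Z, R, cell=0.1):
    """Data-density control: keep at most one sample per grid cell of the
    GPR input space, so the stored dataset stays small and well spread."""
    kept, seen = [], set()
    for i, z in enumerate(Z):
        key = tuple(np.floor(z / cell).astype(int))
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return Z[kept], R[kept]

def select_subset(Z, R, z_query, m=50):
    """Subset selection: use only the m stored points nearest the query,
    bounding GPR inference cost at roughly O(m^3) instead of O(N^3)."""
    d = np.linalg.norm(Z - z_query, axis=1)
    idx = np.argsort(d)[:m]
    return Z[idx], R[idx]
```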

Comparison Results in Less-learned Scenario: Safety Guaranteed
A double lane change (DLC) maneuver was performed with the learning module's dataset empty, meaning the controller had no prior experience of this DLC. Three controllers were applied to the RDT agent: (a) AMPC; (b) LAMPC with normal stability constraints (NC); (c) LAMPC with soft chance constraints (SCC).
When the controllable agent was controlled by AMPC or by LAMPC with NC, the vehicle's rear slip angle and yaw rate significantly exceeded their limits, which would be very dangerous in reality. When it was controlled by the proposed LAMPC with SCC, however, both the rear slip angle and the yaw rate stayed within their constraints, keeping the vehicle stable and safe.

Rear Side Slip and Yaw Rate Graphs

Comparison Results in Well-learned Scenario: Tracking Performance Improved
Three similar sinewave maneuvers with different settings were compared: (a) the vehicle was controlled only by the driver agent; (b) all agents, including a black-box agent, were activated and a traditional AMPC controller acted on the controllable agent; (c) all agents, including the black-box agent, were activated and the proposed LAMPC with the learning function acted on the controllable agent.
When all agents are activated and the traditional AMPC controller drives the controllable agent, the tracking performance improves compared with the first experiment, where only the driver controls the vehicle. However, without information about the black-box agent, a significant tracking error remains. In the last experiment, because the proposed LAMPC controller learns the contribution of the black-box agent from data, the measured yaw rate stays consistently closer to the reference.

Sinewave maneuvers