Joint Situation Awareness and Cooperative Reinforcement Learning


The main aim of the project is to develop a model of autonomous agents that can navigate and explore a dynamic, real-time environment for search-and-rescue operations. The agents cooperate by sharing their experiences and knowledge, building a Joint Situation Awareness that supports and improves each individual agent’s operation. To make search-and-rescue operations more efficient, the overall task can be decomposed hierarchically into sub-goals and sub-tasks that can be performed in parallel across various levels of control. Because the environment is totally or partially unknown in the initial stage of an operation, the agents must learn cooperatively: they make collaborative decisions and adapt their behavior over time across different situations and environments to keep improving the overall payoff of the team.

General architecture of the multi-agent search-and-rescue system with the situation model and Commander-Units organizational structure


Intelligent robots operating as a team can improve the efficiency of crisis response, for example by assisting search-and-rescue. However, the task remains challenging when the environment is partially or totally unknown: exploration must be conducted efficiently, and interference among the agents, which may degrade overall performance, must be kept low. The complexity increases when the agents carrying out the operation must adapt to changing conditions or uncertainties in the environment and learn incrementally from experience. In this project, the work focuses on search-and-rescue tasks in an enclosed environment (such as a building with walls, doors, furniture, rubble, debris, people, etc.). The task is currently scoped to autonomous quad-copter drones acting as Unit agents that learn to navigate and explore the environment. At the collective or multi-agent level, a hierarchical command-and-control architecture is applied in which a Commander agent analyzes the overall situation based on the input provided by the Unit agents as they roam the environment. Based on this holistic view of the situation, the Commander allocates tasks and directs the agents to make the entire search-and-rescue operation more efficient.

Every Unit agent performs elementary tasks such as navigation and survey according to the target assigned by the Commander, while autonomously learning to improve its performance. Reinforcement learning techniques such as clustering-based online reinforcement learning (FALCON network) and Deep Q-Network are applied and evaluated.
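To make the Unit-level learning loop concrete, the sketch below uses plain tabular Q-learning on a small grid. This is a deliberately simplified stand-in: it is neither the FALCON network nor a Deep Q-Network, and the grid size, reward values, and function names (`step`, `q_update`, `train`) are assumptions chosen for illustration only.

```python
import random
from collections import defaultdict

# (row, col) deltas: east, west, south, north. Hypothetical action set.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action, size, target):
    """Move on a size x size grid; hitting the boundary leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    nxt = (row, col)
    done = nxt == target
    # Assumed reward shaping: small cost per step, +1 on reaching the target.
    reward = 1.0 if done else -0.01
    return nxt, reward, done


def q_update(q, state, a, reward, nxt, done, alpha=0.5, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    best_next = 0.0 if done else max(q[nxt])
    q[state][a] += alpha * (reward + gamma * best_next - q[state][a])


def train(size=4, target=(3, 3), episodes=200, epsilon=0.1, seed=0):
    """Learn to reach an assigned target with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on steps per episode
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, reward, done = step(state, ACTIONS[a], size, target)
            q_update(q, state, a, reward, nxt, done)
            state = nxt
            if done:
                break
    return q
```

In this toy setting the `target` argument plays the role of the location assigned by the Commander; a deep or clustering-based learner would replace the table `q` with a function approximator.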

The Commander agent allocates the search-and-rescue tasks to every Unit agent while learning to allocate them better in the future. Different models of reinforcement learning are applied for comparison:

  • Deep Reinforcement Learning to allocate the task based on the situation model of the environment
  • Hierarchical Deep Reinforcement Learning for generation, decomposition, and discovery of reusable tasks
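The Commander-Units data flow described above can be sketched as follows. The allocation rule here is a hand-coded nearest-unexplored-location heuristic standing in for the learned allocation policy; the `Report` and `Commander` classes, their fields, and the grid-coordinate representation of locations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Location = Tuple[int, int]


@dataclass
class Report:
    """What a Unit agent sends to the Commander (assumed message format)."""
    unit_id: str
    position: Location
    explored: Set[Location]  # locations this unit has surveyed


@dataclass
class Commander:
    all_locations: Set[Location]
    explored: Set[Location] = field(default_factory=set)
    positions: Dict[str, Location] = field(default_factory=dict)

    def receive(self, report: Report) -> None:
        """Merge one unit's report into the shared situation model."""
        self.explored |= report.explored
        self.positions[report.unit_id] = report.position

    def allocate(self) -> Dict[str, Location]:
        """Give each unit a distinct unexplored target, nearest first.

        Distance is Manhattan distance; a learned policy would replace
        this rule in the actual system.
        """
        remaining = sorted(self.all_locations - self.explored)
        assignment: Dict[str, Location] = {}
        for unit_id in sorted(self.positions):
            if not remaining:
                break
            pos = self.positions[unit_id]
            target = min(
                remaining,
                key=lambda loc: abs(loc[0] - pos[0]) + abs(loc[1] - pos[1]),
            )
            assignment[unit_id] = target
            remaining.remove(target)  # no two units share a target
        return assignment
```

Removing each chosen target from `remaining` is one simple way to reduce interference among units, since no two units are sent to the same location in the same allocation round.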

Deep Reinforcement Learning for task allocation

Automatic task decomposition and discovery

Simulation Results

Deep reinforcement learning (RL) is applied to minimize the number of steps taken to explore the entire environment. The input to the deep RL model is a pre-processed connectivity graph representing the connected rooms and locations in the environment.
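As an illustration of the quantity being minimized, the sketch below represents a connectivity graph as an adjacency dictionary and counts the steps a simple greedy policy needs to visit every node. The floor plan, the node names, and the greedy nearest-unvisited rule are assumptions; the greedy rule is only a hand-coded baseline, not the deep RL policy used in the project.

```python
from collections import deque

# Hypothetical floor plan: nodes are rooms/locations, edges are doorways.
GRAPH = {
    "entrance": ["corridor"],
    "corridor": ["entrance", "room_a", "room_b"],
    "room_a": ["corridor"],
    "room_b": ["corridor", "storage"],
    "storage": ["room_b"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node sequence from start to goal."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    raise ValueError(f"{goal!r} is not reachable from {start!r}")


def steps_to_cover(graph, start):
    """Count edge traversals until a greedy agent has visited every node."""
    visited = {start}
    current = start
    steps = 0
    while len(visited) < len(graph):
        # Head for the closest unvisited node; ties go to the first name
        # in sorted order so the result is deterministic.
        candidates = sorted(n for n in graph if n not in visited)
        path = min((shortest_path(graph, current, n) for n in candidates), key=len)
        for node in path[1:]:
            visited.add(node)
            steps += 1
        current = path[-1]
    return steps


print(steps_to_cover(GRAPH, "entrance"))  # prints 5
```

A learned policy would be evaluated with the same step count, averaged over episodes, which is the metric reported for the simulation.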

Average number of steps (50 episodes) to visit all nodes (locations) in the graph

Simulation of task allocation for search and rescue in an enclosed environment by three heterogeneous agents, each with different capabilities and objectives.

Option learning is used to learn when to terminate one (sub)task and switch to another.

Based on a 100x100 grid world. Three different agents (Agent1, Agent2, Agent3) perform different tasks that depend on each other (e.g., exploring the area/map, delivering objects to a victim, relocating the victim).

Number of steps until completion of the whole main Search & Rescue task for MAHRL (Multi-Agent Hierarchical Reinforcement Learning) without termination until the task is achieved, MAHRL with various fixed termination periods (every 100, 50, 10, and 5 steps), and the proposed adaptive termination with Multi-Agent Option Critic (MAOC). The results show that the MAOC method can learn efficient coordination and allocation for the different agents in the search-and-rescue task.
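The difference between the termination schemes compared above can be sketched with a minimal call-and-return option loop. Everything here is an illustrative assumption: the 1-D corridor task, the `Option` class, and the hand-written termination functions. An adaptive scheme would replace `fixed_period` or `never` with a termination probability that is learned, which this sketch does not implement.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    name: str
    policy: Callable[[int], int]              # position -> move (-1 or +1)
    termination: Callable[[int, int], float]  # (position, steps in option) -> P(terminate)


def fixed_period(period: int):
    """Terminate with certainty once the option has run for `period` steps."""
    return lambda position, elapsed: 1.0 if elapsed >= period else 0.0


def never(position: int, elapsed: int) -> float:
    """Never terminate early; the option runs until the task ends."""
    return 0.0


def toward_goal(termination):
    """Higher-level chooser (hypothetical): always pick an option that approaches the goal."""
    def choose(position, goal):
        move = 1 if goal > position else -1
        return Option("approach", lambda p: move, termination)
    return choose


def run_episode(choose, goal, rng, max_steps=1000):
    """Call-and-return execution on a 1-D corridor starting at position 0.

    Returns (primitive steps taken, number of option selections).
    """
    position, steps, selections = 0, 0, 0
    while position != goal and steps < max_steps:
        option = choose(position, goal)  # higher-level decision point
        selections += 1
        elapsed = 0
        while position != goal and steps < max_steps:
            position += option.policy(position)
            steps += 1
            elapsed += 1
            # Sample the termination condition after every primitive step.
            if rng.random() < option.termination(position, elapsed):
                break
    return steps, selections


rng = random.Random(0)
print(run_episode(toward_goal(fixed_period(5)), goal=20, rng=rng))  # prints (20, 4)
print(run_episode(toward_goal(never), goal=20, rng=rng))            # prints (20, 1)
```

In this toy task both schemes need the same number of primitive steps; what changes is how often the higher level gets to reconsider its choice, which matters once sub-tasks of different agents depend on each other.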

Demonstration Video

Co-Principal Investigators

Prof. Tan Ah-Hwee (NTU)

Telephone: 6790 4326
Office: N4-02a-25

Mr. Xu Yan (STE)

Telephone: 9895 2862
Office: S1-B4a-03

Mr. Paul Tan (STE)

Telephone: 6660 1052
Office: S1-B4a-03