Drone Team Capture


This environment is part of the Continuous environments. Please read that page first for general information.

Possible Agents

('0', '1', '2')

Action Spaces

{'0': Box(-0.31415927, 0.31415927, (1,), float32), '1': Box(-0.31415927, 0.31415927, (1,), float32), '2': Box(-0.31415927, 0.31415927, (1,), float32)}

Observation Spaces

{'0': Box(-1.0, 1.0, (12,), float32), '1': Box(-1.0, 1.0, (12,), float32), '2': Box(-1.0, 1.0, (12,), float32)}

Symmetric

True

Import

posggym.make("DroneTeamCapture-v0")

The Drone Team Capture Environment.

A co-operative 2D continuous world problem involving multiple pursuer drone agents working together to catch a target agent in the environment.

This is an adaptation of the original code base that adds partial observability. Specifically, it is possible to limit the observation range of the pursuers (see the observation_limit argument) and to limit how many other pursuers each pursuer can observe at a time, i.e. how many in-range pursuers each pursuer can communicate with (see the n_communicating_pursuers argument). Furthermore, we extend the observation space of each agent compared to the original code so that each pursuer also observes its own (x, y) position.

By default, all pursuers can observe their own angle/yaw and angular velocity, as well as the relative position of the target and all other pursuers.

Possible Agents

Varied number (1-8)

State Space

Each state consists of:

  1. 2D array containing the state of each pursuer

  2. 2D array containing the previous state of each pursuer

  3. 1D array containing the state of the target

  4. 1D array containing the previous state of the target

  5. The velocity of the target (between 0.5 and 2.0 times the max velocity of the pursuers)

The state of each pursuer and of the target is a 1D array containing their:

  • (x, y) coordinate

  • angle/yaw (in radians)

  • velocity in x and y directions

  • angular velocity (in radians)
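As a rough illustration of the per-entity state listed above, the six values can be modelled as a named tuple. The class name `EntityState` and its field names are hypothetical, and the field order simply follows the bullet list; the actual array layout used by the environment may differ.

```python
from typing import NamedTuple


class EntityState(NamedTuple):
    """Hypothetical view of the state of one pursuer or of the target."""

    x: float
    y: float
    yaw: float  # angle/yaw in radians
    vx: float  # velocity in the x direction
    vy: float  # velocity in the y direction
    angular_velocity: float


# Example: an entity at the origin moving along the x axis.
state = EntityState(x=0.0, y=0.0, yaw=0.0, vx=1.0, vy=0.0, angular_velocity=0.0)
```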

Action Space

Each agent has either 1 or 2 actions. If velocity_control=False, the agent controls only its angular velocity. If velocity_control=True, the agent has two actions: the angular velocity and the linear velocity, in that order.
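The following minimal sketch shows how the action size depends on velocity_control and how an angular velocity command could be kept inside the bounds listed in the action spaces at the top of this page (0.31415927, which is approximately pi / 10). The helper names `action_size` and `clip_angular_action` are hypothetical and not part of posggym; the bounds on the linear velocity are not stated on this page, so they are left out.

```python
import math

# Bound taken from the listed Box(-0.31415927, 0.31415927, ...) action space.
MAX_ANGULAR_ACTION = math.pi / 10


def action_size(velocity_control: bool) -> int:
    """Number of action components per agent (hypothetical helper)."""
    return 2 if velocity_control else 1


def clip_angular_action(value: float) -> float:
    """Clamp an angular velocity command to the listed bounds."""
    return max(-MAX_ANGULAR_ACTION, min(MAX_ANGULAR_ACTION, value))
```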

Observation Space

Each agent receives a 1D vector observation containing information about their current state as well as some information about the other pursuers and the target.

  • Self obs - observe angle that pursuer is facing, current angular velocity, and current (x, y) position.

  • Target obs - observe the angle and distance to the target (if target is within observation_limit), as well as rate of change of angle and distance to the target.

  • Other pursuer obs - observes the angle and distance to the n_communicating_pursuers closest pursuer agents (if they are within observation_limit distance).

All observation features are normalized to the interval [-1, 1]. Features for the target or for any pursuer that is outside the observation range are set to -1.
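The selection of the closest pursuers described above can be sketched as follows. This is an illustrative approximation, not the environment's implementation: the function name `closest_pursuer_obs` is hypothetical, the angle is a world-frame bearing (the environment may instead use an angle relative to the pursuer's heading), values are left unnormalized, and `None` marks a slot that the environment would fill with -1.

```python
import math
from typing import List, Optional, Sequence, Tuple


def closest_pursuer_obs(
    self_xy: Tuple[float, float],
    others_xy: Sequence[Tuple[float, float]],
    n_communicating_pursuers: int,
    observation_limit: Optional[float] = None,
) -> List[Optional[Tuple[float, float]]]:
    """Return (angle, distance) for the n closest other pursuers."""
    relative = []
    for x, y in others_xy:
        dx, dy = x - self_xy[0], y - self_xy[1]
        relative.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    relative.sort(key=lambda item: item[0])  # closest first

    slots: List[Optional[Tuple[float, float]]] = []
    for distance, angle in relative[:n_communicating_pursuers]:
        if observation_limit is not None and distance > observation_limit:
            slots.append(None)  # out of range
        else:
            slots.append((angle, distance))
    # Pad when fewer than n other pursuers exist.
    slots.extend([None] * (n_communicating_pursuers - len(slots)))
    return slots
```

With observation_limit=None every selected pursuer is reported, which corresponds to the default fully observable setting.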

This table enumerates the observation space:

| Start index | Description | Values |
|---|---|---|
| 0 | Agent angle | [-1, 1) |
| 1 | Agent angular velocity | [-1, 1] |
| 2 | Agent (x, y) position | [-1, 1] |
| 4 | Angle to target | [-1, 1] |
| 5 | Distance to target | [-1, 1] |
| 6 | Angular velocity of angle to target | [-1, 1] |
| 7 | Velocity of distance to target | [-1, 1] |
| 8 to (8 + 2 * n - 1) | Other pursuer angle and distance | [-1, 1] |

Where n = n_communicating_pursuers.
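To make the layout concrete, here is a small sketch that computes the observation length and splits a flat observation into named parts. The helper names are hypothetical, and the assumption that the other-pursuer features are stored as interleaved (angle, distance) pairs is a guess, since the table only states that both values are present. With the default of 3 agents, n = 2 and the length is 12, matching the observation spaces listed at the top of this page.

```python
from typing import Dict, Sequence


def observation_size(n_communicating_pursuers: int) -> int:
    """Length of one agent's observation: 8 fixed features plus 2 per pursuer."""
    return 8 + 2 * n_communicating_pursuers


def split_observation(obs: Sequence[float], n_communicating_pursuers: int) -> Dict:
    """Split a flat observation into named fields (hypothetical helper)."""
    n = n_communicating_pursuers
    if len(obs) != observation_size(n):
        raise ValueError("unexpected observation length")
    return {
        "self_angle": obs[0],
        "self_angular_velocity": obs[1],
        "self_xy": (obs[2], obs[3]),
        "target_angle": obs[4],
        "target_distance": obs[5],
        "target_angle_rate": obs[6],
        "target_distance_rate": obs[7],
        # Assumed ordering: (angle, distance) pairs, one per observed pursuer.
        "other_pursuers": [(obs[8 + 2 * k], obs[9 + 2 * k]) for k in range(n)],
    }
```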

Rewards

Each pursuer receives a capture reward and a reward based on its distance from the target. On a successful capture, the capturing pursuer receives a reward of +130, while the other pursuers receive +100.

Optionally, the pursuers receive a reward based on the Q parameter which measures the spread of the pursuers and incentivizes them to spread out around the target.
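The capture part of the reward can be sketched as below. This is only an illustration under stated assumptions: the function name `capture_rewards` is hypothetical, a pursuer is treated as "capturing" when it is within capture_radius of the target, and the distance-based term and the optional Q-based term are omitted because their formulas are not given here. The 0.0 returned when nobody captures is a placeholder, not the environment's actual step reward.

```python
from typing import Dict


def capture_rewards(
    distances_to_target: Dict[str, float], capture_radius: float = 30.0
) -> Dict[str, float]:
    """Capture reward per pursuer: 130 for capturing pursuers, 100 for the rest."""
    capturing = {
        agent
        for agent, distance in distances_to_target.items()
        if distance <= capture_radius
    }
    if not capturing:
        # Placeholder: no capture reward is given on this step.
        return {agent: 0.0 for agent in distances_to_target}
    return {
        agent: 130.0 if agent in capturing else 100.0
        for agent in distances_to_target
    }
```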

Dynamics

Actions of the pursuer agents are deterministic and consist of moving based on the angular and linear velocity.

The target has a maximum velocity that varies per episode, and it actively moves away from both the pursuers and the walls.

Starting State

The target starts near the outer edge of the circular arena, while the pursuers start in a line in the middle.

The velocity of the target is chosen uniformly at random at the start of each episode to be between 0.5 and 2.0 times the max velocity of the pursuers.
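A minimal sketch of this sampling rule, assuming a known maximum pursuer velocity. The function name `sample_target_velocity` and the `max_pursuer_velocity` parameter are hypothetical, and the environment's own random number generator and constants may differ.

```python
import random


def sample_target_velocity(max_pursuer_velocity: float, rng: random.Random) -> float:
    """Draw the target velocity uniformly from [0.5, 2.0] times the pursuer maximum."""
    return rng.uniform(0.5, 2.0) * max_pursuer_velocity


rng = random.Random(0)  # seeded for reproducibility
target_velocity = sample_target_velocity(10.0, rng)
```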

Episodes End

Episodes end when the target has been captured. By default, a max_episode_steps limit of 100 steps is also set. This may need to be adjusted when using larger world sizes (this can be done by manually specifying a value for max_episode_steps when creating the environment with posggym.make).

Arguments

  • num_agents - The number of agents in the environment. Must be between 1 and 8 (default = 3)

  • n_communicating_pursuers - The maximum number of agents from which an agent can receive information. If None, this is set to num_agents - 1 (default = None)

  • arena_radius - Size of the arena, in terms of its radius (default = 430)

  • observation_limit - The maximum distance at which agents can observe other agents. If None, there is no observation limit (default = None)

  • velocity_control - Whether the agents control their linear velocity (default = False)

  • capture_radius - Distance from the target within which a pursuer must be to capture it. As per the original paper, the user can adjust this to set a learning curriculum (larger values are easier) (default = 30, which is the radius of the target).

  • use_q_reward - Whether the pursuers should also receive the reward based on the Q parameter (default = False)

Available variants

For example, to use the Drone Team Capture environment with 8 pursuer drones, communication limited to the 4 closest drones, an episode step limit of 100, and the default values for the other parameters (velocity_control, arena_radius, observation_limit, capture_radius), you would use:

import posggym
env = posggym.make(
    'DroneTeamCapture-v0',
    max_episode_steps=100,
    num_agents=8,
    n_communicating_pursuers=4,
)

Version History

  • v0: Initial version

Reference

  • C. de Souza, R. Newbury, A. Cosgun, P. Castillo, B. Vidolov and D. Kulić, “Decentralized Multi-Agent Pursuit Using Deep Reinforcement Learning,” in IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4552-4559, July 2021, doi: 10.1109/LRA.2021.3068952.