Drone Team Capture
This environment is part of the Continuous environments. Please read that page first for general information.
| | |
|---|---|
| Possible Agents | ('0', '1', '2') |
| Action Spaces | {'0': Box(-0.31415927, 0.31415927, (1,), float32), '1': Box(-0.31415927, 0.31415927, (1,), float32), '2': Box(-0.31415927, 0.31415927, (1,), float32)} |
| Observation Spaces | {'0': Box(-1.0, 1.0, (12,), float32), '1': Box(-1.0, 1.0, (12,), float32), '2': Box(-1.0, 1.0, (12,), float32)} |
| Symmetric | True |
| Import | |
The Drone Team Capture Environment.
A co-operative 2D continuous world problem involving multiple pursuer drone agents working together to catch a target agent in the environment.
This is an adaptation of the original code base that adds partial observability.
Specifically, it is possible to limit the observation range of the pursuers
(see the observation_limit argument) and also how many other pursuers each pursuer
can observe at a time, i.e. how many in-range pursuers each pursuer can communicate
with (see the n_communicating_pursuers argument). Furthermore, we extend the
observation space of each agent compared to the original code so that each
pursuer also observes its own (x, y) position.
By default, all pursuers can observe their own angle/yaw and angular velocity, as well as the relative position of the target and all other pursuers.
Possible Agents
Varied number (1-8)
State Space
Each state consists of:
2D array containing the state of each pursuer
2D array containing the previous state of each pursuer
1D array containing the state of the target
1D array containing the previous state of the target
The velocity of the target (between 0.5 and 2.0 times the max velocity of the pursuers)
The state of each pursuer and of the target is a 1D array containing their:
(x, y) coordinate
angle/yaw (in radians)
velocity in x and y directions
angular velocity (in radians)
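The per-entity state listed above can be pictured as a six-element record. The following is a sketch for illustration only: the class name `EntityState`, the field names, and the field ordering are assumptions, not the environment's actual data structure.

```python
from typing import NamedTuple


class EntityState(NamedTuple):
    """Hypothetical container for the state of one pursuer or the target."""

    x: float     # x coordinate
    y: float     # y coordinate
    yaw: float   # angle/yaw in radians
    vx: float    # velocity in the x direction
    vy: float    # velocity in the y direction
    vyaw: float  # angular velocity


# Three pursuers form a 3 x 6 "2D array"; the target is a single 6-element row.
pursuers = [EntityState(0.0, float(i), 0.0, 0.0, 0.0, 0.0) for i in range(3)]
target = EntityState(100.0, 0.0, 3.14, 0.0, 0.0, 0.0)

print(len(pursuers), len(pursuers[0]), len(target))
```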
Action Space
Each agent's action has either 1 or 2 components. If velocity_control=False, the agent only controls its angular velocity. If velocity_control=True, the action has two components: the angular velocity and the linear velocity, in that order.
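As a rough sketch of the two action layouts, the helper below builds a clipped action list. The angular bound of pi / 10 matches the 0.31415927 limit shown in the action space listing. The function name `make_action` and the linear velocity bounds (`0.0` to `max_linear`) are placeholders assumed for illustration, since this page does not state the linear velocity limits.

```python
import math

# Approximately 0.31415927, the bound shown in the action space listing.
MAX_ANGULAR_VELOCITY = math.pi / 10


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def make_action(angular, linear=None, max_linear=1.0):
    """Build an action list for one pursuer (hypothetical helper).

    With linear=None the result mimics velocity_control=False (one component).
    Otherwise it mimics velocity_control=True: [angular, linear].
    The linear bounds 0.0 and max_linear are assumptions.
    """
    action = [clip(angular, -MAX_ANGULAR_VELOCITY, MAX_ANGULAR_VELOCITY)]
    if linear is not None:
        action.append(clip(linear, 0.0, max_linear))
    return action
```

For instance, `make_action(0.1)` yields a one-component action, while `make_action(0.1, 0.5)` yields the two-component form with the angular velocity first.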
Observation Space
Each agent receives a 1D vector observation containing information about their current state as well as some information about the other pursuers and the target.
Self obs - observes the angle the pursuer is facing, its current angular velocity, and its current (x, y) position.
Target obs - observes the angle and distance to the target (if the target is within observation_limit), as well as the rate of change of the angle and distance to the target.
Other pursuer obs - observes the angle and distance to the n_communicating_pursuers closest pursuer agents (if they are within observation_limit distance).
All observation features are normalized into the [-1, 1] interval. Features for the
target or for any pursuer that is outside the observation range are set to -1.
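The out-of-range behaviour can be illustrated with a small sketch. The scaling used here (angles divided by pi, distances divided by `max_distance`, taken as the default arena diameter of 2 * 430) is an assumption made for the example; the environment's actual normalization constants are not given on this page. The function name `target_obs` is likewise hypothetical.

```python
import math


def target_obs(angle, distance, observation_limit=None, max_distance=860.0):
    """Return (angle_feature, distance_feature), each within [-1, 1].

    Assumptions: angles are scaled by pi and distances by max_distance.
    """
    if observation_limit is not None and distance > observation_limit:
        # Out-of-range entities are reported as -1 for every feature.
        return (-1.0, -1.0)
    return (angle / math.pi, min(distance / max_distance, 1.0))
```

With no `observation_limit` the target is always visible; with a limit, any target farther away than the limit yields the -1 placeholder values.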
This table enumerates the observation space:
| Index: start | Description | Values |
|---|---|---|
| 0 | Agent angle | [-1, 1[ |
| 1 | Agent angular velocity | [-1, 1] |
| 2 | Agent (x, y) position | [-1, 1] |
| 4 | Angle to target | [-1, 1] |
| 5 | Distance to target | [-1, 1] |
| 6 | Angular velocity of angle to target | [-1, 1] |
| 7 | Velocity of distance to target | [-1, 1] |
| 8 to (8 + 2 * n) | Other pursuer angle and distance | [-1, 1] |
Where n = n_communicating_pursuers.
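The index layout in the table implies an observation length of 8 + 2 * n. The sketch below computes that length and slices an observation into its three groups. The helper names are hypothetical, and the ordering of angle and distance values inside the "others" block is not specified here, so that block is left unsplit.

```python
def observation_size(num_agents=3, n_communicating_pursuers=None):
    """Length of one pursuer's observation vector, per the table above."""
    if n_communicating_pursuers is None:
        # Default described in the arguments: all other pursuers.
        n_communicating_pursuers = num_agents - 1
    return 8 + 2 * n_communicating_pursuers


def split_observation(obs, n):
    """Slice a flat observation into self, target and other-pursuer parts."""
    return {
        "self": obs[0:4],      # angle, angular velocity, (x, y)
        "target": obs[4:8],    # angle, distance and their rates of change
        "others": obs[8:8 + 2 * n],
    }
```

With the default of 3 agents, `observation_size()` gives 12, which agrees with the `(12,)` shape in the observation space listing.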
Rewards
Each pursuer receives a capture reward and a reward based on its distance from
the target. On successful capture, the capturing pursuer receives a reward of
+130, while the other pursuers receive +100.
Optionally, the pursuers receive a reward based on the Q parameter which measures
the spread of the pursuers and incentivizes them to spread out around the target.
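The capture component of the reward can be sketched as follows. Only the +130 / +100 split stated above is modelled. The distance-based term and the optional Q term are left out because their formulas are not given on this page, and the function name `capture_rewards` is hypothetical.

```python
def capture_rewards(agent_ids, capturing_agent):
    """Capture reward per pursuer: 130 for the capturer, 100 for the rest."""
    return {
        agent_id: (130.0 if agent_id == capturing_agent else 100.0)
        for agent_id in agent_ids
    }


rewards = capture_rewards(("0", "1", "2"), capturing_agent="1")
```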
Dynamics
Actions of the pursuer agents are deterministic and consist of moving based on the angular and linear velocity.
The target has a maximum velocity that varies per episode, and it actively moves away from the pursuers and from the walls.
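One common way to move an agent from an angular and a linear velocity is a unicycle-style update. The sketch below uses that model purely as an assumption: the page does not state the exact integration scheme, the time step `dt`, or whether the heading is updated before the position.

```python
import math


def step_pursuer(x, y, yaw, angular_velocity, linear_velocity, dt=1.0):
    """Deterministic unicycle-style update (assumed model, not the real one)."""
    # Turn first, wrapping the heading into [-pi, pi).
    yaw = (yaw + angular_velocity * dt + math.pi) % (2 * math.pi) - math.pi
    # Then move along the new heading.
    x += linear_velocity * math.cos(yaw) * dt
    y += linear_velocity * math.sin(yaw) * dt
    return x, y, yaw
```

Because the update contains no random terms, the same state and action always produce the same next state, which is what "deterministic" means here.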
Starting State
The target starts near the outer edge of the circular arena, while the pursuers start in a line in the middle.
The velocity of the target is chosen uniformly at random at the start of each episode to be between 0.5 and 2.0 times the max velocity of the pursuers.
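The per-episode sampling of the target's velocity can be sketched as below. The function name and the use of Python's `random.Random` are assumptions for illustration; the environment may use a different random number generator.

```python
import random


def sample_target_max_velocity(pursuer_max_velocity, rng):
    """Draw the target's max velocity for one episode.

    The multiplier is uniform in [0.5, 2.0], as described above.
    """
    return rng.uniform(0.5, 2.0) * pursuer_max_velocity


rng = random.Random(0)  # seeded for reproducibility
target_velocity = sample_target_max_velocity(10.0, rng)
```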
Episodes End
Episodes end when the target has been captured. By default a max_episode_steps
limit of 100 steps is also set. This may need to be adjusted when using larger
world sizes (this can be done by manually specifying a value for
max_episode_steps when creating the environment with posggym.make).
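The two ways an episode can finish can be sketched with a small check. The helper name `episode_status` is hypothetical, and treating a distance exactly equal to `capture_radius` as a capture is an assumption.

```python
import math


def episode_status(pursuer_positions, target_position, step,
                   capture_radius=30.0, max_episode_steps=100):
    """Return (captured, truncated) for the current step."""
    captured = any(
        math.dist(position, target_position) <= capture_radius
        for position in pursuer_positions
    )
    # The step limit only matters if the target has not been captured.
    truncated = (not captured) and step >= max_episode_steps
    return captured, truncated
```

Since the step limit stays fixed while travel distances grow with the arena, a larger `arena_radius` generally calls for a larger `max_episode_steps`.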
Arguments
num_agents - The number of agents which exist in the environment. Must be between 1 and 8 (default = 3)
n_communicating_pursuers - The maximum number of agents which an agent can receive information from. If None then this will be set to equal num_agents - 1 (default = None)
arena_radius - Size of the arena, in terms of its radius (default = 430)
observation_limit - The limit of which agents can see other agents, if None then there is no observation limit (default = None)
velocity_control - If the agents have control of their linear velocity (default = False)
capture_radius - Distance from target pursuer needs to be within to capture target. As per original paper, the user can adjust this to set a learning curriculum (larger values are easier) (default = 30, which is the radius of the target)
use_q_reward - Whether the pursuers should also receive the reward based on the Q parameter (default = False)
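The arguments and their defaults can be collected in a small configuration object. This is a sketch only: the class `DroneTeamCaptureConfig` is hypothetical and not part of posggym. It mirrors the documented defaults, the 1 to 8 range check, and the rule that n_communicating_pursuers falls back to num_agents - 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DroneTeamCaptureConfig:
    """Hypothetical holder for the documented arguments and defaults."""

    num_agents: int = 3
    n_communicating_pursuers: Optional[int] = None
    arena_radius: float = 430
    observation_limit: Optional[float] = None
    velocity_control: bool = False
    capture_radius: float = 30
    use_q_reward: bool = False

    def __post_init__(self):
        if not 1 <= self.num_agents <= 8:
            raise ValueError("num_agents must be between 1 and 8")
        if self.n_communicating_pursuers is None:
            # Documented fallback: communicate with all other pursuers.
            self.n_communicating_pursuers = self.num_agents - 1
```

Assuming the keyword names match those listed above, `dataclasses.asdict(config)` could then be unpacked as keyword arguments when creating the environment.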
Available variants
For example, to use the Drone Team Capture environment with 8 pursuer drones,
communication limited to the 4 closest drones, an episode step limit of 100, and the
default values for the other parameters (velocity_control, arena_radius,
observation_limit, capture_radius), you would use:
```python
import posggym

env = posggym.make(
    'DroneTeamCapture-v0',
    max_episode_steps=100,
    num_agents=8,
    n_communicating_pursuers=4,
)
```
Version History
v0: Initial version
Reference
C. de Souza, R. Newbury, A. Cosgun, P. Castillo, B. Vidolov and D. Kulić, “Decentralized Multi-Agent Pursuit Using Deep Reinforcement Learning,” in IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4552-4559, July 2021, doi: 10.1109/LRA.2021.3068952.