Predator Prey Continuous
This environment is part of the Continuous environments. Please read that page first for general information.
| Possible Agents | ('0', '1') |
|---|---|
| Action Spaces | {'0': Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32), '1': Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32)} |
| Observation Spaces | {'0': Box(0.0, 4.0, (48,), float32), '1': Box(0.0, 4.0, (48,), float32)} |
| Symmetric | True |
| Import |  |
The Continuous Predator-Prey Environment.
A co-operative 2D continuous world problem involving multiple predator agents working together to catch prey agent/s in the environment.
Possible Agents
Varied number
State Space
Each state consists of:
2D array containing the state of each predator
2D array containing the state of each prey
1D array containing whether each prey has been caught or not (0=no, 1=yes)
The state of each predator and prey is a 1D array containing their:
(x, y) coordinate
angle/yaw (in radians)
velocity in x and y directions
angular velocity (in radians)
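To make the layout above concrete, here is a minimal sketch that labels one predator/prey state row and bundles the three parts of the state. The field order (x, y, angle, vx, vy, angular velocity) is assumed from the order of the list above, and plain Python lists stand in for the arrays; PPStateSketch and unpack_body are hypothetical names, not part of posggym.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Assumed field order for one body (predator or prey), following the list above.
BODY_FIELDS = ("x", "y", "angle", "vx", "vy", "angular_vel")


def unpack_body(row: Sequence[float]) -> Dict[str, float]:
    """Label the values in one predator/prey state row."""
    if len(row) != len(BODY_FIELDS):
        raise ValueError(f"expected {len(BODY_FIELDS)} values, got {len(row)}")
    return dict(zip(BODY_FIELDS, row))


@dataclass
class PPStateSketch:
    """Hypothetical container mirroring the three parts of the state."""

    predator_states: List[List[float]]  # one row per predator
    prey_states: List[List[float]]      # one row per prey
    prey_caught: List[int]              # 0 = not caught, 1 = caught

    def remaining_prey(self) -> int:
        """Number of prey that have not been caught yet."""
        return sum(1 for c in self.prey_caught if c == 0)
```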
Action Space
Each agent's action is made up of two components. The first component specifies
the angular velocity in [-pi/4, pi/4], and the second component specifies the
linear velocity in [0, 1].
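A policy that outputs unbounded values can clamp them into the valid action box before stepping the environment. The sketch below takes its bounds from the Action Spaces table at the top of this page (±0.7853982, i.e. ±pi/4, and [0, 1]); clip_action is a hypothetical helper, and whether the environment itself clips out-of-range actions is not stated here.

```python
import math
from typing import Tuple

# Bounds read from the Action Spaces table: Box([-0.7853982, 0], [0.7853982, 1]).
ANGULAR_LOW, ANGULAR_HIGH = -math.pi / 4, math.pi / 4
LINEAR_LOW, LINEAR_HIGH = 0.0, 1.0


def clip_action(angular: float, linear: float) -> Tuple[float, float]:
    """Clamp a raw (angular velocity, linear velocity) pair into the action box."""
    angular = min(max(angular, ANGULAR_LOW), ANGULAR_HIGH)
    linear = min(max(linear, LINEAR_LOW), LINEAR_HIGH)
    return (angular, linear)
```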
Observation Space
Each agent observes a local circle around itself as a vector. This is achieved by a series of n_sensors lines starting at the agent, which extend for a distance of obs_dist. For each line the agent observes the closest entity (wall, predator, prey) along the line. This table enumerates the observation space:
| Index: [start, end) | Description | Values |
|---|---|---|
| 0 - n_sensors | Wall distance for each sensor | [0, d] |
| n_sensors - (2 * n_sensors) | Predator distance for each sensor | [0, d] |
| (2 * n_sensors) - (3 * n_sensors) | Prey distance for each sensor | [0, d] |
Where d = obs_dist.
If an entity is not observed (i.e. there is none along the sensor's line, or it
isn't the closest entity to the observing agent along the line), the distance will
be obs_dist.
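Following the index table and the rule that unobserved entities read as obs_dist, a flat observation can be split into its three segments and scanned for sensors that actually detected something. This is a minimal sketch with hypothetical helper names; it treats any reading strictly below obs_dist as a detection, so an entity sitting exactly at maximum range is indistinguishable from "nothing seen".

```python
from typing import Dict, List, Sequence


def split_observation(obs: Sequence[float], n_sensors: int) -> Dict[str, List[float]]:
    """Split a flat observation into wall, predator and prey readings."""
    if len(obs) != 3 * n_sensors:
        raise ValueError(f"expected {3 * n_sensors} values, got {len(obs)}")
    return {
        "wall": list(obs[0:n_sensors]),
        "predator": list(obs[n_sensors:2 * n_sensors]),
        "prey": list(obs[2 * n_sensors:3 * n_sensors]),
    }


def observed_sensors(readings: Sequence[float], obs_dist: float) -> List[int]:
    """Indices of sensors whose reading is below obs_dist, i.e. that saw the entity."""
    return [i for i, d in enumerate(readings) if d < obs_dist]
```

With the defaults (n_sensors = 16, obs_dist = 4) this matches the 48-dimensional Box(0.0, 4.0) observation space listed at the top of the page.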
The sensor reading ordering is relative to the agent’s direction. I.e. the values
for the first sensor at indices 0, n_sensors, 2*n_sensors correspond to the
distance reading to a wall/obstacle, predator, and prey, respectively, in the
direction the agent is facing.
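The sketch below maps sensor indices to directions and collects the three readings for one sensor. Sensor 0 points along the agent's heading, as described above, and sensors are spaced 2*pi / n_sensors apart (as stated for n_sensors under Arguments); the direction of increasing index (counter-clockwise here) and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sensor_angles(n_sensors: int, heading: float = 0.0) -> List[float]:
    """World-frame angle of each sensor line, in [0, 2*pi).

    Sensor 0 points along the agent's heading; increasing indices are assumed
    to sweep counter-clockwise.
    """
    step = 2 * math.pi / n_sensors
    return [(heading + i * step) % (2 * math.pi) for i in range(n_sensors)]


def readings_for_sensor(
    obs: Sequence[float], n_sensors: int, i: int
) -> Tuple[float, float, float]:
    """(wall, predator, prey) distances reported by sensor i."""
    return obs[i], obs[n_sensors + i], obs[2 * n_sensors + i]
```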
Rewards
There are two modes of play:
Fully cooperative: All predators share a reward, and each agent receives a reward of 1.0 / num_prey for each prey capture, independent of which predator agent/s were responsible for the capture.
Mixed cooperative: Predators only receive a reward if they were part of the prey capture, receiving 1.0 / num_prey per capture.
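The two reward modes can be sketched as a single function switched by the cooperative flag. The input format (one set of involved predator IDs per prey captured this step) and the function name are assumptions made for illustration; only the 1.0 / num_prey per-capture amount comes from the description above.

```python
from typing import Dict, List, Set


def step_rewards(
    predator_ids: List[str],
    captures: List[Set[str]],
    num_prey: int,
    cooperative: bool,
) -> Dict[str, float]:
    """Per-predator reward for one step.

    captures: one set per prey captured this step, naming the predators
    involved in that capture (hypothetical input format).
    """
    per_capture = 1.0 / num_prey
    rewards = {i: 0.0 for i in predator_ids}
    for involved in captures:
        for i in predator_ids:
            # Fully cooperative: everyone is paid; mixed: only participants.
            if cooperative or i in involved:
                rewards[i] += per_capture
    return rewards
```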
In both modes prey can only be captured when at least prey_strength
predators are adjacent to the prey, where 1 <= prey_strength <= num_predators.
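The capture rule amounts to counting nearby predators. This page does not give the distance that counts as "adjacent" in the continuous world, so the sketch below takes a hypothetical capture_dist parameter; treat it as an illustration of the counting logic, not the environment's exact test.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_captured(
    prey_pos: Point,
    predator_positions: Sequence[Point],
    prey_strength: int,
    capture_dist: float,
) -> bool:
    """True if at least prey_strength predators are within capture_dist of the prey.

    capture_dist is a hypothetical stand-in for "adjacent"; the value the
    environment actually uses is not stated on this page.
    """
    close = sum(1 for p in predator_positions if math.dist(p, prey_pos) <= capture_dist)
    return close >= prey_strength
```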
Dynamics
Actions of the predator agents are deterministic and consist of moving based on their angle and linear velocity.
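As a rough mental model of "moving based on their angle and linear velocity", a simple unicycle update turns the agent by its angular velocity and then moves it forward along the new heading. This is an assumed kinematic model, not the environment's implementation: it ignores walls, collisions and any physics the environment may simulate, and the turn-then-move order, time step dt and heading convention (counter-clockwise from +x) are assumptions.

```python
import math
from typing import Tuple


def unicycle_step(
    x: float,
    y: float,
    angle: float,
    angular_vel: float,
    linear_vel: float,
    dt: float = 1.0,
) -> Tuple[float, float, float]:
    """One step of a simple unicycle model: turn first, then move forward."""
    new_angle = (angle + angular_vel * dt) % (2 * math.pi)
    new_x = x + linear_vel * math.cos(new_angle) * dt
    new_y = y + linear_vel * math.sin(new_angle) * dt
    return new_x, new_y, new_angle
```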
Prey move according to the following rules (in order of priority):
if a predator is within prey_obs_dist distance, move away from the closest predator;
else if another prey is within prey_obs_dist distance, move away from the closest prey;
else move randomly.
Where prey_obs_dist = 0.9 * obs_dist, meaning predators are able to see slightly
further than prey. If a prey is caught it is removed from the world (i.e. its
coordinates become (-1, -1)).
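The prey's priority rules can be sketched as a function that returns the heading a prey would take. prey_obs_dist = 0.9 * obs_dist comes from the text above; the function names, the "head directly away" interpretation and the uniform random heading are assumptions. Callers should leave caught prey (at (-1, -1)) out of other_prey_positions.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def _closest_within(pos: Point, others: Sequence[Point], max_dist: float) -> Optional[Point]:
    """Closest point in others no further than max_dist from pos, else None."""
    in_range = [o for o in others if math.dist(pos, o) <= max_dist]
    return min(in_range, key=lambda o: math.dist(pos, o), default=None)


def prey_heading(
    prey_pos: Point,
    predator_positions: Sequence[Point],
    other_prey_positions: Sequence[Point],
    obs_dist: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Heading (radians) for a prey following the priority rules above."""
    rng = rng or random.Random()
    prey_obs_dist = 0.9 * obs_dist
    # 1. Flee the closest visible predator.
    threat = _closest_within(prey_pos, predator_positions, prey_obs_dist)
    # 2. Otherwise move away from the closest visible prey.
    if threat is None:
        threat = _closest_within(prey_pos, other_prey_positions, prey_obs_dist)
    # 3. Otherwise pick a random direction.
    if threat is None:
        return rng.uniform(0.0, 2 * math.pi)
    return math.atan2(prey_pos[1] - threat[1], prey_pos[0] - threat[0])
```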
Starting State
Predators start from random separate locations along the edge of the world (either in a corner, or half-way along a side), while prey start at random locations around the middle of the world.
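One way to picture the predator start positions is as a random draw, without replacement, from the four corners and four side midpoints of a square world. The exact coordinates (placed right on the boundary here) are an assumption; the environment likely keeps agents some distance inside the walls, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def edge_start_positions(world_size: float) -> List[Point]:
    """Candidate start points: 4 corners plus 4 side midpoints."""
    lo, mid, hi = 0.0, world_size / 2, world_size
    return [
        (lo, lo), (mid, lo), (hi, lo),
        (lo, mid),           (hi, mid),
        (lo, hi), (mid, hi), (hi, hi),
    ]


def sample_predator_starts(
    world_size: float, num_predators: int, rng: Optional[random.Random] = None
) -> List[Point]:
    """Pick distinct edge start points for the predators."""
    rng = rng or random.Random()
    candidates = edge_start_positions(world_size)
    if not 1 <= num_predators <= len(candidates):
        raise ValueError(f"num_predators must be in [1, {len(candidates)}]")
    return rng.sample(candidates, num_predators)
```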
Episodes End
Episodes end when all prey have been captured. By default a max_episode_steps
limit of 100 steps is also set. This may need to be adjusted when using larger
worlds (this can be done by manually specifying a value for max_episode_steps when
creating the environment with posggym.make).
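The two ways an episode can stop can be expressed as a small check. Reporting them as separate terminated/truncated flags mirrors the convention used by Gymnasium-style APIs; episode_status itself is a hypothetical helper, with the default limit of 100 taken from the text above.

```python
from typing import Sequence, Tuple


def episode_status(
    prey_caught: Sequence[int], step: int, max_episode_steps: int = 100
) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for the current step.

    terminated: every prey has been caught.
    truncated: the step limit was reached before that happened.
    """
    terminated = all(c == 1 for c in prey_caught)
    truncated = not terminated and step >= max_episode_steps
    return terminated, truncated
```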
Arguments
world - the world layout to use. This can either be a string specifying one of the supported worlds, or a custom PPWorld object (default = "10x10").
num_predators - the number of predators (and thus controlled agents) (default = 2).
num_prey - the number of prey (default = 3).
cooperative - whether agents share all rewards or only get rewards for prey they are involved in capturing (default = True).
prey_strength - how many predators are required to capture each prey; the minimum is 1 and the maximum is min(4, num_predators). If None, this is set to min(4, num_predators) (default = None).
obs_dist - the local observation distance, specifying how far away in each direction each predator and prey agent observes (default = 4).
n_sensors - the number of sensor lines emanating from the agent. The agent will observe at n_sensors equidistant intervals over [0, 2*pi] (default = 16).
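The prey_strength rules above (None means min(4, num_predators), otherwise the value must lie in [1, min(4, num_predators)]) can be sketched as a validation helper. resolve_prey_strength is a hypothetical name; the environment may report invalid values differently.

```python
from typing import Optional


def resolve_prey_strength(prey_strength: Optional[int], num_predators: int) -> int:
    """Apply the documented default and bounds for prey_strength."""
    max_strength = min(4, num_predators)
    if prey_strength is None:
        return max_strength
    if not 1 <= prey_strength <= max_strength:
        raise ValueError(f"prey_strength must be in [1, {max_strength}], got {prey_strength}")
    return prey_strength
```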
Available variants
The PredatorPrey environment comes with a number of pre-built world layouts which
can be passed as an argument to posggym.make, to create different worlds. All
layouts support 2 to 8 agents.
| World name | World size |
|---|---|
|  | 5x5 |
|  | 5x5 |
|  | 10x10 |
|  | 10x10 |
|  | 15x15 |
|  | 15x15 |
|  | 20x20 |
|  | 20x20 |
For example, to use the Predator Prey environment with the 15x15Blocks world, 4
predators, 4 prey, an episode step limit of 100, and the default values for the
other parameters (cooperative, obs_dist, prey_strength, n_sensors), you would use:
import posggym
env = posggym.make(
'PredatorPreyContinuous-v0',
max_episode_steps=100,
world="15x15Blocks",
num_predators=4,
num_prey=4
)
Version History
v0: Initial version