Predator Prey
This environment is part of the Grid World environments. Please read that page first for general information.
Possible Agents |
(‘0’, ‘1’) |
Action Spaces |
{‘0’: Discrete(5), ‘1’: Discrete(5)} |
Observation Spaces |
{‘0’: Tuple(Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4)), ‘1’: Tuple(Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4))} |
Symmetric |
True |
Import |
|
The Predator-Prey Grid World Environment.
A co-operative 2D grid world problem involving multiple predator agents working together to catch prey agent/s in the environment.
Possible Agents
From two to eight agents, with all agents always active.
State Space
Each state consists of:
- tuple of the `(x, y)` position of all predators
- tuple of the `(x, y)` position of all prey
- tuple of whether each prey has been caught or not (`0`=no, `1`=yes)

For coordinates, x=column and y=row, with the origin (0, 0) at the top-left
square of the grid.
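As an illustration of the three state components, here is a minimal sketch of how such a state could be held in Python. The `PPState` class and its field names are hypothetical and chosen for this example only; they are not taken from the library.

```python
from typing import NamedTuple, Tuple

# (x, y) = (column, row), with the origin (0, 0) at the top-left of the grid
Coord = Tuple[int, int]


class PPState(NamedTuple):
    """Hypothetical container mirroring the three state components."""

    predator_coords: Tuple[Coord, ...]
    prey_coords: Tuple[Coord, ...]
    prey_caught: Tuple[int, ...]  # 0 = not caught, 1 = caught


# Two predators in the top corners of a 10x10 grid, three uncaught prey
state = PPState(
    predator_coords=((0, 0), (9, 0)),
    prey_coords=((4, 4), (5, 4), (4, 5)),
    prey_caught=(0, 0, 0),
)
```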
Action Space
Each agent has 5 actions: DO_NOTHING=0, UP=1, DOWN=2, LEFT=3, RIGHT=4
Observation Space
Each agent observes the contents of cells in a local area around the agent. The size
of the local area observed is controlled by the obs_dim parameter, which specifies
how many cells in each direction are observed (the default value is 2, which means
the agent observes a 5x5 area). For each observed cell the agent receives one of
four values depending on the contents of the cell: EMPTY=0, WALL=1,
PREDATOR=2, PREY=3.
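The following sketch shows one way such a local observation could be assembled. It is not the library's implementation: the function name, the row-major cell ordering, the treatment of off-grid cells as `WALL`, and reporting the agent's own cell as `PREDATOR` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

EMPTY, WALL, PREDATOR, PREY = 0, 1, 2, 3
Coord = Tuple[int, int]


def local_observation(
    agent: Coord,
    width: int,
    height: int,
    walls: Set[Coord],
    predators: Set[Coord],
    prey: Set[Coord],
    obs_dim: int = 2,
) -> Tuple[int, ...]:
    """Return the contents of the (2 * obs_dim + 1)^2 window around `agent`."""
    ax, ay = agent
    obs: List[int] = []
    for y in range(ay - obs_dim, ay + obs_dim + 1):  # rows, top to bottom
        for x in range(ax - obs_dim, ax + obs_dim + 1):  # columns, left to right
            cell = (x, y)
            off_grid = not (0 <= x < width and 0 <= y < height)
            if off_grid or cell in walls:
                obs.append(WALL)  # assumption: off-grid cells look like walls
            elif cell in predators:
                obs.append(PREDATOR)  # assumption: includes the observer's own cell
            elif cell in prey:
                obs.append(PREY)
            else:
                obs.append(EMPTY)
    return tuple(obs)


# A predator in the top-left corner of an empty 10x10 grid
obs = local_observation(
    agent=(0, 0), width=10, height=10, walls=set(),
    predators={(0, 0), (1, 0)}, prey={(2, 2)},
)
```

With the default `obs_dim=2` the result has 25 entries, which matches the `Tuple` of 25 `Discrete(4)` spaces listed for each agent.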
Rewards
There are two modes of play:
- Fully cooperative: All predators share a reward and each agent receives a reward of `1.0 / num_prey` for each prey capture, independent of which predator agent/s were responsible for the capture.
- Mixed cooperative: Predators only receive a reward if they were part of the prey capture, receiving `1.0 / num_prey` for each prey capture they were a part of.
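The two reward modes can be sketched as a single function. This is an illustrative sketch only: `capture_rewards` and its arguments are hypothetical names, and it assumes the caller already knows which predators took part in each capture on the current step.

```python
from typing import Dict, List, Set


def capture_rewards(
    predator_ids: List[str],
    captures: List[Set[str]],
    num_prey: int,
    cooperative: bool = True,
) -> Dict[str, float]:
    """Per-predator reward for one step.

    `captures` holds, for each prey caught this step, the ids of the
    predators that took part in that capture.
    """
    rewards = {i: 0.0 for i in predator_ids}
    per_capture = 1.0 / num_prey
    for involved in captures:
        for i in predator_ids:
            # Fully cooperative: everyone is rewarded for every capture.
            # Mixed cooperative: only the predators involved are rewarded.
            if cooperative or i in involved:
                rewards[i] += per_capture
    return rewards


# One prey (out of four) caught by predators '0' and '1'
shared = capture_rewards(["0", "1", "2"], [{"0", "1"}], num_prey=4, cooperative=True)
mixed = capture_rewards(["0", "1", "2"], [{"0", "1"}], num_prey=4, cooperative=False)
```

In the cooperative case all three predators receive 0.25, while in the mixed case predator `'2'` receives 0.0.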
Dynamics
Actions of the predator agents are deterministic and consist of moving to the adjacent cell in one of the four cardinal directions (or doing nothing). If two or more predators attempt to move into the same cell then no agent moves.
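A rough sketch of the predator movement rule is given below. Several details are assumptions rather than documented behaviour: that UP decreases `y` (since the origin is at the top-left), that moving off the grid leaves a predator in place, that only the predators contesting a cell are held back, and that a predator is also held back if its target cell ends up occupied by a predator that did not move. Walls, prey, and position swaps are ignored for brevity.

```python
from collections import Counter
from typing import List, Tuple

Coord = Tuple[int, int]
DO_NOTHING, UP, DOWN, LEFT, RIGHT = range(5)
# Assumption: with the origin at the top-left, UP decreases y and DOWN increases it.
DELTAS = {DO_NOTHING: (0, 0), UP: (0, -1), DOWN: (0, 1), LEFT: (-1, 0), RIGHT: (1, 0)}


def move_predators(
    coords: List[Coord], actions: List[int], width: int, height: int
) -> List[Coord]:
    """Apply predator actions, cancelling moves that would share a cell."""
    result = []
    for (x, y), action in zip(coords, actions):
        dx, dy = DELTAS[action]
        nx, ny = x + dx, y + dy
        if not (0 <= nx < width and 0 <= ny < height):
            nx, ny = x, y  # assumption: off-grid moves leave the predator in place
        result.append((nx, ny))

    # Revert any predator whose destination is shared, repeating until stable
    # so that a reverted predator never ends up underneath another one.
    changed = True
    while changed:
        changed = False
        counts = Counter(result)
        for i, pos in enumerate(result):
            if counts[pos] > 1 and pos != coords[i]:
                result[i] = coords[i]
                changed = True
    return result


# Both predators try to enter (1, 0), so neither moves
blocked = move_predators([(0, 0), (2, 0)], [RIGHT, LEFT], width=10, height=10)
# No conflict, so both moves go ahead
free = move_predators([(0, 0), (2, 0)], [DOWN, LEFT], width=10, height=10)
```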
Prey move according to the following rules (in order of priority):
1. if a predator is within `obs_dim` cells, moves away from the closest predator
2. if another prey is within `obs_dim` cells, moves away from the closest prey
3. else moves randomly
Prey always move first, and predators and prey cannot occupy the same cell. The only exception is caught prey: their final coordinate is recorded in the state, but predators and still-alive prey are able to move into that cell.
Prey are captured when at least prey_strength predators are in adjacent cells,
where 1 <= prey_strength <= min(4, num_predators).
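The capture condition can be sketched as a simple neighbour count. This assumes "adjacent" means the four cardinally neighbouring cells, which is consistent with the upper bound of 4 on `prey_strength` but is not stated explicitly above; `is_captured` is a hypothetical helper name.

```python
from typing import Iterable, Tuple

Coord = Tuple[int, int]


def is_captured(prey: Coord, predators: Iterable[Coord], prey_strength: int) -> bool:
    """True if at least `prey_strength` predators are next to `prey`."""
    px, py = prey
    # Assumption: adjacency covers the four cardinal neighbours only.
    neighbours = {(px, py - 1), (px, py + 1), (px - 1, py), (px + 1, py)}
    adjacent = sum(1 for p in predators if p in neighbours)
    return adjacent >= prey_strength


# Two predators flank the prey, so a prey of strength 2 is caught
caught = is_captured((4, 4), [(4, 3), (5, 4)], prey_strength=2)
# A diagonal predator does not count towards the capture
not_caught = is_captured((4, 4), [(4, 3), (5, 5)], prey_strength=2)
```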
Starting State
Predators start from random separate locations along the edge of the grid (either in a corner, or half-way along a side), while prey start together in the middle.
Episodes End
Episodes end when all prey have been captured. By default a max_episode_steps
limit of 50 steps is also set. This may need to be adjusted when using larger
grids (this can be done by manually specifying a value for max_episode_steps when
creating the environment with posggym.make).
Arguments
- `grid` - the grid layout to use. This can either be a string specifying one of the supported grids, or a custom `PredatorPreyGrid` object (default = `"10x10"`).
- `num_predators` - the number of predators (and thus controlled agents) (default = `2`).
- `num_prey` - the number of prey (default = `3`).
- `cooperative` - whether agents share all rewards or only get rewards for prey they are involved in capturing (default = `True`).
- `prey_strength` - how many predators are required to capture each prey; the minimum is `1` and the maximum is `min(4, num_predators)`. If `None`, this is set to `min(4, num_predators)` (default = `None`).
- `obs_dim` - the local observation dimensions, specifying how many cells in each direction each predator and prey agent observes (default = `2`, resulting in the agent observing a 5x5 area).
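To make the `prey_strength` default and bounds concrete, here is a small sketch of how the value could be resolved from the arguments. `resolve_prey_strength` is a hypothetical helper, and raising `ValueError` for out-of-range values is an assumption about how invalid input might be handled.

```python
from typing import Optional


def resolve_prey_strength(prey_strength: Optional[int], num_predators: int) -> int:
    """Apply the documented default and bounds for `prey_strength`."""
    max_strength = min(4, num_predators)
    if prey_strength is None:
        return max_strength  # documented default when None is passed
    if not 1 <= prey_strength <= max_strength:
        # Assumption: out-of-range values are rejected with an error.
        raise ValueError(
            f"prey_strength must be between 1 and {max_strength}, got {prey_strength}"
        )
    return prey_strength


default_two = resolve_prey_strength(None, num_predators=2)
default_eight = resolve_prey_strength(None, num_predators=8)
```

With two predators the default resolves to 2, and with eight predators it is capped at 4.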
Available variants
The PredatorPrey environment comes with a number of pre-built grid layouts which can
be passed as an argument to posggym.make, to create different grids. All layouts
support 2 to 8 agents.
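| Grid name | Grid size |
|---|---|
| `5x5` | 5x5 |
| `5x5Blocks` | 5x5 |
| `10x10` | 10x10 |
| `10x10Blocks` | 10x10 |
| `15x15` | 15x15 |
| `15x15Blocks` | 15x15 |
| `20x20` | 20x20 |
| `20x20Blocks` | 20x20 |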
For example, to use the Predator Prey environment with the 15x15Blocks grid, 4
predators, 4 prey, an episode step limit of 100, and the default values for the
other parameters (cooperative, obs_dim, prey_strength), you would use:
```python
import posggym

env = posggym.make(
    'PredatorPrey-v0',
    max_episode_steps=100,
    grid="15x15Blocks",
    num_predators=4,
    num_prey=4,
)
```
Version History
v0: Initial version
Reference
Ming Tan. 1993. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.
J. Z. Leibo, V. F. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel. 2017. Multi-Agent Reinforcement Learning in Sequential Social Dilemmas. In AAMAS, Vol. 16. ACM, 464–473.