Predator Prey

This environment is part of the Grid World environments. Please read that page first for general information.


Possible Agents	(‘0’, ‘1’)
Action Spaces	{‘0’: Discrete(5), ‘1’: Discrete(5)}
Observation Spaces	{‘0’: Tuple(Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4)), ‘1’: Tuple(Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4))}
Symmetric	True
Import	`posggym.make("PredatorPrey-v0")`

The Predator-Prey Grid World Environment.

A co-operative 2D grid world problem involving multiple predator agents working together to catch prey agent/s in the environment.

Possible Agents

From two to eight agents, with all agents always active.

State Space

Each state consists of:

tuple of the (x, y) position of all predators
tuple of the (x, y) position of all prey
tuple of whether each prey has been caught or not (0=no, 1=yes)

For the coordinate x=column, y=row, with the origin (0, 0) at the top-left square of the grid.

Action Space

Each agent has 5 actions: DO_NOTHING=0, UP=1, DOWN=2, LEFT=3, RIGHT=4
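
As a rough sketch of how these action indices could map onto grid moves under the coordinate convention described in the State Space section (x = column, y = row, origin at the top-left, so UP decreases y). The `Action` enum, `ACTION_DELTAS` table and `apply_action` helper are illustrative names and not part of posggym, and the handling of out-of-bounds moves is an assumption:

```python
from enum import IntEnum


class Action(IntEnum):
    DO_NOTHING = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4


# Assumed (dx, dy) offsets. The origin is the top-left cell, so UP means y - 1.
ACTION_DELTAS = {
    Action.DO_NOTHING: (0, 0),
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}


def apply_action(pos, action, width, height):
    """Return the new (x, y); stay in place if the move would leave the grid."""
    dx, dy = ACTION_DELTAS[Action(action)]
    x, y = pos[0] + dx, pos[1] + dy
    if 0 <= x < width and 0 <= y < height:
        return (x, y)
    return pos
```

For example, `apply_action((2, 2), Action.UP, 5, 5)` gives `(2, 1)` in this sketch.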

Observation Space

Each agent observes the contents of cells in a local area around the agent. The size of the local area is controlled by the obs_dim parameter, which specifies how many cells in each direction are observed (the default value is 2, which means the agent observes a 5x5 area). For each observed cell the agent receives one of four values depending on the contents of the cell: EMPTY=0, WALL=1, PREDATOR=2, PREY=3.
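
The sketch below illustrates how such a local observation could be assembled. With a radius of 2 the window has (2 * 2 + 1)^2 = 25 cells, which matches the 25 `Discrete(4)` entries listed in the observation space. The function name `local_observation`, the row-major cell ordering, treating out-of-bounds cells as WALL, and reporting the observer's own cell as PREDATOR are all assumptions made for illustration, not confirmed details of the environment:

```python
EMPTY, WALL, PREDATOR, PREY = 0, 1, 2, 3


def local_observation(pos, obs_dim, width, height, predators, prey, blocks=()):
    """Return a row-major tuple of cell values for the window centred on pos."""
    px, py = pos
    obs = []
    for y in range(py - obs_dim, py + obs_dim + 1):
        for x in range(px - obs_dim, px + obs_dim + 1):
            out_of_bounds = not (0 <= x < width and 0 <= y < height)
            if out_of_bounds or (x, y) in blocks:
                obs.append(WALL)  # assumption: cells off the grid look like walls
            elif (x, y) in predators:
                obs.append(PREDATOR)
            elif (x, y) in prey:
                obs.append(PREY)
            else:
                obs.append(EMPTY)
    return tuple(obs)


# A predator in the top-left corner of a 10x10 grid with a prey diagonally adjacent.
obs = local_observation((0, 0), 2, 10, 10, predators={(0, 0)}, prey={(1, 1)})
```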

Rewards

There are two modes of play:

Fully cooperative: All predators share a reward and each agent receives a reward of 1.0 / num_prey for each prey capture, independent of which predator agent/s were responsible for the capture.
Mixed cooperative: Predators only receive a reward if they were part of the prey capture, receiving 1.0 / num_prey for each prey capture they were a part of.
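
The difference between the two modes can be sketched as follows. The helper `capture_rewards` and its `captures` argument (one set of capturing predator indices per prey caught in a step) are hypothetical and exist only for this illustration:

```python
def capture_rewards(num_predators, num_prey, captures, cooperative=True):
    """Compute per-predator rewards for one step.

    captures: list of sets of predator indices, one set per prey caught.
    """
    per_capture = 1.0 / num_prey
    rewards = [0.0] * num_predators
    for capturers in captures:
        for i in range(num_predators):
            # Fully cooperative: everyone is rewarded for every capture.
            # Mixed cooperative: only predators involved in the capture are.
            if cooperative or i in capturers:
                rewards[i] += per_capture
    return rewards


shared = capture_rewards(3, 2, [{0, 1}], cooperative=True)
mixed = capture_rewards(3, 2, [{0, 1}], cooperative=False)
```

Here predators 0 and 1 capture one of two prey: in fully cooperative mode all three predators receive 0.5, while in mixed mode predator 2 receives nothing.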

Dynamics

Actions of the predator agents are deterministic and consist of moving to the adjacent cell in one of the four cardinal directions, or staying in place. If two or more predators attempt to move into the same cell then no agent moves.

Prey move according to the following rules (in order of priority):

if predator is within obs_dim cells, moves away from closest predator
if another prey is within obs_dim cells, moves away from closest prey
else move randomly
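
One possible reading of these priority rules is sketched below. Several details are assumptions rather than documented behaviour: "within obs_dim cells" is interpreted as the square window of radius obs_dim, "moves away" is interpreted as picking the reachable cell with the largest Manhattan distance from the closest neighbour, and ties go to the first candidate. The `free` callback and all function names are hypothetical:

```python
import random

# Candidate moves as (dx, dy): stay, up, down, left, right.
MOVES = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def within(a, b, obs_dim):
    """True if b lies inside the square window of radius obs_dim around a."""
    return abs(a[0] - b[0]) <= obs_dim and abs(a[1] - b[1]) <= obs_dim


def choose_prey_move(pos, predators, other_prey, obs_dim, free, rng=random):
    """Pick the next cell for a prey; free(cell) says if a cell can be entered."""
    candidates = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES]
    candidates = [c for c in candidates if c == pos or free(c)]
    # Rule 1 (predators) takes priority over rule 2 (other prey).
    for others in (predators, other_prey):
        near = [o for o in others if within(pos, o, obs_dim)]
        if near:
            closest = min(near, key=lambda o: manhattan(pos, o))
            return max(candidates, key=lambda c: manhattan(c, closest))
    # Rule 3: nothing nearby, so move randomly.
    return rng.choice(candidates)
```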

Prey always move first, and predators and prey cannot occupy the same cell. The only exception is a caught prey: its final coordinate remains recorded in the state, but predators and prey that are still alive can move into that cell.

Prey are captured when at least prey_strength predators are in adjacent cells, where 1 <= prey_strength <= min(4, num_predators).
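
A minimal sketch of the capture condition, assuming "adjacent" means the four cardinally neighbouring cells (which is consistent with the upper bound of 4 on prey_strength, but is not stated explicitly). Both helper names are hypothetical; the fallback to min(4, num_predators) when prey_strength is None follows the Arguments section:

```python
def resolve_prey_strength(prey_strength, num_predators):
    """Validate prey_strength, falling back to min(4, num_predators) for None."""
    limit = min(4, num_predators)
    if prey_strength is None:
        return limit
    if not 1 <= prey_strength <= limit:
        raise ValueError(f"prey_strength must be between 1 and {limit}")
    return prey_strength


def is_captured(prey_pos, predator_positions, prey_strength):
    """True if enough predators occupy cells cardinally adjacent to the prey."""
    x, y = prey_pos
    neighbours = {(x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)}
    adjacent = sum(1 for p in predator_positions if p in neighbours)
    return adjacent >= prey_strength
```

With two predators and prey_strength left as None, a prey at (3, 3) would count as captured in this sketch only if both predators stand in cells such as (3, 2) and (4, 3).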

Starting State

Predators start from random separate locations along the edge of the grid (either in a corner, or half-way along a side), while prey start together in the middle.
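
To make the starting rule concrete, the sketch below enumerates the candidate edge cells for an open rectangular grid: four corners plus four side midpoints, giving eight cells in total. The helper names and the use of floor division to pick the half-way cell are assumptions; the actual layouts may place these cells differently:

```python
import random


def predator_start_cells(width, height):
    """Corners and side midpoints of the grid edge (midpoint rounding assumed)."""
    xs = [0, width // 2, width - 1]
    ys = [0, height // 2, height - 1]
    centre = (width // 2, height // 2)
    return [(x, y) for y in ys for x in xs if (x, y) != centre]


def sample_predator_starts(width, height, num_predators, rng=random):
    """Draw distinct start cells, one per predator."""
    return rng.sample(predator_start_cells(width, height), num_predators)


cells = predator_start_cells(10, 10)
starts = sample_predator_starts(10, 10, 4, rng=random.Random(0))
```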

Episodes End

Episodes end when all prey have been captured. By default a max_episode_steps limit of 50 steps is also set. This may need to be adjusted when using larger grids, which can be done by specifying a value for max_episode_steps when creating the environment with posggym.make.

Arguments

grid - the grid layout to use. This can either be a string specifying one of the supported grids, or a custom `PredatorPreyGrid` object (default = "10x10").
num_predators - the number of predators, and thus the number of controlled agents (default = 2).
num_prey - the number of prey (default = 3)
cooperative - whether agents share all rewards or only get rewards for prey they are involved in capturing (default = `True`)
prey_strength - how many predators are required to capture each prey, minimum is 1 and maximum is min(4, num_predators). If None this is set to min(4, num_predators) (default = `None`)
obs_dim - the local observation dimensions, specifying how many cells in each direction each predator and prey agent observes (default = 2, resulting in the agent observing a 5x5 area)

Available variants

The PredatorPrey environment comes with a number of pre-built grid layouts which can be passed as an argument to posggym.make, to create different grids. All layouts support 2 to 8 agents.

Grid name	Grid size
`5x5`	5x5
`5x5Blocks`	5x5
`10x10`	10x10
`10x10Blocks`	10x10
`15x15`	15x15
`15x15Blocks`	15x15
`20x20`	20x20
`20x20Blocks`	20x20

For example, to use the Predator Prey environment with the 15x15Blocks grid, 4 predators, 4 prey, an episode step limit of 100, and the default values for the other parameters (cooperative, obs_dim, prey_strength), you would use:

```python
import posggym

# Keyword arguments not listed here keep their default values.
env = posggym.make(
    "PredatorPrey-v0",
    max_episode_steps=100,
    grid="15x15Blocks",
    num_predators=4,
    num_prey=4,
)
```

Version History

v0: Initial version

Reference

Ming Tan. 1993. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.
J. Z. Leibo, V. F. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel. 2017. Multi-Agent Reinforcement Learning in Sequential Social Dilemmas. In AAMAS, Vol. 16. ACM, 464–473.