Predator Prey


This environment is part of the Grid World environments. Please read that page first for general information.

Possible Agents

('0', '1')

Action Spaces

{'0': Discrete(5), '1': Discrete(5)}

Observation Spaces

{‘0’: Tuple(Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4)), ‘1’: Tuple(Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4), Discrete(4))}

Symmetric

True

Import

posggym.make("PredatorPrey-v0")

The Predator-Prey Grid World Environment.

A co-operative 2D grid world problem involving multiple predator agents working together to catch prey agent/s in the environment.

Possible Agents

From two to eight agents, with all agents always active.

State Space

Each state consists of:

  1. tuple of the (x, y) position of all predators

  2. tuple of the (x, y) position of all preys

  3. tuple of whether each prey has been caught or not (0=no, 1=yes)

For each coordinate, x=column and y=row, with the origin (0, 0) at the top-left square of the grid.
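The three state components can be pictured as a small immutable record. The following is only an illustrative sketch: the class name PPState and its field names are hypothetical and are not taken from the posggym source.

```python
from typing import NamedTuple, Tuple

# (x, y) = (column, row), with the origin (0, 0) at the top-left of the grid.
Coord = Tuple[int, int]


class PPState(NamedTuple):
    """Hypothetical container mirroring the three state components."""

    predator_coords: Tuple[Coord, ...]
    prey_coords: Tuple[Coord, ...]
    prey_caught: Tuple[int, ...]  # 0 = not caught, 1 = caught


# Example: two predators in the top corners of a 10x10 grid, three uncaught prey.
state = PPState(
    predator_coords=((0, 0), (9, 0)),
    prey_coords=((4, 4), (5, 4), (4, 5)),
    prey_caught=(0, 0, 0),
)

# The episode's terminal condition corresponds to every flag being 1.
all_caught = all(state.prey_caught)
```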

Action Space

Each agent has 5 actions: DO_NOTHING=0, UP=1, DOWN=2, LEFT=3, RIGHT=4
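As a rough sketch, the five actions can be written as an integer enum with a coordinate offset for each. The enum and helper below are hypothetical; in particular, the mapping of UP to a decreasing y value is an assumption that follows from the top-left origin described above, and walls and grid bounds are ignored.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Action(IntEnum):
    DO_NOTHING = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4


# Assumption: with the origin at the top-left, UP decreases y and DOWN increases y.
ACTION_DELTAS: Dict[Action, Tuple[int, int]] = {
    Action.DO_NOTHING: (0, 0),
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}


def next_coord(coord: Tuple[int, int], action: int) -> Tuple[int, int]:
    """Return the cell an action points to (no wall or bounds checks)."""
    dx, dy = ACTION_DELTAS[Action(action)]
    return (coord[0] + dx, coord[1] + dy)
```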

Observation Space

Each agent observes the contents of cells in a local area around the agent. The size of the local area observed is controlled by the obs_dim parameter, which specifies how many cells in each direction are observed (the default value is 2, which means the agent observes a 5x5 area). For each observed cell the agent receives one of four values depending on the contents of the cell: EMPTY=0, WALL=1, PREDATOR=2, PREY=3.
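A minimal sketch of how such a local observation could be assembled is shown below. The function and its arguments are hypothetical, and it rests on three assumptions not stated on this page: cells outside the grid are reported as WALL, cells are listed row by row from the top-left of the local area, and the agent's own cell is reported as PREDATOR.

```python
EMPTY, WALL, PREDATOR, PREY = 0, 1, 2, 3


def local_observation(agent_coord, width, height, walls, predators, prey, obs_dim=2):
    """Return a tuple of (2 * obs_dim + 1) ** 2 cell values around agent_coord."""
    ax, ay = agent_coord
    obs = []
    for y in range(ay - obs_dim, ay + obs_dim + 1):
        for x in range(ax - obs_dim, ax + obs_dim + 1):
            in_bounds = 0 <= x < width and 0 <= y < height
            if not in_bounds or (x, y) in walls:
                obs.append(WALL)  # assumption: out-of-bounds cells look like walls
            elif (x, y) in predators:
                obs.append(PREDATOR)
            elif (x, y) in prey:
                obs.append(PREY)
            else:
                obs.append(EMPTY)
    return tuple(obs)


# Predator in the top-left corner of an empty 5x5 grid, one prey diagonally adjacent.
obs = local_observation((0, 0), 5, 5, walls=set(), predators={(0, 0)}, prey={(1, 1)})
```

With the default obs_dim of 2 the result has 25 entries, which matches the 25 Discrete(4) components listed in the observation space above.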

Rewards

There are two modes of play:

  1. Fully cooperative: All predators share a reward and each agent receives a reward of 1.0 / num_prey for each prey capture, independent of which predator agent/s were responsible for the capture.

  2. Mixed cooperative: Predators only receive a reward if they were part of the prey capture, receiving 1.0 / num_prey for each prey capture they were part of.
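The difference between the two modes can be sketched as follows. The function name and the representation of captures (one set of predator indices per prey captured in a step) are hypothetical choices made for illustration only.

```python
def capture_rewards(num_predators, num_prey, captures, cooperative=True):
    """Per-predator rewards for one step.

    captures: list with one set of predator indices per prey captured this step,
    naming the predators that took part in that capture.
    """
    per_capture = 1.0 / num_prey
    rewards = [0.0] * num_predators
    for involved in captures:
        for i in range(num_predators):
            # Fully cooperative: everyone is rewarded. Mixed: only participants.
            if cooperative or i in involved:
                rewards[i] += per_capture
    return rewards


# Three predators, two prey, one prey captured by predators 0 and 1.
shared = capture_rewards(3, 2, [{0, 1}], cooperative=True)
mixed = capture_rewards(3, 2, [{0, 1}], cooperative=False)
```

Here shared gives every predator 0.5, while mixed gives 0.5 to predators 0 and 1 and nothing to predator 2.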

Dynamics

Actions of the predator agents are deterministic and consist of moving to the adjacent cell in each of the four cardinal directions. If two or more predators attempt to move into the same cell then no agent moves.

Prey move according to the following rules (in order of priority):

  1. if predator is within obs_dim cells, moves away from closest predator

  2. if another prey is within obs_dim cells, moves away from closest prey

  3. else move randomly
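The priority order of these rules can be sketched as a simple cascade. The helper names are hypothetical, and the distance test is an assumption: "within obs_dim cells" is interpreted here as lying inside the square local area (Chebyshev distance), which the page does not state explicitly. The sketch only selects which rule applies; it does not choose the actual move.

```python
def within(a, b, obs_dim):
    """Assumption: 'within obs_dim cells' means inside the square local area."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= obs_dim


def prey_move_rule(prey, predators, other_prey, obs_dim=2):
    """Return which of the three movement rules applies to this prey."""
    if any(within(prey, p, obs_dim) for p in predators):
        return "flee_predator"  # rule 1 has the highest priority
    if any(within(prey, q, obs_dim) for q in other_prey):
        return "flee_prey"  # rule 2 applies only when no predator is near
    return "random"  # rule 3 is the fallback
```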

Prey always move first, and predators and prey cannot occupy the same cell. The only exception is a caught prey: its final coordinate remains recorded in the state, but predators and still-alive prey are able to move into that cell.

Prey are captured when at least prey_strength predators are in adjacent cells, where 1 <= prey_strength <= min(4, num_predators).
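The capture condition amounts to counting predators next to a prey. The sketch below uses hypothetical helper names and assumes that "adjacent" means the four cardinal neighbours, which is consistent with the upper bound of 4 on prey_strength but is not stated explicitly here.

```python
def adjacent_predators(prey_coord, predator_coords):
    """Indices of predators in the four cells cardinally adjacent to the prey."""
    px, py = prey_coord
    neighbours = {(px, py - 1), (px, py + 1), (px - 1, py), (px + 1, py)}
    return [i for i, coord in enumerate(predator_coords) if coord in neighbours]


def is_captured(prey_coord, predator_coords, prey_strength):
    """A prey is captured when at least prey_strength predators are adjacent."""
    return len(adjacent_predators(prey_coord, predator_coords)) >= prey_strength
```

The list returned by adjacent_predators also identifies which predators took part in a capture, which is the information the mixed cooperative reward mode depends on.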

Starting State

Predators start from random separate locations along the edge of the grid (either in a corner, or half-way along a side), while prey start together in the middle.
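One way to enumerate the candidate predator start cells is sketched below. The function is hypothetical, and "half-way along a side" is assumed to mean integer division of the width or height, which may differ from how the built-in grids define it. Note that four corners plus four side midpoints give eight cells, in line with the maximum of eight agents.

```python
import random


def predator_start_coords(width, height):
    """Corners and side midpoints of the grid edge (eight cells in total)."""
    xs = (0, width // 2, width - 1)
    ys = (0, height // 2, height - 1)
    # Every combination except the centre cell lies on the edge of the grid.
    return [(x, y) for y in ys for x in xs if not (x == xs[1] and y == ys[1])]


coords = predator_start_coords(10, 10)

# Draw distinct start cells for two predators (seeded for repeatability).
starts = random.Random(0).sample(coords, 2)
```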

Episodes End

Episodes end when all prey have been captured. By default a max_episode_steps limit of 50 steps is also set. This may need to be adjusted when using larger grids, which can be done by specifying a value for max_episode_steps when creating the environment with posggym.make.

Arguments

  • grid - the grid layout to use. This can either be a string specifying one of the supported grids, or a custom PredatorPreyGrid object (default = "10x10").

  • num_predators - the number of predators (and thus controlled agents) (default = 2).

  • num_prey - the number of prey (default = 3)

  • cooperative - whether agents share all rewards or only get rewards for prey they are involved in capturing (default = True)

  • prey_strength - how many predators are required to capture each prey, minimum is 1 and maximum is min(4, num_predators). If None this is set to min(4, num_predators) (default = None)

  • obs_dim - the local observation dimensions, specifying how many cells in each direction each predator and prey agent observes (default = 2, resulting in the agent observing a 5x5 area)
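The default and bounds described for prey_strength can be sketched as a small validation helper. The function name is hypothetical and the error behaviour is an assumption; the library may handle out-of-range values differently.

```python
def resolve_prey_strength(num_predators, prey_strength=None):
    """Apply the documented default and bounds for prey_strength."""
    max_strength = min(4, num_predators)
    if prey_strength is None:
        return max_strength  # documented default when None is passed
    if not 1 <= prey_strength <= max_strength:
        # Assumption: invalid values are rejected with a ValueError.
        raise ValueError(
            f"prey_strength must be between 1 and {max_strength}, got {prey_strength}"
        )
    return prey_strength
```

For example, with the default of 2 predators the resolved value is 2, while with 6 predators it is capped at 4.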

Available variants

The PredatorPrey environment comes with a number of pre-built grid layouts which can be passed as an argument to posggym.make, to create different grids. All layouts support 2 to 8 agents.

Grid name      Grid size
5x5            5x5
5x5Blocks      5x5
10x10          10x10
10x10Blocks    10x10
15x15          15x15
15x15Blocks    15x15
20x20          20x20
20x20Blocks    20x20

For example, to use the Predator Prey environment with the 15x15Blocks grid, 4 predators, 4 prey, an episode step limit of 100, and the default values for the other parameters (cooperative, obs_dim, prey_strength), you would use:

import posggym
env = posggym.make(
    'PredatorPrey-v0',
    max_episode_steps=100,
    grid="15x15Blocks",
    num_predators=4,
    num_prey=4
)

Version History

  • v0: Initial version

Reference

  • Ming Tan. 1993. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.

  • J. Z. Leibo, V. F. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel. 2017. Multi-Agent Reinforcement Learning in Sequential Social Dilemmas. In AAMAS, Vol. 16. ACM, 464–473.