Predator Prey Continuous


This environment is part of the Continuous environments. Please read that page first for general information.

Possible Agents: ('0', '1')

Action Spaces: {'0': Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32), '1': Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32)}

Observation Spaces: {'0': Box(0.0, 4.0, (48,), float32), '1': Box(0.0, 4.0, (48,), float32)}

Symmetric: True

Import: posggym.make("PredatorPreyContinuous-v0")

The Continuous Predator-Prey Environment.

A co-operative 2D continuous world problem involving multiple predator agents working together to catch prey agent/s in the environment.

Possible Agents

Varied number

State Space

Each state consists of:

  1. 2D array containing the state of each predator

  2. 2D array containing the state of each prey

  3. 1D array containing whether each prey has been caught or not (0=no, 1=yes)

The state of each predator and prey is a 1D array containing their:

  • (x, y) coordinate

  • angle/yaw (in radians)

  • velocity in x and y directions

  • angular velocity (in radians)

Action Space

Each agent’s action has two components. The first specifies the angular velocity in [-pi/4, pi/4] (about ±0.785 radians, matching the bounds of the Box action space), and the second specifies the linear velocity in [0, 1].
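As a minimal sketch of how a two-component action could be assembled, the snippet below clamps a requested angular and linear velocity to the action bounds. The helper make_action and the constant names are hypothetical and not part of posggym; the angular bound is the 0.7853982 value from the Box action space listed at the top of this page.

```python
# Bounds taken from the Box action space listed at the top of this page.
MAX_ANGULAR_VELOCITY = 0.7853982  # approximately pi / 4
MAX_LINEAR_VELOCITY = 1.0


def make_action(angular_velocity, linear_velocity):
    """Return an (angular, linear) pair clamped to the action bounds.

    Hypothetical helper for illustration; not part of posggym.
    """
    angular = max(-MAX_ANGULAR_VELOCITY, min(MAX_ANGULAR_VELOCITY, angular_velocity))
    linear = max(0.0, min(MAX_LINEAR_VELOCITY, linear_velocity))
    return (angular, linear)


# One action per agent ID, mirroring the structure of a joint action.
joint_action = {
    "0": make_action(0.1, 0.5),    # within bounds, returned unchanged
    "1": make_action(-2.0, 1.5),   # out of range, clamped to the bounds
}
```

Since the action space is a float32 Box, these pairs would most likely need converting to float32 arrays before being passed to the environment.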

Observation Space

Each agent observes a local circle around themselves as a vector. This is achieved by a series of ‘n_sensors’ lines starting at the agent which extend for a distance of ‘obs_dist’. For each line the agent observes the closest entity (wall, predator, prey) along the line. This table enumerates the observation space:

Index: [start, end)                  Description                          Values
0 - n_sensors                        Wall distance for each sensor        [0, d]
n_sensors - (2 * n_sensors)          Predator distance for each sensor    [0, d]
(2 * n_sensors) - (3 * n_sensors)    Prey distance for each sensor        [0, d]

Where d = obs_dist.

If an entity is not observed (i.e. there is none along the sensor’s line, or it isn’t the closest entity to the observing agent along the line), the distance will be obs_dist.

The sensor reading ordering is relative to the agent’s direction. I.e. the values for the first sensor at indices 0, n_sensors, 2*n_sensors correspond to the distance reading to a wall/obstacle, predator, and prey, respectively, in the direction the agent is facing.
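The layout described above can be unpacked with plain slicing. The following sketch splits a flat observation into its wall, predator, and prey segments and lists the sensors that detected a prey. The helper names are hypothetical, and sensor_angle assumes the sensors are evenly spaced starting from the agent's heading; the direction of rotation is not stated on this page.

```python
import math


def split_observation(obs, n_sensors):
    """Split a flat observation into wall, predator, and prey readings."""
    if len(obs) != 3 * n_sensors:
        raise ValueError("expected 3 * n_sensors values")
    walls = list(obs[:n_sensors])
    predators = list(obs[n_sensors:2 * n_sensors])
    prey = list(obs[2 * n_sensors:])
    return walls, predators, prey


def sensor_angle(index, n_sensors):
    """Angle of a sensor relative to the agent's heading (assumed even spacing)."""
    return index * 2 * math.pi / n_sensors


# Example with the default settings: 16 sensors, obs_dist = 4.
n_sensors, obs_dist = 16, 4.0
obs = [obs_dist] * (3 * n_sensors)   # nothing observed on any sensor
obs[2 * n_sensors + 4] = 1.5         # prey detected by sensor 4

walls, predators, prey = split_observation(obs, n_sensors)
# Readings below obs_dist indicate that something was actually observed.
prey_seen = [(i, d) for i, d in enumerate(prey) if d < obs_dist]
```

With the defaults this gives a vector of 3 * 16 = 48 values, which matches the shape of the observation space listed at the top of the page.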

Rewards

There are two modes of play:

  1. Fully cooperative: All predators share a reward and each agent receives a reward of 1.0 / num_prey for each prey capture, independent of which predator agent/s were responsible for the capture.

  2. Mixed cooperative: Predators only receive a reward if they were part of the prey capture, receiving 1.0 / num_prey per capture.

In both modes prey can only be captured when at least prey_strength predators are in adjacent cells, where 1 <= prey_strength <= num_predators.
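To make the difference between the two modes concrete, here is a small sketch of the reward assigned for a single capture. The function capture_rewards and its arguments are hypothetical and only illustrate the rule as described; they are not the environment's implementation.

```python
def capture_rewards(num_predators, num_prey, captors, cooperative=True):
    """Rewards for a single prey capture, indexed by predator.

    captors: set of predator indices that took part in the capture.
    Hypothetical helper for illustration; not part of posggym.
    """
    share = 1.0 / num_prey
    if cooperative:
        # Fully cooperative: every predator receives the reward.
        return [share] * num_predators
    # Mixed cooperative: only predators involved in the capture are rewarded.
    return [share if i in captors else 0.0 for i in range(num_predators)]


# 4 predators, 4 prey, and predators 0 and 2 made the capture.
shared = capture_rewards(4, 4, {0, 2}, cooperative=True)
mixed = capture_rewards(4, 4, {0, 2}, cooperative=False)
```

Here shared gives 0.25 to all four predators, while mixed gives 0.25 only to predators 0 and 2.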

Dynamics

Actions of the predator agents are deterministic and consist of moving based on their angle and linear velocity.

Prey move according to the following rules (in order of priority):

  1. if a predator is within prey_obs_dist, move away from the closest predator

  2. otherwise, if another prey is within prey_obs_dist, move away from the closest prey

  3. otherwise, move randomly

Where prey_obs_dist = 0.9 * obs_dist, meaning predators can see slightly further than prey. If a prey is caught it is removed from the world (i.e. its coordinates become (-1, -1)).
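The prey priority rules can be sketched as a function that picks a heading. This is an illustration under stated assumptions rather than the environment's code: the names closest and prey_heading are hypothetical, positions are (x, y) tuples, "within" is taken to include the boundary, and "moving away" is interpreted as heading directly away from the closest entity.

```python
import math
import random


def closest(position, others, max_dist):
    """Return the nearest point in `others` within max_dist, or None."""
    in_range = [p for p in others if math.dist(position, p) <= max_dist]
    return min(in_range, key=lambda p: math.dist(position, p), default=None)


def prey_heading(prey_pos, predators, other_prey, obs_dist=4.0, rng=random):
    """Heading (radians) chosen by a prey under the three priority rules."""
    prey_obs_dist = 0.9 * obs_dist
    for group in (predators, other_prey):   # rule 1, then rule 2
        threat = closest(prey_pos, group, prey_obs_dist)
        if threat is not None:
            # Point directly away from the closest entity.
            return math.atan2(prey_pos[1] - threat[1], prey_pos[0] - threat[0])
    return rng.uniform(-math.pi, math.pi)   # rule 3: random direction


# A predator directly to the right of the prey: the prey heads left (pi radians).
heading = prey_heading((5.0, 5.0), [(6.0, 5.0)], [(5.0, 4.0)])
```

In a fuller version, caught prey (placed at (-1, -1)) would presumably be filtered out of other_prey before calling the function.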

Starting State

Predators start from random separate locations along the edge of the world (either in a corner, or half-way along a side), while prey start at random locations around the middle of the world.

Episodes End

Episodes end when all prey have been captured. By default a max_episode_steps limit of 100 steps is also set. This may need to be adjusted when using larger worlds, which can be done by specifying a value for max_episode_steps when creating the environment with posggym.make.

Arguments

  • world - the world layout to use. This can either be a string specifying one of the supported worlds, or a custom PPWorld object (default = "10x10").

  • num_predators - the number of predators, and thus the number of controlled agents (default = 2).

  • num_prey - the number of prey (default = 3).

  • cooperative - whether agents share all rewards or only get rewards for prey they are involved in capturing (default = True).

  • prey_strength - how many predators are required to capture each prey. The minimum is 1 and the maximum is min(4, num_predators). If None this is set to min(4, num_predators) (default = None).

  • obs_dist - the local observation distance, specifying how far away in each direction each predator and prey agent observes (default = 4).

  • n_sensors - the number of lines emanating from the agent. The agent will observe at n_sensors equidistant intervals over [0, 2*pi] (default = 16).
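The default and the valid range for prey_strength described above can be written out as a short check. This is a sketch of the documented rule only; resolve_prey_strength is a hypothetical helper, not a posggym function.

```python
def resolve_prey_strength(prey_strength, num_predators):
    """Apply the documented default and range check for prey_strength.

    Hypothetical helper for illustration; not part of posggym.
    """
    upper = min(4, num_predators)
    if prey_strength is None:
        # Documented default: as strong as the rules allow.
        return upper
    if not 1 <= prey_strength <= upper:
        raise ValueError(f"prey_strength must be in [1, {upper}]")
    return prey_strength


default_for_two = resolve_prey_strength(None, 2)     # 2 predators available
default_for_eight = resolve_prey_strength(None, 8)   # capped at 4
```

Note that with the defaults (num_predators = 2, prey_strength = None) both predators are needed for every capture.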

Available variants

The PredatorPrey environment comes with a number of pre-built world layouts which can be passed as an argument to posggym.make, to create different worlds. All layouts support 2 to 8 agents.

World name     World size
5x5            5x5
5x5Blocks      5x5
10x10          10x10
10x10Blocks    10x10
15x15          15x15
15x15Blocks    15x15
20x20          20x20
20x20Blocks    20x20

For example, to use the Predator Prey environment with the 15x15Blocks world, 4 predators, 4 prey, an episode step limit of 100, and the default values for the other parameters (cooperative, obs_dist, prey_strength), you would use:

import posggym
env = posggym.make(
    'PredatorPreyContinuous-v0',
    max_episode_steps=100,
    world="15x15Blocks",
    num_predators=4,
    num_prey=4
)

Version History

  • v0: Initial version

Reference

  • Ming Tan. 1993. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.

  • J. Z. Leibo, V. F. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel. 2017. Multi-Agent Reinforcement Learning in Sequential Social Dilemmas. In AAMAS, Vol. 16. ACM, 464–473.

  • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Advances in Neural Information Processing Systems 30.