Predator Prey Continuous

This environment is part of the Continuous environments. Please read that page first for general information.


Possible Agents	(‘0’, ‘1’)
Action Spaces	{‘0’: Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32), ‘1’: Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32)}
Observation Spaces	{‘0’: Box(0.0, 4.0, (48,), float32), ‘1’: Box(0.0, 4.0, (48,), float32)}
Symmetric	True
Import	`posggym.make("PredatorPreyContinuous-v0")`

The Continuous Predator-Prey Environment.

A co-operative 2D continuous world problem involving multiple predator agents working together to catch prey agent/s in the environment.

Possible Agents

Varied number

State Space

Each state consists of:

2D array containing the state of each predator
2D array containing the state of each prey
1D array containing whether each prey has been caught or not (0=no, 1=yes)

The state of each predator and prey is a 1D array containing their:

(x, y) coordinate
angle/yaw (in radians)
velocity in x and y directions
angular velocity (in radians)
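
To make the layout above concrete, here is a minimal sketch of how such a state could be represented. The names `EntityState` and `WorldState` are hypothetical and introduced only for illustration; the environment itself stores these values in arrays, and the field order simply follows the list above.

```python
from typing import List, NamedTuple


class EntityState(NamedTuple):
    """Hypothetical per-entity state: six values in the order listed above."""
    x: float
    y: float
    angle: float        # yaw, in radians
    vx: float           # velocity in the x direction
    vy: float           # velocity in the y direction
    angular_vel: float  # angular velocity


class WorldState(NamedTuple):
    """Hypothetical container mirroring the three state components."""
    predators: List[EntityState]
    prey: List[EntityState]
    prey_caught: List[int]  # 0 = not caught, 1 = caught


state = WorldState(
    predators=[
        EntityState(0.5, 0.5, 0.0, 0.0, 0.0, 0.0),
        EntityState(9.5, 0.5, 3.14, 0.0, 0.0, 0.0),
    ],
    prey=[EntityState(5.0, 5.0, 1.57, 0.0, 0.0, 0.0)],
    prey_caught=[0],
)
```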

Action Space

Each agent’s action is made up of two parts. The first component specifies the angular velocity in [-pi/4, pi/4] (matching the bounds of ±0.7853982 listed under Action Spaces above), and the second component specifies the linear velocity in [0, 1].
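
As an illustration, the sketch below clips a desired (angular velocity, linear velocity) pair to the bounds listed in the Action Spaces entry at the top of this page (±0.7853982, i.e. pi/4, and [0, 1]). The helper `make_action` is hypothetical and not part of posggym; the environment expects a float32 array of shape (2,), so in practice the returned list would likely be converted with something like `numpy.array(..., dtype=numpy.float32)`.

```python
import math

# Bounds taken from the Action Spaces entry in the summary table.
MAX_ANGULAR_VEL = math.pi / 4  # approximately 0.7853982
MAX_LINEAR_VEL = 1.0


def make_action(angular_vel, linear_vel):
    """Clip a desired (angular, linear) velocity pair to the action bounds."""
    clipped_angular = max(-MAX_ANGULAR_VEL, min(MAX_ANGULAR_VEL, angular_vel))
    clipped_linear = max(0.0, min(MAX_LINEAR_VEL, linear_vel))
    return [clipped_angular, clipped_linear]
```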

Observation Space

Each agent observes a local circle around themselves as a vector. This is achieved by a series of ‘n_sensors’ lines starting at the agent which extend for a distance of ‘obs_dist’. For each line the agent observes the closest entity (wall, predator, prey) along the line. This table enumerates the observation space:

Index: [start, end)	Description	Values
0 - n_sensors	Wall distance for each sensor	[0, d]
n_sensors - (2 * n_sensors)	Predator distance for each sensor	[0, d]
(2 * n_sensors) - (3 * n_sensors)	Prey distance for each sensor	[0, d]

Where d = obs_dist.

If an entity is not observed (i.e. there is none along the sensor’s line, or it isn’t the closest entity to the observing agent along the line), the distance will be obs_dist.

The sensor reading ordering is relative to the agent’s direction. I.e. the values for the first sensor at indices 0, n_sensors, 2*n_sensors correspond to the distance reading to a wall/obstacle, predator, and prey, respectively, in the direction the agent is facing.
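
The index layout in the table above can be unpacked with plain slicing. The following is a minimal sketch, assuming the observation is a flat sequence of length 3 * n_sensors; the helper names `split_observation` and `nearest_prey_sensor` are hypothetical and not part of posggym.

```python
def split_observation(obs, n_sensors=16):
    """Split a flat observation into wall, predator, and prey distance readings."""
    if len(obs) != 3 * n_sensors:
        raise ValueError("expected an observation of length 3 * n_sensors")
    return {
        "wall": list(obs[:n_sensors]),
        "predator": list(obs[n_sensors:2 * n_sensors]),
        "prey": list(obs[2 * n_sensors:3 * n_sensors]),
    }


def nearest_prey_sensor(obs, n_sensors=16, obs_dist=4.0):
    """Return the index of the sensor with the closest prey, or None if none is seen."""
    prey = split_observation(obs, n_sensors)["prey"]
    idx = min(range(n_sensors), key=prey.__getitem__)
    # A reading equal to obs_dist means nothing was observed along that sensor.
    return idx if prey[idx] < obs_dist else None
```

Since sensor 0 points in the direction the agent is facing, a returned index of 0 would mean the closest observed prey is straight ahead.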

Rewards

There are two modes of play:

Fully cooperative: All predators share a reward and each agent receives a reward of 1.0 / num_prey for each prey capture, independent of which predator agent/s were responsible for the capture.
Mixed cooperative: Predators only receive a reward if they were part of the prey capture, receiving 1.0 / num_prey per capture.

In both modes prey can only be captured when at least prey_strength predators are in adjacent cells, where 1 <= prey_strength <= num_predators.
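
The difference between the two modes can be sketched as follows for a single prey capture. This is an illustrative reading of the rules above, not the environment's implementation; the function `capture_rewards` and its `capturers` argument (the set of predator indices involved in the capture) are hypothetical, and agent IDs are assumed to be string indices as in the Possible Agents entry.

```python
def capture_rewards(num_predators, num_prey, capturers, cooperative=True):
    """Return the per-agent reward for one prey capture.

    capturers: set of integer predator indices that took part in the capture.
    """
    per_capture = 1.0 / num_prey
    rewards = {}
    for i in range(num_predators):
        if cooperative or i in capturers:
            rewards[str(i)] = per_capture
        else:
            rewards[str(i)] = 0.0
    return rewards
```

With 2 predators and 4 prey, a capture made by predator 1 alone gives both agents 0.25 in the fully cooperative mode, but only agent '1' receives 0.25 in the mixed mode.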

Dynamics

Actions of the predator agents are deterministic and consist of moving based on their angle and linear velocity.

Prey move according to the following rules (in order of priority):

if a predator is within prey_obs_dist distance, move away from the closest predator
if another prey is within prey_obs_dist distance, move away from the closest prey
otherwise, move randomly

Where prey_obs_dist = 0.9 * obs_dist, meaning predators are able to see slightly further than prey. If a prey is caught it is removed from the world (i.e. its coordinates become (-1, -1)).
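
The priority ordering of the prey rules can be sketched as below. This is a simplified illustration under stated assumptions, not the environment's code: the function `prey_heading` is hypothetical, it only picks a heading (it ignores walls, velocities, and turning limits), and it treats "within" as less than or equal to prey_obs_dist.

```python
import math
import random


def prey_heading(prey_pos, predator_positions, other_prey_positions,
                 prey_obs_dist, rng=random):
    """Pick a heading (radians) for one prey following the priority rules."""
    # Rule 1 checks predators, rule 2 checks other prey, in that order.
    for others in (predator_positions, other_prey_positions):
        in_range = [p for p in others if math.dist(prey_pos, p) <= prey_obs_dist]
        if in_range:
            closest = min(in_range, key=lambda p: math.dist(prey_pos, p))
            # Head directly away from the closest entity.
            return math.atan2(prey_pos[1] - closest[1], prey_pos[0] - closest[0])
    # Rule 3: nothing in range, so move in a random direction.
    return rng.uniform(-math.pi, math.pi)
```

For the default obs_dist of 4, prey_obs_dist would be 0.9 * 4 = 3.6.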

Starting State

Predators start from random separate locations along the edge of the world (either in a corner, or half-way along a side), while prey start at random locations around the middle of the world.

Episodes End

An episode ends when all prey have been captured. By default a max_episode_steps limit of 100 steps is also set. This may need to be adjusted when using larger worlds (this can be done by manually specifying a value for max_episode_steps when creating the environment with posggym.make).

Arguments

world - the world layout to use. This can either be a string specifying one of the supported worlds, or a custom :class:PPWorld object (default = "10x10").
num_predators - the number of predators (and thus controlled agents) (default = 2).
num_prey - the number of prey (default = 3)
cooperative - whether agents share all rewards or only get rewards for prey they are involved in capturing (default = `True`)
prey_strength - how many predators are required to capture each prey, minimum is 1 and maximum is min(4, num_predators). If None this is set to min(4, num_predators) (default = `None`)
obs_dist - the local observation distance, specifying how far away in each direction each predator and prey agent observes (default = 4).
n_sensors - the number of lines emanating from the agent. The agent will observe at n equidistant intervals over [0, 2*pi] (default = 16).
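
As a rough illustration of how n_sensors relates to the observation, the sketch below computes evenly spaced sensor angles and the resulting observation length. The helpers `sensor_angles` and `observation_size` are hypothetical; the sketch assumes angles are measured relative to the agent's heading, start at 0, and exclude the duplicate endpoint 2*pi.

```python
import math


def sensor_angles(n_sensors=16):
    """Return n_sensors evenly spaced angles (radians) relative to the agent's heading."""
    return [i * 2 * math.pi / n_sensors for i in range(n_sensors)]


def observation_size(n_sensors=16):
    """One wall, one predator, and one prey reading per sensor."""
    return 3 * n_sensors
```

With the default of 16 sensors this gives an observation of length 48, which matches the shape shown in the Observation Spaces entry.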

Available variants

The PredatorPrey environment comes with a number of pre-built world layouts which can be passed as an argument to posggym.make, to create different worlds. All layouts support 2 to 8 agents.

World name	World size
`5x5`	5x5
`5x5Blocks`	5x5
`10x10`	10x10
`10x10Blocks`	10x10
`15x15`	15x15
`15x15Blocks`	15x15
`20x20`	20x20
`20x20Blocks`	20x20

For example, to use the Predator Prey environment with the 15x15Blocks world, 4 predators, 4 prey, an episode step limit of 100, and the default values for the other parameters (cooperative, obs_dist, prey_strength), you would use:

import posggym
env = posggym.make(
    'PredatorPreyContinuous-v0',
    max_episode_steps=100,
    world="15x15Blocks",
    num_predators=4,
    num_prey=4
)

Version History

v0: Initial version

Reference

Ming Tan. 1993. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.
J. Z. Leibo, V. F. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel. 2017. Multi-Agent Reinforcement Learning in Sequential Social Dilemmas. In AAMAS, Vol. 16. ACM, 464–473.
Ryan Lowe, Yi I. Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Advances in Neural Information Processing Systems 30.