Pursuit Evasion Continuous

This environment is part of the Continuous environments. Please read that page first for general information.


Possible Agents	(‘0’, ‘1’)
Action Spaces	{‘0’: Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32), ‘1’: Box([-0.7853982 0. ], [0.7853982 1. ], (2,), float32)}
Observation Spaces	{‘0’: Box(0.0, [ 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 1. 16. 16. 16. 16. 16. 16. ], (39,), float32), ‘1’: Box(0.0, [ 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 5.3333335 1. 16. 16. 16. 16. 16. 16. ], (39,), float32)}
Symmetric	False
Import	`posggym.make("PursuitEvasionContinuous-v0")`

The Pursuit-Evasion Continuous World Environment.

An adversarial continuous world problem involving two agents: an evader and a pursuer. The evader’s goal is to reach a goal location on the other side of the world, while the goal of the pursuer is to spot the evader before it reaches its goal. The evader is considered caught if it is observed by the pursuer or occupies the same location. The evader and pursuer have knowledge of each other’s starting locations, however only the evader has knowledge of its goal location. The pursuer only knows that the evader’s goal location is somewhere on the opposite side of the world to the evader’s start location.

This environment requires each agent to reason about which path the other agent will take through the environment.

Possible Agents

Evader = "0"
Pursuer = "1"

State Space

Each state is made up of:

the state of the evader
the state of the pursuer
the (x, y) coordinate of the evader’s start location
the (x, y) coordinate of the pursuer’s start location
the (x, y) coordinate of the evader’s goal
the minimum distance to its goal along the shortest discrete path achieved by the evader in the current episode (this is needed to correctly reward the agent for making progress)

Both the evader and pursuer state consist of their:

(x, y) coordinate
angle/yaw (in radians)
velocity in x and y directions
angular velocity (in radians)

Action Space

Each agent’s action is made up of two parts. The first component specifies the angular velocity in [-pi/4, pi/4], and the second component specifies the linear velocity in [0, 1].
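As a minimal sketch of the two-component action described above, the snippet below builds an action and clips it to the stated bounds. The helper name `make_action` is hypothetical and not part of posggym; the environment itself expects a float32 array of shape (2,), as listed in the action space at the top of this page.

```python
import math

# Bounds taken from the action space description above.
MAX_ANGULAR_VELOCITY = math.pi / 4
MAX_LINEAR_VELOCITY = 1.0


def make_action(angular_velocity, linear_velocity):
    """Hypothetical helper: clip both components to the documented ranges."""
    angular = max(-MAX_ANGULAR_VELOCITY, min(MAX_ANGULAR_VELOCITY, angular_velocity))
    linear = max(0.0, min(MAX_LINEAR_VELOCITY, linear_velocity))
    return (angular, linear)


# Request a turn beyond the limit while moving at half speed;
# the angular component is clipped to pi/4.
action = make_action(2.0, 0.5)
```

Clipping before passing an action to the environment is one way to keep a learned policy's raw outputs inside the valid box.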

Observation Space

Each agent observes the local world in a fov radian cone in front of themselves. This is achieved by a series of n_sensors sensor lines starting at the agent which extend up to max_obs_distance away from the agent. For each sensor line the agent observes the closest entity along the line, specifically whether it is a wall or the other agent. Along with the sensor readings, each agent also observes whether they hear the other agent within a circle of radius `HEARING_DIST = 4.0` around themselves, the `(x, y)` coordinate of the evader's start location, and the `(x, y)` coordinate of the pursuer's start location. The evader also observes the `(x, y)` coordinate of their goal location, while the pursuer receives a value of `(0, 0)` for this feature.

This table enumerates the first part of the observation space:

Index: start	Description	Values
0	Wall distance	[0, d]
n_sensors	Other agent distance	[0, d]
2 * n_sensors	Other agent heard	[0, 1]
2 * n_sensors + 1	Evader start coordinates	[0, s]
2 * n_sensors + 3	Pursuer start coordinates	[0, s]
2 * n_sensors + 5	Evader goal coordinates	[0, s]

Where d = max_obs_distance and s = world.size

If an entity is not observed by a sensor (i.e. it’s not within max_obs_distance or is not the closest entity to the observing agent along the line), the distance reading will be max_obs_distance.

Note, the goal and start coordinate observations do not change during a single episode, but they do change between episodes.
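The index layout in the table above can be sketched as a small slicing helper. This is an illustrative example only: `split_observation` and the dictionary keys are hypothetical names, not part of posggym, and the total length of `2 * n_sensors + 7` is derived from the table (it gives 39 for the default 16 sensors, matching the observation space shape listed at the top of this page).

```python
def split_observation(obs, n_sensors=16):
    """Hypothetical helper: slice a flat observation using the table's indices."""
    expected_len = 2 * n_sensors + 7
    if len(obs) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(obs)}")
    base = 2 * n_sensors
    return {
        "wall_distances": list(obs[0:n_sensors]),
        "other_agent_distances": list(obs[n_sensors:base]),
        "other_agent_heard": obs[base],
        "evader_start": tuple(obs[base + 1:base + 3]),
        "pursuer_start": tuple(obs[base + 3:base + 5]),
        "evader_goal": tuple(obs[base + 5:base + 7]),
    }


# Dummy observation whose values equal their own indices.
parts = split_observation(list(range(39)))
```

With the dummy input, each entry of `parts` holds the indices it was sliced from, which makes the layout easy to check by eye.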

Rewards

The environment is zero-sum with the pursuer receiving the negative of the evader reward. Additionally, rewards are by default normalized so that returns are bounded between -1 and 1 (this can be disabled by the normalize_reward parameter).

The evader receives a reward of 1 for reaching its goal location and a reward of -1 if it gets captured. Additionally, the evader receives a small reward of 0.01 each time its minimum distance to its goal along the shortest path decreases in the current episode. This means the environment is no longer sparsely rewarded, which helps with exploration and learning (it can be disabled by the use_progress_reward parameter).
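The reward rules above can be sketched roughly as follows. This is not the environment's implementation: the function names are hypothetical, reward normalization is left out, and treating capture, goal, and progress as mutually exclusive cases (with capture checked first) is an assumption made for simplicity.

```python
def update_progress(min_goal_dist, current_goal_dist):
    """Return the new minimum goal distance and whether progress was made."""
    if current_goal_dist < min_goal_dist:
        return current_goal_dist, True
    return min_goal_dist, False


def step_rewards(reached_goal, captured, made_progress, use_progress_reward=True):
    """Unnormalized rewards for one step, keyed by agent ID ("0" is the evader)."""
    if captured:  # assumption: capture is checked before the goal
        evader = -1.0
    elif reached_goal:
        evader = 1.0
    elif made_progress and use_progress_reward:
        evader = 0.01
    else:
        evader = 0.0
    # Zero-sum: the pursuer receives the negative of the evader's reward.
    return {"0": evader, "1": -evader}


min_dist, progressed = update_progress(10, 9)
rewards = step_rewards(reached_goal=False, captured=False, made_progress=progressed)
```

Tracking the minimum distance, rather than the current distance, means the evader is not rewarded again for regaining ground it previously gave up.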

Dynamics

Actions are deterministic and will change the agent’s direction and its velocity in the direction it is facing. The velocity range of an agent is [0, 1]; agents cannot move backwards (have negative velocity).
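To illustrate how an action maps to motion, here is an idealized kinematic sketch. It is an assumption-laden simplification rather than the environment's actual dynamics: the function `kinematic_step` is hypothetical, a unit time step is assumed, the heading is updated before the position, and collisions with walls are not modeled.

```python
import math


def kinematic_step(x, y, heading, angular_velocity, linear_velocity):
    """Idealized update: rotate first, then move along the new heading."""
    new_heading = (heading + angular_velocity) % (2 * math.pi)
    new_x = x + linear_velocity * math.cos(new_heading)
    new_y = y + linear_velocity * math.sin(new_heading)
    return new_x, new_y, new_heading


# Start at the origin facing along the x-axis, turn by pi/4 at full speed.
x, y, heading = kinematic_step(0.0, 0.0, 0.0, math.pi / 4, 1.0)
```

Because the linear velocity is never negative, the agent in this sketch can only move in the direction it faces and must turn around to head back.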

Starting State

At the start of each episode the start location of the evader is selected at random from all possible start locations. The evader’s goal location is then chosen randomly from the set of available goal locations given the evader’s start location (in the default maps the goal location is always on the opposite side of the map from a start location). The pursuer’s start location is similarly chosen from its set of possible start locations.

Episode End

An episode ends when either the evader is seen or touched by the pursuer or the evader reaches its goal. By default a max_episode_steps limit of 200 steps is also set. This may need to be adjusted when using larger worlds (this can be done by manually specifying a value for max_episode_steps when creating the environment with posggym.make).

Arguments

world - the world layout to use. This can either be a string specifying one of the supported worlds (see SUPPORTED_WORLDS), or a custom PEWorld object (default = "16x16").
max_obs_distance - the maximum distance from the agent that each agent’s field of vision extends. If None, this is set to 1/3 of the world size (default = None).
fov - the field of view of the agent in radians. This will determine the angle of the cone in front of the agent which it can see. The FOV will be relative to the angle of the agent (default = pi / 3).
n_sensors - the number of sensor lines emanating from the agent within their FOV. The agent will observe at n_sensors equidistant intervals over [-fov / 2, fov / 2] (default = 16).
normalize_reward - whether to normalize both agents’ rewards to be between -1 and 1 (default = True).
use_progress_reward - whether to reward the evader agent for making progress towards its goal. If False the evader will only be rewarded when it reaches its goal, making it a sparse reward problem (default = True).
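As a quick arithmetic check of how these arguments determine the observation space, the sketch below derives the default maximum observation distance and the observation length. The function names are hypothetical; the formulas come from the argument descriptions and the observation table on this page.

```python
def default_max_obs_distance(world_size):
    """max_obs_distance when left as None: one third of the world size."""
    return world_size / 3


def observation_size(n_sensors=16):
    """Two readings per sensor, one hearing flag, and three (x, y) pairs."""
    return 2 * n_sensors + 1 + 3 * 2


# Default "16x16" world with 16 sensors.
max_dist = default_max_obs_distance(16)
obs_len = observation_size(16)
```

For the defaults this gives a distance of about 5.333 and a length of 39, which agrees with the upper bounds and shape of the observation space listed at the top of this page.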

Available variants

The PursuitEvasionContinuous environment comes with a number of pre-built world layouts which can be passed as an argument to posggym.make, to create different worlds.

World name	World size
`8x8`	8x8
`16x16`	16x16
`32x32`	32x32

For example, to use the PursuitEvasionContinuous environment with the 32x32 world layout, an episode step limit of 200, and the default values for the other parameters, you would use:

import posggym
env = posggym.make(
    'PursuitEvasionContinuous-v0',
    max_episode_steps=200,
    world="32x32",
)

References

Seaman, Iris Rubi, Jan-Willem van de Meent, and David Wingate. 2018. “Nested Reasoning About Autonomous Agents Using Probabilistic Programs.” ArXiv Preprint ArXiv:1812.01569. (This Pursuit-Evasion implementation is directly inspired by the problem in this paper.)
Schwartz, Jonathon, Ruijia Zhou, and Hanna Kurniawati. “Online Planning for Interactive-POMDPs using Nested Monte Carlo Tree Search.” In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8770-8777. IEEE, 2022.