UAV

This environment is part of the Grid World environments. Please read that page first for general information.


Possible Agents	(‘0’, ‘1’)
Action Spaces	{‘0’: Discrete(4), ‘1’: Discrete(4)}
Observation Spaces	{‘0’: Tuple(Tuple(Discrete(5), Discrete(5)), Tuple(Discrete(5), Discrete(5))), ‘1’: Discrete(4)}
Symmetric	False
Import	`posggym.make("UAV-v0")`

The Unmanned Aerial Vehicle Grid World Environment.

An adversarial 2D grid world problem involving two agents, an Unmanned Aerial Vehicle (UAV) and a fugitive. The UAV’s goal is to capture the fugitive, while the fugitive’s goal is to reach the safe house located at a known, fixed location on the grid. The fugitive is considered caught if it is co-located with the UAV. The UAV observes its own location and receives a noisy observation of the fugitive’s location. The fugitive does not know its own location, but when it is adjacent to the safe house it receives a noisy observation of its direction relative to the safe house.

Possible Agents

UAV = ‘0’
Fugitive = ‘1’

State Space

Each state contains the (x, y) coordinates (x=column, y=row, with the origin at the top-left square of the grid) of the UAV and the fugitive. Specifically, a state is ((x_uav, y_uav), (x_fugitive, y_fugitive)).

Action Space

Each agent has 4 actions corresponding to moving in the 4 cardinal directions: NORTH=0, EAST=1, SOUTH=2, WEST=3.

Observation Space

The UAV observes its own (x, y) coordinates and receives a noisy observation of the fugitive’s (x, y) coordinates. The UAV observes the correct fugitive coordinates with probability p=0.9, and one of the locations adjacent to the true fugitive location with probability 1-p=0.1.
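
The UAV observation model described above can be sketched in plain Python. This is an illustrative sketch rather than the posggym implementation: the helper names `adjacent_cells` and `sample_uav_obs` are hypothetical, "adjacent" is assumed to mean the in-bounds 4-neighbourhood, and the noisy cell is assumed to be chosen uniformly among those neighbours.

```python
import random


def adjacent_cells(pos, size):
    """Return the in-bounds 4-neighbours of pos on a size x size grid.

    Assumption: "adjacent" means the cells directly north, east, south
    and west of pos (no diagonals).
    """
    x, y = pos
    candidates = [(x, y - 1), (x + 1, y), (x, y + 1), (x - 1, y)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < size and 0 <= cy < size]


def sample_uav_obs(uav_pos, fugitive_pos, size=5, p_correct=0.9, rng=random):
    """Sample a UAV observation ((x_uav, y_uav), (x_obs, y_obs)).

    The UAV's own position is observed exactly. The fugitive's position is
    reported correctly with probability p_correct; otherwise one adjacent
    cell is reported (assumed uniform over in-bounds neighbours).
    """
    if rng.random() < p_correct:
        observed = fugitive_pos
    else:
        observed = rng.choice(adjacent_cells(fugitive_pos, size))
    return (uav_pos, observed)


if __name__ == "__main__":
    print(sample_uav_obs((0, 0), (2, 2), rng=random.Random(0)))
```

Passing a seeded `random.Random` instance as `rng` makes the sampled observations reproducible.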

The fugitive can sense its position relative to the safe house, namely whether it is north of it (OBSNORTH=0), south of it (OBSSOUTH=1), or at the same level (OBSLEVEL=2). These observations are correct with probability 0.8 and are only received when the fugitive is adjacent to the safe house. Otherwise, the fugitive receives no observation (OBSNONE=3).
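
A minimal sketch of the fugitive's observation model follows. It is not taken from the posggym source: the function names are hypothetical, the numeric codes are assumed to be 0 to 3 so that they fit the `Discrete(4)` observation space, "adjacent" is assumed to mean Manhattan distance 1, and an incorrect reading is assumed to be drawn uniformly from the two other direction observations.

```python
import random

# Assumed numeric codes, chosen to fit Discrete(4).
OBSNORTH, OBSSOUTH, OBSLEVEL, OBSNONE = 0, 1, 2, 3


def true_direction(fugitive_pos, safe_house_pos):
    """Direction of the fugitive relative to the safe house.

    The origin is the top-left cell, so a smaller row index (y) means
    further north.
    """
    fy, sy = fugitive_pos[1], safe_house_pos[1]
    if fy < sy:
        return OBSNORTH
    if fy > sy:
        return OBSSOUTH
    return OBSLEVEL


def sample_fugitive_obs(fugitive_pos, safe_house_pos, accuracy=0.8, rng=random):
    """Sample the fugitive's observation for the given positions."""
    fx, fy = fugitive_pos
    sx, sy = safe_house_pos
    # Assumption: adjacency is Manhattan distance exactly 1.
    if abs(fx - sx) + abs(fy - sy) != 1:
        return OBSNONE
    correct = true_direction(fugitive_pos, safe_house_pos)
    if rng.random() < accuracy:
        return correct
    # Assumption: errors are uniform over the two other direction readings.
    wrong = [o for o in (OBSNORTH, OBSSOUTH, OBSLEVEL) if o != correct]
    return rng.choice(wrong)


if __name__ == "__main__":
    print(sample_fugitive_obs((2, 1), (2, 2), rng=random.Random(0)))
```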

Rewards

Both agents receive a penalty of -0.04 for each step. If the fugitive reaches the safe house then the fugitive receives a reward of 1, while the UAV receives a penalty of -1. If the fugitive is caught by the UAV, then the fugitive receives a penalty of -1, while the UAV receives a reward of 1.
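
The reward rules above can be written as a small function. This is a sketch under stated assumptions, not the library's implementation: the name `step_rewards` and the constants are hypothetical, the capture or safe-house reward is assumed to replace the step penalty rather than be added to it, and capture is assumed to take precedence if the UAV and fugitive meet on the safe house cell.

```python
R_STEP = -0.04    # per-step penalty for both agents
R_CAPTURE = 1.0   # UAV reward for catching the fugitive
R_SAFE = 1.0      # fugitive reward for reaching the safe house


def step_rewards(uav_pos, fugitive_pos, safe_house_pos):
    """Return (uav_reward, fugitive_reward) for the resulting positions.

    Assumptions: event rewards replace the step penalty, and capture is
    checked before the safe house.
    """
    if fugitive_pos == uav_pos:
        return (R_CAPTURE, -R_CAPTURE)
    if fugitive_pos == safe_house_pos:
        return (-R_SAFE, R_SAFE)
    return (R_STEP, R_STEP)


if __name__ == "__main__":
    print(step_rewards((1, 1), (1, 1), (4, 4)))  # capture
    print(step_rewards((0, 0), (4, 4), (4, 4)))  # safe house reached
    print(step_rewards((0, 0), (2, 2), (4, 4)))  # ordinary step
```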

Dynamics

Actions are deterministic, so each agent moves into the next cell in its action’s direction, provided that cell is not out-of-bounds. The fugitive’s position is reset at random if it reaches the safe house or is caught by the UAV.
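
The movement rule can be sketched as follows. The names `DELTAS`, `move` and `step` are hypothetical, and several details are assumptions: an agent whose move would leave the grid is assumed to stay in place, NORTH is assumed to decrease the row index (since the origin is the top-left cell), and the reset is simplified to a uniformly random cell drawn immediately after the move.

```python
import random

NORTH, EAST, SOUTH, WEST = 0, 1, 2, 3

# (dx, dy) per action; y grows downwards because the origin is top-left.
DELTAS = {NORTH: (0, -1), EAST: (1, 0), SOUTH: (0, 1), WEST: (-1, 0)}


def move(pos, action, size=5):
    """Move one cell in the action's direction, staying put if out-of-bounds."""
    dx, dy = DELTAS[action]
    nx, ny = pos[0] + dx, pos[1] + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return pos


def step(state, actions, safe_house, size=5, rng=random):
    """Apply both agents' actions to state = (uav_pos, fugitive_pos).

    Simplification: a caught or safe fugitive is re-placed uniformly at
    random on the grid within the same step.
    """
    uav, fugitive = state
    uav = move(uav, actions[0], size)
    fugitive = move(fugitive, actions[1], size)
    if fugitive == uav or fugitive == safe_house:
        fugitive = (rng.randrange(size), rng.randrange(size))
    return (uav, fugitive)


if __name__ == "__main__":
    print(step(((0, 0), (3, 3)), (EAST, WEST), safe_house=(4, 4)))
```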

Starting State

Initially, the locations of both agents are chosen at random, while the safe house is always at the same fixed location, which is determined by the grid layout.

Episode End

Since the fugitive is reset each time it is caught or reaches the safe house, episodes only end when the step limit is reached, as specified by max_episode_steps when initializing the environment with posggym.make (default=50).

Arguments

grid - the grid of the environment. This can be an integer specifying the width and height of the grid, in which case an empty grid with the given dimensions and default position for the safe house will be used. Alternatively, it can be a UAVGrid instance. (default = 5).

Problem Sizes

For reference, the table below lists the sizes of the state and observation spaces for different grid sizes.

Grid size	States	Observations UAV	Observations Fugitive
`3x3`	81	81	4
`4x4`	256	256	4
`5x5`	625	625	4
`6x6`	1296	1296	4
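
The counts in the table follow from the state definition: on an n x n grid each agent can occupy any of n^2 cells, so there are (n^2)^2 joint positions, and the UAV's observation is likewise a pair of positions. The short sketch below reproduces the table; the function name `problem_sizes` is hypothetical.

```python
def problem_sizes(n):
    """Return (states, uav_observations, fugitive_observations) for an n x n grid."""
    cells = n * n
    states = cells * cells   # (UAV position, fugitive position)
    uav_obs = cells * cells  # (own position, observed fugitive position)
    fugitive_obs = 4         # north, south, level, none
    return states, uav_obs, fugitive_obs


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n}x{n}", *problem_sizes(n))
```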

Version History

v0: Initial version

Reference

Panella, Alessandro, and Piotr Gmytrasiewicz. 2017. “Interactive POMDPs with Finite-State Models of Other Agents.” Autonomous Agents and Multi-Agent Systems 31 (4): 861–904.