UAV
This environment is part of the Grid World environments. Please read that page first for general information.
| Possible Agents | ('0', '1') |
| Action Spaces | {'0': Discrete(4), '1': Discrete(4)} |
| Observation Spaces | {'0': Tuple(Tuple(Discrete(5), Discrete(5)), Tuple(Discrete(5), Discrete(5))), '1': Discrete(4)} |
| Symmetric | False |
| Import | |
The Unmanned Aerial Vehicle Grid World Environment.
An adversarial 2D grid world problem involving two agents, an Unmanned Aerial Vehicle (UAV) and a fugitive. The UAV's goal is to capture the fugitive, while the fugitive's goal is to reach the safe house located at a known fixed location on the grid. The fugitive is considered caught if it is co-located with the UAV. The UAV observes its own location and receives a noisy observation of the fugitive's location. The fugitive does not know its location, but it receives a noisy observation of its relative direction to the safe house when it is adjacent to the safe house.
Possible Agents
UAV = '0'
Fugitive = '1'
State Space
Each state contains the (x, y) coordinates (x=column, y=row, with the origin at the
top-left square of the grid) of the UAV and fugitive agents. Specifically,
a state is ((x_uav, y_uav), (x_fugitive, y_fugitive)).
Action Space
Each agent has 4 actions corresponding to moving in the 4 cardinal
directions: NORTH=0, EAST=1, SOUTH=2, WEST=3.
Observation Space
The UAV observes its own (x, y) coordinates and receives a noisy observation
of the fugitive's (x, y) coordinates. The UAV observes the correct fugitive
coordinates with probability p=0.9, and one of the locations adjacent to the true
fugitive location with probability p=0.1.
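The UAV's noisy observation can be sketched as below. This is a minimal illustration rather than the environment's actual code: the helper names `adjacent_cells` and `sample_uav_obs` are hypothetical, and it assumes that "adjacent" means the in-bounds cardinal neighbours and that a wrong observation is drawn uniformly from them.

```python
import random


def adjacent_cells(pos, size):
    """In-bounds cardinal neighbours of pos (assumed 4-connectivity)."""
    x, y = pos
    candidates = [(x, y - 1), (x + 1, y), (x, y + 1), (x - 1, y)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < size and 0 <= cy < size]


def sample_uav_obs(uav_pos, fugitive_pos, size=5, p_correct=0.9, rng=random):
    """Return (uav_pos, observed_fugitive_pos) for the UAV agent.

    The UAV's own position is always exact. The fugitive's position is
    reported correctly with probability p_correct; otherwise an adjacent
    cell is reported (uniform choice is an assumption of this sketch).
    """
    if rng.random() < p_correct:
        return uav_pos, fugitive_pos
    return uav_pos, rng.choice(adjacent_cells(fugitive_pos, size))


if __name__ == "__main__":
    print(sample_uav_obs((0, 0), (2, 2), rng=random.Random(0)))
```

Passing a seeded `random.Random` instance as `rng` makes the sampled observations reproducible.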
The fugitive can sense its position with respect to the safe house, namely
whether it is north of it (OBSNORTH=0), south of it (OBSSOUTH=1), or at the
same level (OBSLEVEL=2). These observations are received with an accuracy of 0.8,
and only when the fugitive is adjacent to the safe house. If the fugitive is not
adjacent to the safe house it receives no observation (OBSNONE=3).
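The fugitive's observation could be sketched as follows. Several details are assumptions of this sketch, not confirmed behaviour: the integer codes are chosen to fit a `Discrete(4)` space (values 0 to 3), adjacency is taken to mean Manhattan distance 1, and an inaccurate reading is drawn uniformly from the two wrong directional observations. The function names are hypothetical.

```python
import random

# Assumed integer codes, chosen so that all four fit in Discrete(4).
OBSNORTH, OBSSOUTH, OBSLEVEL, OBSNONE = 0, 1, 2, 3


def true_direction(fugitive_pos, safe_house_pos):
    """Fugitive's true position relative to the safe house."""
    fy, sy = fugitive_pos[1], safe_house_pos[1]
    if fy < sy:
        return OBSNORTH  # origin is top-left, so smaller y is further north
    if fy > sy:
        return OBSSOUTH
    return OBSLEVEL


def is_adjacent(a, b):
    """Cardinal adjacency (Manhattan distance 1); an assumption of this sketch."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def sample_fugitive_obs(fugitive_pos, safe_house_pos, accuracy=0.8, rng=random):
    """Noisy direction reading, or OBSNONE when not next to the safe house."""
    if not is_adjacent(fugitive_pos, safe_house_pos):
        return OBSNONE
    correct = true_direction(fugitive_pos, safe_house_pos)
    if rng.random() < accuracy:
        return correct
    wrong = [o for o in (OBSNORTH, OBSSOUTH, OBSLEVEL) if o != correct]
    return rng.choice(wrong)
```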
Rewards
Both agents receive a penalty of -0.04 for each step.
If the fugitive reaches the safe house then the fugitive receives a reward of 1,
while the UAV receives a penalty of -1.
If the fugitive is caught by the UAV, then the fugitive receives a penalty of -1,
while the UAV receives a reward of 1.
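As a rough illustration, the reward rules above might be written as a single function. The function name `compute_rewards` is hypothetical, and two points are assumptions of this sketch because the description does not settle them: the outcome reward replaces the step penalty rather than being added to it, and a capture is checked before a safe-house arrival.

```python
STEP_PENALTY = -0.04
UAV_ID, FUGITIVE_ID = "0", "1"


def compute_rewards(uav_pos, fugitive_pos, safe_house_pos):
    """Return a dict mapping agent ID to its reward for the current step."""
    if fugitive_pos == uav_pos:
        # Fugitive caught: UAV wins (capture checked first, by assumption).
        return {UAV_ID: 1.0, FUGITIVE_ID: -1.0}
    if fugitive_pos == safe_house_pos:
        # Fugitive reached the safe house: fugitive wins.
        return {UAV_ID: -1.0, FUGITIVE_ID: 1.0}
    # No event this step: both agents pay the step penalty.
    return {UAV_ID: STEP_PENALTY, FUGITIVE_ID: STEP_PENALTY}
```

The game is zero-sum on the capture and safe-house outcomes, while the shared step penalty pushes both agents to resolve each encounter quickly.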
Dynamics
Actions are deterministic, so agents move into the next cell in the action's direction if it is not out-of-bounds. The fugitive's position is reset at random if it reaches the safe house or gets caught by the UAV.
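A minimal sketch of the deterministic movement rule, using the action codes and the top-left origin described earlier. The `move` helper and `DELTAS` table are hypothetical names, and the sketch assumes that an out-of-bounds move leaves the agent in its current cell.

```python
NORTH, EAST, SOUTH, WEST = 0, 1, 2, 3

# (dx, dy) per action; y grows downwards because the origin is top-left.
DELTAS = {NORTH: (0, -1), EAST: (1, 0), SOUTH: (0, 1), WEST: (-1, 0)}


def move(pos, action, size=5):
    """Apply an action on a size x size grid and return the new (x, y)."""
    dx, dy = DELTAS[action]
    x, y = pos[0] + dx, pos[1] + dy
    if 0 <= x < size and 0 <= y < size:
        return (x, y)
    return pos  # assumed: a blocked move keeps the agent where it is
```

For example, `move((0, 0), EAST)` gives `(1, 0)`, while `move((0, 0), NORTH)` stays at `(0, 0)` under the stay-in-place assumption.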
Starting State
Initially, the locations of both agents are chosen at random, while the safe house is always at the same location, which depends on the grid layout.
Episode End
Since the fugitive is reset each time it is caught or reaches the safe house, episodes only end when the step
limit is reached, as specified by max_episode_steps when initializing the
environment with posggym.make (default=50).
Arguments
grid - the grid of the environment. This can be an integer specifying the width and height of the grid, in which case an empty grid with the given dimensions and the default position for the safe house will be used. Alternatively, it can be a UAVGrid instance (default = 5).
Problem Sizes
For reference, the table below lists the sizes of the state and observation spaces for different grid sizes.
| Grid size | States | Observations UAV | Observations Fugitive |
|---|---|---|---|
| 3x3 | 81 | 81 | 4 |
| 4x4 | 256 | 256 | 4 |
| 5x5 | 625 | 625 | 4 |
| 6x6 | 1296 | 1296 | 4 |
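The listed state counts (81, 256, 625, 1296) equal n^4 for n = 3 to 6, which matches an empty n x n grid where each of the two agents can occupy any of the n*n cells. The sketch below reproduces the counts under that assumption; `problem_sizes` is a hypothetical helper, not part of the library.

```python
def problem_sizes(n):
    """Space sizes for an empty n x n grid (assumes no blocked cells)."""
    cells = n * n
    joint_positions = cells * cells  # one (x, y) per agent
    return {
        "states": joint_positions,
        # The UAV observes its own cell plus a (noisy) fugitive cell.
        "uav_observations": joint_positions,
        # The fugitive has four possible observations.
        "fugitive_observations": 4,
    }


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n}x{n}", problem_sizes(n))
```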
Version History
v0: Initial version
Reference
Panella, Alessandro, and Piotr Gmytrasiewicz. 2017. “Interactive POMDPs with Finite-State Models of Other Agents.” Autonomous Agents and Multi-Agent Systems 31 (4): 861–904.