Cooperative Reaching
This environment is part of the Grid World environments. Please read that page first for general information.
| Possible Agents | ('0', '1') |
|---|---|
| Action Spaces | {'0': Discrete(5), '1': Discrete(5)} |
| Observation Spaces | {'0': Tuple(Tuple(Discrete(5), Discrete(5)), Tuple(Discrete(6), Discrete(6))), '1': Tuple(Tuple(Discrete(5), Discrete(5)), Tuple(Discrete(6), Discrete(6)))} |
| Symmetric | True |
| Import | posggym.make("CooperativeReaching-v0") |
The Cooperative Reaching Grid World Environment.
A cooperative 2D grid world problem in which two agents must coordinate to reach the same goal cell at the same time. This environment tests an agent’s ability to coordinate with another agent.
Possible Agents
The environment supports two agents, with both agents always beginning and ending each episode at the same time.
State Space
Each state is made up of the (x, y) coordinate of each agent, where x is the
column and y is the row, with the origin (0, 0) at the top-left square of the
grid.
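As a quick illustration of the coordinate convention, the sketch below draws a state as text, indexing rows by y and columns by x. The render_state helper and its output format are hypothetical and are not part of the posggym API.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) = (column, row), origin at the top-left


def render_state(size: int, coords: Dict[str, Coord]) -> List[str]:
    """Return one string per grid row; '.' is empty, agent IDs mark agents."""
    grid = [["." for _ in range(size)] for _ in range(size)]
    for agent_id, (x, y) in coords.items():
        # Rows are indexed by y and columns by x, so the cell is grid[y][x].
        # A '*' marks a cell occupied by both agents.
        grid[y][x] = agent_id if grid[y][x] == "." else "*"
    return ["".join(row) for row in grid]


for line in render_state(4, {"0": (3, 0), "1": (0, 2)}):
    print(line)
```

Agent '0' at (3, 0) appears at the right end of the first row, and agent '1' at (0, 2) at the left end of the third row.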
Action Space
Each agent has 5 actions: DO_NOTHING=0, UP=1, DOWN=2, LEFT=3, RIGHT=4
Observation Space
Each agent observes its own (x, y) coordinate, as well as the (x, y) coordinate
of the other agent, as long as the other agent is within the observation range.
If the other agent is outside the observation range, its observed coordinate
will be (size, size) (i.e. outside of the grid).
All together, each agent’s observation is a tuple of the form:
(ego coord, other coord)
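A minimal sketch of the observation rule follows. It assumes the observation range is a square window of obs_distance cells in every direction (i.e. a per-axis, Chebyshev distance check); the exact distance measure used by posggym, and the observe function name, are assumptions for illustration.

```python
from typing import Tuple

Coord = Tuple[int, int]


def observe(ego: Coord, other: Coord, size: int, obs_distance: int) -> Tuple[Coord, Coord]:
    """Return (ego coord, other coord) as seen by the ego agent.

    ASSUMPTION: "within range" means both |dx| and |dy| are at most obs_distance.
    """
    dx = abs(ego[0] - other[0])
    dy = abs(ego[1] - other[1])
    if max(dx, dy) <= obs_distance:
        return ego, other
    # Out of range: report the off-grid coordinate (size, size).
    return ego, (size, size)


print(observe((0, 0), (1, 1), size=5, obs_distance=1))  # ((0, 0), (1, 1))
print(observe((0, 0), (4, 4), size=5, obs_distance=1))  # ((0, 0), (5, 5))
```

The extra off-grid value is why, with size=5, the other agent's coordinates use Discrete(6) in the observation spaces listed at the top of this page, while the agent's own coordinates use Discrete(5).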
Rewards
Both agents receive a reward when they simultaneously reach the same goal cell.
The reward they receive depends on the value of that goal cell, which is
determined by the mode. For all other steps the agents receive a reward of
0.0.
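The reward rule can be sketched as follows, assuming goals are stored as a mapping from (x, y) cell to goal value. The shared_reward function and the goals dictionary are hypothetical names; the same value is given to both agents.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def shared_reward(coords: Dict[str, Coord], goals: Dict[Coord, float]) -> float:
    """Reward given to *each* agent after a step."""
    pos_0, pos_1 = coords["0"], coords["1"]
    # Only pays out when both agents stand on the same goal cell at the same time.
    if pos_0 == pos_1 and pos_0 in goals:
        return goals[pos_0]
    return 0.0


goals = {(0, 0): 1.0, (4, 0): 0.75}
print(shared_reward({"0": (4, 0), "1": (4, 0)}, goals))  # 0.75
print(shared_reward({"0": (0, 0), "1": (4, 0)}, goals))  # 0.0 (different goals)
```

Two agents sitting on different goal cells earn nothing, which is what makes coordination on a single goal necessary.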
Dynamics
Actions are deterministic. DO_NOTHING keeps an agent in its current cell, and each of the other four actions moves it to the adjacent cell in the corresponding cardinal direction. If an agent attempts to move out of the grid, it remains in its current cell. Both agents can occupy the same cell at the same time.
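The dynamics can be sketched as below. The direction offsets assume y grows downwards (y is the row index with the origin at the top-left), so UP decreases y. The Action enum, move, and step_coords are illustrative names, not posggym's internal code.

```python
from enum import IntEnum
from typing import Dict, Tuple

Coord = Tuple[int, int]


class Action(IntEnum):
    DO_NOTHING = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4


# (dx, dy) per action; y is the row index, so UP means y - 1 (ASSUMED).
DELTAS = {
    Action.DO_NOTHING: (0, 0),
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}


def move(coord: Coord, action: int, size: int) -> Coord:
    dx, dy = DELTAS[Action(action)]
    x, y = coord[0] + dx, coord[1] + dy
    # Moves that would leave the grid keep the agent in its current cell.
    if 0 <= x < size and 0 <= y < size:
        return (x, y)
    return coord


def step_coords(coords: Dict[str, Coord], actions: Dict[str, int], size: int) -> Dict[str, Coord]:
    # Agents move independently and are allowed to end up in the same cell.
    return {i: move(coords[i], actions[i], size) for i in coords}


print(step_coords({"0": (0, 0), "1": (1, 0)}, {"0": Action.UP, "1": Action.LEFT}, size=5))
# {'0': (0, 0), '1': (0, 0)}
```

In the usage example agent '0' bumps into the top wall and stays put, while agent '1' moves left onto the same cell, which the dynamics permit.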
Starting State
Agents start from random locations in the middle of the grid.
Episodes End
Episodes end when both agents simultaneously reach the same goal cell. By default,
max_episode_steps is also set, with a default value of 50 steps. This may need to
be increased when using larger grids, which can be done by passing a value for
max_episode_steps when creating the environment with posggym.make.
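A sketch of the end-of-episode checks, assuming a Gymnasium-style split between termination (both agents reached the same goal) and truncation (the step limit was reached). The episode_end function is hypothetical; the 50-step default mirrors the text above, and how posggym applies the limit internally is not shown here.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def episode_end(
    coords: Dict[str, Coord],
    goals: Dict[Coord, float],
    step_count: int,
    max_episode_steps: int = 50,
) -> Tuple[bool, bool]:
    """Return (terminated, truncated); both agents end the episode together."""
    pos_0, pos_1 = coords["0"], coords["1"]
    # Terminated: both agents are on the same goal cell.
    terminated = pos_0 == pos_1 and pos_0 in goals
    # Truncated: the step limit has been reached.
    truncated = step_count >= max_episode_steps
    return terminated, truncated


goals = {(0, 0): 1.0}
print(episode_end({"0": (0, 0), "1": (0, 0)}, goals, step_count=3))   # (True, False)
print(episode_end({"0": (0, 0), "1": (1, 0)}, goals, step_count=50))  # (False, True)
```

On a larger grid, raising max_episode_steps gives the agents enough steps to meet at a distant goal before truncation.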
Arguments
- size - the size (width and height) of the grid.
- num_goals - the number of goal cells in the grid.
- mode - the mode of the environment, which determines the layout of goals in the grid as well as their values. The available modes are: ["square", "line", "original"].
- obs_distance - the number of cells in each direction that each agent can observe. This determines how close agents need to be to observe each other’s location. Setting this to 2*size makes the environment fully observable (default = None, which is treated as 2*size).
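The argument rules stated on this page (the three modes, the per-mode limits on num_goals given under Available variants, and the obs_distance default) can be collected into one check. resolve_args is a hypothetical helper for illustration; posggym performs its own validation, which may differ.

```python
from typing import Optional

MODES = ("square", "line", "original")


def resolve_args(size: int, num_goals: int, mode: str, obs_distance: Optional[int] = None) -> int:
    """Validate the arguments and return the effective obs_distance."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if mode == "original" and num_goals != 4:
        raise ValueError("mode='original' requires num_goals=4")
    if mode == "square" and not 1 <= num_goals <= (size - 1) * 4:
        raise ValueError("mode='square' requires 1 <= num_goals <= (size-1)*4")
    if mode == "line" and not 1 <= num_goals <= size:
        raise ValueError("mode='line' requires 1 <= num_goals <= size")
    # None means the documented default of 2*size (fully observable).
    return 2 * size if obs_distance is None else obs_distance


print(resolve_args(5, 4, "square"))   # 10
print(resolve_args(7, 4, "line", 2))  # 2
```

Why 2*size is enough: coordinates run from 0 to size-1, so two agents are at most size-1 cells apart along each axis, and at most 2*(size-1) apart even if the range were measured as a Manhattan distance. Either way, 2*size always covers the whole grid.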
Available variants
The Cooperative Reaching environment comes with a number of benchmark grid layouts,
which can be selected through the mode argument of posggym.make.
- original - the original grid layout from the paper. This is a grid with four goals, one in each corner of the grid. The goal values are: top-left = 1, top-right = 0.75, bottom-right = 1, bottom-left = 0.75. Note that this mode only supports four goals (num_goals=4).
- square - goals are spaced evenly along the border of the grid, starting from the top-left corner and moving clockwise. Supports any number of goals 1 <= num_goals <= (size-1)*4, and all goals have the same value of 1.0.
- line - goals are laid out evenly along the middle column of the grid (or one of the middle columns if the grid has an even number of columns). Supports any number of goals 1 <= num_goals <= size, and all goals have the same value of 1.0.
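The three layouts can be sketched as a function returning a map from goal cell to value. The corner values for "original" and the value 1.0 for the other modes follow the descriptions above, but the exact spacing rules for "square" and "line" are assumptions: "square" takes evenly spaced indices along the clockwise border starting at the top-left corner, and "line" centres each goal in an equal share of the middle column. goal_layout and border_cells_clockwise are hypothetical names.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def border_cells_clockwise(size: int) -> List[Coord]:
    """Border cells clockwise from the top-left corner ((size-1)*4 cells)."""
    top = [(x, 0) for x in range(size - 1)]
    right = [(size - 1, y) for y in range(size - 1)]
    bottom = [(x, size - 1) for x in range(size - 1, 0, -1)]
    left = [(0, y) for y in range(size - 1, 0, -1)]
    return top + right + bottom + left


def goal_layout(size: int, num_goals: int, mode: str) -> Dict[Coord, float]:
    """Map each goal cell (x, y) to its value. Spacing rules are ASSUMED."""
    if mode == "original":
        last = size - 1
        return {(0, 0): 1.0, (last, 0): 0.75, (last, last): 1.0, (0, last): 0.75}
    if mode == "square":
        border = border_cells_clockwise(size)
        # Evenly spaced indices along the clockwise border, starting at index 0.
        return {border[i * len(border) // num_goals]: 1.0 for i in range(num_goals)}
    if mode == "line":
        x = size // 2  # middle column (the right-hand one when size is even)
        # Centre each goal within its own equal-length segment of the column.
        return {(x, (2 * i + 1) * size // (2 * num_goals)): 1.0 for i in range(num_goals)}
    raise ValueError(f"unknown mode: {mode!r}")


print(sorted(goal_layout(5, 4, "square")))  # [(0, 0), (0, 4), (4, 0), (4, 4)]
print(sorted(goal_layout(5, 3, "line")))    # [(2, 0), (2, 2), (2, 4)]
```

Because the spacing between consecutive indices is at least one cell whenever num_goals is within the documented limits, this sketch never places two goals on the same cell.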
The following table lists some standard benchmark layouts that have been used in papers, or that are similar to those studied in papers:
| Name | size | num_goals | mode |
|---|---|---|---|
| | 5 | 4 | original |
| | 10 | 4 | original |
| | 5 | 4 | square |
| | 10 | 4 | square |
| | 10 | 8 | square |
| | 5 | 3 | line |
| | 7 | 4 | line |
| | 11 | 6 | line |
Note that “Name” is only a label for each layout. To use one of these layouts, you must specify each argument (size, num_goals, and mode) explicitly.
For example, to use the Cooperative Reaching environment with the square_5_n5
benchmark layout, you would use:
```python
import posggym

# 5x5 grid with 4 goals spaced evenly around the border
env = posggym.make("CooperativeReaching-v0", size=5, num_goals=4, mode="square")
```
Version History
v0: Initial version
References
Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, and Stefano V. Albrecht. 2023. Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity. Transactions on Machine Learning Research.