Cooperative Reaching

This environment is part of the Grid World environments. Please read that page first for general information.


Possible Agents	(‘0’, ‘1’)
Action Spaces	{‘0’: Discrete(5), ‘1’: Discrete(5)}
Observation Spaces	{‘0’: Tuple(Tuple(Discrete(5), Discrete(5)), Tuple(Discrete(6), Discrete(6))), ‘1’: Tuple(Tuple(Discrete(5), Discrete(5)), Tuple(Discrete(6), Discrete(6)))}
Symmetric	True
Import	`posggym.make("CooperativeReaching-v0")`

The Cooperative Reaching Grid World Environment.

A cooperative 2D grid world problem where two agents must coordinate to go to the same goal. This environment tests an agent’s ability to coordinate with another agent.

Possible Agents

The environment supports two agents, with both agents always beginning and ending each episode at the same time.

State Space

Each state is made up of the (x, y) coordinate of each agent. Coordinates use x = column and y = row, with the origin (0, 0) at the top-left square of the grid.
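To illustrate the coordinate convention, the sketch below stores a state as a pair of (x, y) tuples and prints it as a text grid, one line per row. The `State` alias and the `render_state` helper are hypothetical illustrations, not part of the posggym API.

```python
from typing import Tuple

Coord = Tuple[int, int]  # (x, y) with x = column, y = row
State = Tuple[Coord, Coord]  # (agent '0' coord, agent '1' coord)


def render_state(state: State, size: int) -> str:
    """Hypothetical helper: draw the grid as text with (0, 0) at the top-left.

    Cells show '0' or '1' for a single agent, '*' when both agents share
    the cell, and '.' when the cell is empty.
    """
    lines = []
    for y in range(size):  # rows, top to bottom
        cells = []
        for x in range(size):  # columns, left to right
            here = [str(i) for i, coord in enumerate(state) if coord == (x, y)]
            if not here:
                cells.append(".")
            elif len(here) == 1:
                cells.append(here[0])
            else:
                cells.append("*")
        lines.append(" ".join(cells))
    return "\n".join(lines)


print(render_state(((1, 0), (0, 2)), size=3))
# . 0 .
# . . .
# 1 . .
```

Agent '0' at (1, 0) appears in the second column of the top row, and agent '1' at (0, 2) appears in the first column of the bottom row.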

Action Space

Each agent has 5 actions: DO_NOTHING=0, UP=1, DOWN=2, LEFT=3, RIGHT=4
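The action indices can be mirrored with a small enum. The (dx, dy) offsets below are an assumption derived from the coordinate convention above: since the origin is the top-left cell and y counts rows downward, UP is taken to decrease y and DOWN to increase it. `Action` and `ACTION_DELTAS` are illustrative names, not posggym identifiers.

```python
from enum import IntEnum


class Action(IntEnum):
    """Action indices as listed in the documentation."""

    DO_NOTHING = 0
    UP = 1
    DOWN = 2
    LEFT = 3
    RIGHT = 4


# Assumed (dx, dy) offset for each action; UP decreases y because the
# origin (0, 0) is the top-left cell.
ACTION_DELTAS = {
    Action.DO_NOTHING: (0, 0),
    Action.UP: (0, -1),
    Action.DOWN: (0, 1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}

print(Action(4).name, ACTION_DELTAS[Action(4)])  # RIGHT (1, 0)
```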

Observation Space

Each agent observes its own (x, y) coordinate, as well as the (x, y) coordinate of the other agent as long as the other agent is within the observation range. If the other agent is outside the observation range, its observed coordinate will be (size, size) (i.e. outside of the grid).

Altogether, each agent’s observation is a tuple of the form:

(ego coord, other coord)
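A minimal sketch of how an observation could be built. It assumes that "within the observation range" means the other agent is at most `obs_distance` cells away along both the x and y axes (a square window); the exact range test used by posggym may differ. The `observe` function is hypothetical.

```python
from typing import Optional, Tuple

Coord = Tuple[int, int]


def observe(
    ego: Coord, other: Coord, size: int, obs_distance: Optional[int] = None
) -> Tuple[Coord, Coord]:
    """Hypothetical observation function: (ego coord, other coord)."""
    if obs_distance is None:
        obs_distance = 2 * size  # documented default: fully observable
    dx = abs(ego[0] - other[0])
    dy = abs(ego[1] - other[1])
    # Assumed square observation window around the ego agent.
    if dx <= obs_distance and dy <= obs_distance:
        return (ego, other)
    # Out of range: report the off-grid sentinel coordinate (size, size).
    return (ego, (size, size))


print(observe((0, 0), (4, 4), size=5, obs_distance=1))  # ((0, 0), (5, 5))
```

The (size, size) sentinel is why the other agent's coordinate has one more possible value than the ego coordinate in the observation space listed at the top of this page (Discrete(6) versus Discrete(5) for the default grid of size 5).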

Rewards

Both agents receive a reward when they simultaneously reach the same goal cell. The reward they receive depends on the value of that goal cell, which is determined by the scenario (the mode argument). On all other steps both agents receive a reward of 0.0.
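The reward rule can be sketched as a function of both agents' coordinates and a mapping from goal cells to values. Representing goals as a `dict` is an assumption made for illustration; the example values are the "original" layout on a 5x5 grid described under Available variants.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def joint_reward(
    coord0: Coord, coord1: Coord, goals: Dict[Coord, float]
) -> Tuple[float, float]:
    """Hypothetical reward function returning (reward of '0', reward of '1')."""
    if coord0 == coord1 and coord0 in goals:
        value = goals[coord0]
        return (value, value)  # both agents get the value of the shared goal
    return (0.0, 0.0)


# "original" layout on a 5x5 grid: corners with values 1 / 0.75.
goals = {(0, 0): 1.0, (4, 0): 0.75, (4, 4): 1.0, (0, 4): 0.75}
print(joint_reward((4, 0), (4, 0), goals))  # (0.75, 0.75)
print(joint_reward((0, 0), (4, 4), goals))  # (0.0, 0.0)
```

Standing on two different goals earns nothing; the agents must meet on the same goal.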

Dynamics

Actions are deterministic: DO_NOTHING keeps the agent in its current cell, and each of the other four actions moves the agent to the adjacent cell in the corresponding cardinal direction. If an agent attempts to move out of the grid, it remains in its current cell. Both agents can occupy the same cell at the same time.
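The dynamics can be sketched as a pure function over the joint state. As above, the offsets assume UP decreases y; `move` and `step` are hypothetical helpers rather than posggym's implementation.

```python
from typing import Tuple

Coord = Tuple[int, int]

# (dx, dy) per action index: DO_NOTHING, UP, DOWN, LEFT, RIGHT.
# Assumed: UP decreases y because the origin is the top-left cell.
DELTAS = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def move(coord: Coord, action: int, size: int) -> Coord:
    """Apply one action deterministically; off-grid moves leave the agent in place."""
    dx, dy = DELTAS[action]
    x, y = coord[0] + dx, coord[1] + dy
    if 0 <= x < size and 0 <= y < size:
        return (x, y)
    return coord


def step(
    state: Tuple[Coord, Coord], actions: Tuple[int, int], size: int
) -> Tuple[Coord, ...]:
    """Move both agents independently; sharing a cell is allowed, so no collision check."""
    return tuple(move(c, a, size) for c, a in zip(state, actions))


print(step(((0, 0), (2, 0)), (4, 3), size=5))  # ((1, 0), (1, 0))
```

In the example, agent '0' moves RIGHT and agent '1' moves LEFT, and both end up in cell (1, 0).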

Starting State

Agents start from random locations in the middle of the grid.

Episodes End

Episodes end when both agents simultaneously reach the same goal cell. By default, max_episode_steps is also set, with a default value of 50 steps. This may need to be increased for larger grids, which can be done by specifying a value for max_episode_steps when creating the environment with posggym.make.
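The two ways an episode can stop can be separated as below, mirroring the terminated/truncated distinction used by Gymnasium-style APIs. The `episode_done` helper and its `t` argument (number of steps taken so far) are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def episode_done(
    coord0: Coord,
    coord1: Coord,
    goals: Dict[Coord, float],
    t: int,
    max_episode_steps: int = 50,
) -> Tuple[bool, bool]:
    """Return (terminated, truncated) after t steps."""
    # Terminated: both agents are on the same goal cell.
    terminated = coord0 == coord1 and coord0 in goals
    # Truncated: the step limit was hit without reaching a shared goal.
    truncated = not terminated and t >= max_episode_steps
    return terminated, truncated


goals = {(0, 0): 1.0}
print(episode_done((0, 0), (0, 0), goals, t=3))   # (True, False)
print(episode_done((1, 1), (0, 0), goals, t=50))  # (False, True)
```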

Arguments

size - the size (width and height) of the grid.
num_goals - the number of goal cells in the grid.
mode - the mode of the environment, which determines the layout of goals in the grid as well as their values. The available modes are: [“square”, “line”, “original”]
obs_distance - the number of cells in each direction that each agent can observe. This determines how close the agents need to be to observe each other’s location. Setting this to 2*size makes the environment fully observable (default = None, which is equivalent to 2*size).
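Why 2*size is enough: on a size x size grid, two agents are never more than size - 1 cells apart along either axis, so any range of at least size - 1 covers the whole grid. The sketch below assumes the same square observation window as the observation sketch above; both helper names are hypothetical.

```python
from typing import Optional


def resolve_obs_distance(size: int, obs_distance: Optional[int] = None) -> int:
    """Apply the documented default: None means 2 * size."""
    return 2 * size if obs_distance is None else obs_distance


def is_fully_observable(size: int, obs_distance: int) -> bool:
    """Assumes a square window: the largest per-axis gap between two cells is size - 1."""
    return obs_distance >= size - 1


d = resolve_obs_distance(5)
print(d, is_fully_observable(5, d))  # 10 True
```

Under this assumption, any obs_distance of at least size - 1 would also give full observability; 2*size simply leaves a comfortable margin.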

Available variants

The Cooperative Reaching environment comes with a number of benchmark grid layouts which can be passed as an argument to posggym.make using the mode argument.

original - the original grid layout from the paper listed under References. This is a grid with four goals, one in each corner of the grid. The goal values are: top-left = 1, top-right = 0.75, bottom-right = 1, bottom-left = 0.75. Note that this mode only supports four goals (num_goals=4).
square - goals are spaced out evenly along the border of the grid, starting from the top-left corner and moving clockwise. Supports any number of goals 1 <= num_goals <= (size-1)*4 and all goals have the same value of 1.0.
line - goals are laid out in a line evenly along the middle column of the grid (or one of the middle columns if the grid has an even number of columns). Supports any number of goals 1 <= num_goals <= size and all goals have the same value of 1.0.
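The three layouts can be sketched as a goal-generating function that returns a mapping from goal cell to value. The corner values and the num_goals limits come from the descriptions above; the exact spacing rule (taking every (n/num_goals)-th cell and rounding down) is an assumption, and posggym may space goals differently. `border_cells` and `generate_goals` are hypothetical names.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def border_cells(size: int) -> List[Coord]:
    """Border cells in clockwise order, starting at the top-left corner."""
    top = [(x, 0) for x in range(size - 1)]
    right = [(size - 1, y) for y in range(size - 1)]
    bottom = [(x, size - 1) for x in range(size - 1, 0, -1)]
    left = [(0, y) for y in range(size - 1, 0, -1)]
    return top + right + bottom + left  # 4 * (size - 1) cells


def generate_goals(size: int, num_goals: int, mode: str) -> Dict[Coord, float]:
    """Hypothetical goal layout: maps each goal cell to its value."""
    if mode == "original":
        if num_goals != 4:
            raise ValueError("'original' mode only supports num_goals=4")
        last = size - 1
        return {(0, 0): 1.0, (last, 0): 0.75, (last, last): 1.0, (0, last): 0.75}
    if mode == "square":
        cells = border_cells(size)
        if not 1 <= num_goals <= len(cells):
            raise ValueError(f"'square' mode needs 1 <= num_goals <= {len(cells)}")
        # Assumed spacing: evenly spaced indices around the border.
        return {cells[i * len(cells) // num_goals]: 1.0 for i in range(num_goals)}
    if mode == "line":
        if not 1 <= num_goals <= size:
            raise ValueError(f"'line' mode needs 1 <= num_goals <= {size}")
        mid = size // 2  # for even sizes, one of the two middle columns
        # Assumed spacing: evenly spaced rows down the middle column.
        return {(mid, i * size // num_goals): 1.0 for i in range(num_goals)}
    raise ValueError(f"unknown mode: {mode!r}")


print(sorted(generate_goals(5, 4, "square")))  # [(0, 0), (0, 4), (4, 0), (4, 4)]
```

With this spacing rule, square_5_n4 places its four goals on the corners, and line_5_n3 places goals at rows 0, 1 and 3 of column 2.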

The following table lists some standard benchmark layouts that have been used in papers or are similar to those studied in papers:

Name	`size`	`num_goals`	`mode`
`original_5`	5	4	original
`original_10`	10	4	original
`square_5_n4`	5	4	square
`square_10_n4`	10	4	square
`square_10_n8`	10	8	square
`line_5_n3`	5	3	line
`line_7_n4`	7	4	line
`line_11_n6`	11	6	line

Note that “Name” is only a label for each layout. To use one of these layouts, each argument must be specified explicitly.

For example, to use the Cooperative Reaching environment with the square_5_n4 benchmark layout, you would use:

import posggym
env = posggym.make('CooperativeReaching-v0', size=5, num_goals=4, mode="square")
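Because the layout names are only labels, it can be convenient to keep the benchmark table's arguments in one place and unpack them into posggym.make. The sketch below copies the table into a dictionary; `BENCHMARK_LAYOUTS` and `layout_kwargs` are hypothetical helpers, not part of posggym.

```python
# Hypothetical helper: the benchmark table as keyword arguments.
BENCHMARK_LAYOUTS = {
    "original_5": {"size": 5, "num_goals": 4, "mode": "original"},
    "original_10": {"size": 10, "num_goals": 4, "mode": "original"},
    "square_5_n4": {"size": 5, "num_goals": 4, "mode": "square"},
    "square_10_n4": {"size": 10, "num_goals": 4, "mode": "square"},
    "square_10_n8": {"size": 10, "num_goals": 8, "mode": "square"},
    "line_5_n3": {"size": 5, "num_goals": 3, "mode": "line"},
    "line_7_n4": {"size": 7, "num_goals": 4, "mode": "line"},
    "line_11_n6": {"size": 11, "num_goals": 6, "mode": "line"},
}


def layout_kwargs(name: str) -> dict:
    """Return a copy of the arguments for a named benchmark layout."""
    return dict(BENCHMARK_LAYOUTS[name])


# With posggym installed, a layout could then be created with, e.g.:
#   env = posggym.make("CooperativeReaching-v0", **layout_kwargs("line_7_n4"))
print(layout_kwargs("line_7_n4"))  # {'size': 7, 'num_goals': 4, 'mode': 'line'}
```

Returning a copy means callers can add extra arguments such as max_episode_steps without modifying the shared table.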

Version History

v0: Initial version

References

Arrasy Rahman, Elliot Fosong, Ignacio Carlucho, and Stefano V. Albrecht. 2023. Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity. Transactions on Machine Learning Research.