Multi-Agent Tiger

This environment is part of the Classic environments. Please read that page first for general information.


Possible Agents	('0', '1')
Action Spaces	{'0': Discrete(3), '1': Discrete(3)}
Observation Spaces	{'0': Tuple(Discrete(2), Discrete(3)), '1': Tuple(Discrete(2), Discrete(3))}
Symmetric	True
Import	`posggym.make("MultiAgentTiger-v0")`

The Multi-Agent Tiger Environment.

This is a general-sum multi-agent version of the classic Tiger problem. Two agents stand in a corridor facing two doors: left and right. Behind one door lies a hungry tiger and behind the other lies treasure, but the agents do not know which is where. At each step, each agent can choose to open one of the doors or to listen. Listening gives the agent a noisy observation of whether a door was opened and of the tiger's location. If a door is opened, the positions of the tiger and treasure are randomly reset.

Possible Agents

The environment supports two agents: ‘0’ and ‘1’. Both agents are always active in the environment.

State Space

The state is defined by which door the tiger is behind: TLEFT=0 if the tiger is behind the left door, and TRIGHT=1 if it is behind the right door.

Action Space

Each agent can open the left-hand door (OPENLEFT=0), open the right-hand door (OPENRIGHT=1), or listen for the presence of the tiger (LISTEN=2).

Observation Space

Each agent's observation consists of a tiger position observation and a door opening observation. The tiger position observation is either GROWLEFT=0 (tiger left) or GROWRIGHT=1 (tiger right). The door opening observation is either CREAKLEFT=0 (left door opened), CREAKRIGHT=1 (right door opened), or SILENCE=2 (no door opened).

Combined each agent has 6 possible observations:

(GROWLEFT, CREAKLEFT)
(GROWLEFT, CREAKRIGHT)
(GROWLEFT, SILENCE)
(GROWRIGHT, CREAKLEFT)
(GROWRIGHT, CREAKRIGHT)
(GROWRIGHT, SILENCE)
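The combined observation space is simply the Cartesian product of the two observation components. The sketch below enumerates it using the constant names from this page; it is an illustration, not code taken from posggym.

```python
from itertools import product

# Tiger position (growl) observations, named as on this page.
GROWLEFT, GROWRIGHT = 0, 1
# Door opening (creak) observations.
CREAKLEFT, CREAKRIGHT, SILENCE = 0, 1, 2

GROWL_NAMES = {GROWLEFT: "GROWLEFT", GROWRIGHT: "GROWRIGHT"}
CREAK_NAMES = {CREAKLEFT: "CREAKLEFT", CREAKRIGHT: "CREAKRIGHT", SILENCE: "SILENCE"}

# Every (growl, creak) pair: 2 * 3 = 6 observations per agent.
ALL_OBS = list(product(GROWL_NAMES, CREAK_NAMES))

for growl, creak in ALL_OBS:
    print(f"({GROWL_NAMES[growl]}, {CREAK_NAMES[creak]})")
```

The printed order matches the list above, because `product` iterates the dictionary keys in insertion order.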

If an agent uses the LISTEN=2 action, it perceives the tiger's current position correctly with probability observation_prob (default = 0.85), regardless of whether the other agent opens a door or listens. It also correctly perceives whether, and which, door was opened with probability creak_observation_prob (default = 0.9).

If an agent opens either door, it receives an observation drawn uniformly at random from all 6 possible observations.
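The observation model can be written as a distribution over the 6 observations. This is a minimal sketch with a hypothetical `observation_dist` function, and it makes two assumptions the text does not state: the growl refers to whichever tiger position the caller passes in (the page does not say whether that is the position before or after a reset), and an incorrect creak observation is split evenly between the two wrong values.

```python
from itertools import product

OPENLEFT, OPENRIGHT, LISTEN = 0, 1, 2
TLEFT, TRIGHT = 0, 1
GROWLEFT, GROWRIGHT = 0, 1
CREAKLEFT, CREAKRIGHT, SILENCE = 0, 1, 2

ALL_OBS = list(product((GROWLEFT, GROWRIGHT), (CREAKLEFT, CREAKRIGHT, SILENCE)))


def observation_dist(own_action, other_action, tiger_pos,
                     observation_prob=0.85, creak_observation_prob=0.9):
    """Return {(growl, creak): probability} for one agent's observation.

    Assumptions (not stated in the docs): the growl refers to `tiger_pos`
    as supplied by the caller, and a wrong creak observation is split
    evenly between the two incorrect values.
    """
    if own_action != LISTEN:
        # Opening a door yields a uniformly random observation.
        return {obs: 1 / len(ALL_OBS) for obs in ALL_OBS}

    # The correct growl points at the tiger's side.
    correct_growl = GROWLEFT if tiger_pos == TLEFT else GROWRIGHT
    # The correct creak reflects what the other agent did.
    correct_creak = {OPENLEFT: CREAKLEFT, OPENRIGHT: CREAKRIGHT,
                     LISTEN: SILENCE}[other_action]

    dist = {}
    for growl, creak in ALL_OBS:
        if growl == correct_growl:
            p_growl = observation_prob
        else:
            p_growl = 1 - observation_prob
        if creak == correct_creak:
            p_creak = creak_observation_prob
        else:
            p_creak = (1 - creak_observation_prob) / 2
        dist[(growl, creak)] = p_growl * p_creak
    return dist


d = observation_dist(LISTEN, LISTEN, TLEFT)
print(round(d[(GROWLEFT, SILENCE)], 4))  # 0.85 * 0.9 = 0.765
```

Because the growl and creak components are treated as independent here, the most likely observation for a listening agent has probability observation_prob times creak_observation_prob.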

Rewards

Each agent's reward depends only on its own action and the current state, not on the other agent's action.

An agent receives a reward of +10 for opening the door without a tiger behind it, -100 for opening the door with the tiger behind it, and -1 for performing the listening action.
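The reward rule above fits in a few lines. In this sketch (hypothetical function names), each agent's reward is evaluated against the tiger position at the moment the door is opened, and the joint reward is just the per-agent rule applied independently.

```python
OPENLEFT, OPENRIGHT, LISTEN = 0, 1, 2
TLEFT, TRIGHT = 0, 1


def reward(action, tiger_pos):
    """Reward for one agent; it depends only on that agent's own action."""
    if action == LISTEN:
        return -1.0
    # Opening the door on the tiger's side is heavily penalised.
    opened_tiger_door = (action == OPENLEFT and tiger_pos == TLEFT) or (
        action == OPENRIGHT and tiger_pos == TRIGHT
    )
    return -100.0 if opened_tiger_door else 10.0


def joint_rewards(actions, tiger_pos):
    """Map each agent ID to its reward for a joint action."""
    return {agent: reward(a, tiger_pos) for agent, a in actions.items()}


print(joint_rewards({"0": OPENLEFT, "1": LISTEN}, TRIGHT))
```

Here agent '0' finds the treasure (+10) while agent '1' pays the listening cost (-1).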

Although the game is general-sum rather than zero-sum, the agents still influence each other through their effect on the state: whenever either agent opens a door, the tiger's position is reset.

Dynamics

The state is reset to TLEFT or TRIGHT with equal probability whenever either agent opens one of the doors. When both agents perform the LISTEN action, the state is unchanged.
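The transition rule can be expressed as a distribution over the next tiger position. This is a minimal sketch with a hypothetical `transition_dist` function that takes a joint action keyed by agent ID.

```python
OPENLEFT, OPENRIGHT, LISTEN = 0, 1, 2
TLEFT, TRIGHT = 0, 1


def transition_dist(tiger_pos, actions):
    """Return {next_tiger_pos: probability} given a joint action."""
    if any(a != LISTEN for a in actions.values()):
        # Any door opening resets the tiger uniformly at random.
        return {TLEFT: 0.5, TRIGHT: 0.5}
    # Both agents listened: the state is unchanged.
    return {tiger_pos: 1.0}


print(transition_dist(TLEFT, {"0": LISTEN, "1": LISTEN}))
print(transition_dist(TLEFT, {"0": LISTEN, "1": OPENRIGHT}))
```

Note that a single door opening by either agent is enough to reset the state, which is how one agent's choice affects the other's situation.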

Starting State

The initial state is TLEFT or TRIGHT with equal probability.

Episode End

By default, episodes continue indefinitely. To set a step limit, specify max_episode_steps when initializing the environment with posggym.make.
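Since the task itself never terminates, a step limit is the only thing that ends an episode. The toy loop below illustrates this with random policies; it is a self-contained stand-in, not posggym's implementation, and the `rollout` function and its random action choice are hypothetical.

```python
import random

OPENLEFT, OPENRIGHT, LISTEN = 0, 1, 2
TLEFT, TRIGHT = 0, 1


def rollout(max_episode_steps, seed=None):
    """Run random policies until the step limit; return (steps, returns).

    Toy stand-in for an environment created with
    posggym.make("MultiAgentTiger-v0", max_episode_steps=...).
    """
    rng = random.Random(seed)
    tiger = rng.choice((TLEFT, TRIGHT))  # uniform initial state
    returns = {"0": 0.0, "1": 0.0}
    steps = 0
    while steps < max_episode_steps:  # no terminal state, only the limit
        actions = {i: rng.choice((OPENLEFT, OPENRIGHT, LISTEN)) for i in returns}
        for i, a in actions.items():
            if a == LISTEN:
                returns[i] -= 1.0
            elif (a == OPENLEFT) == (tiger == TLEFT):
                returns[i] -= 100.0  # opened the tiger's door
            else:
                returns[i] += 10.0  # opened the treasure door
        if any(a != LISTEN for a in actions.values()):
            tiger = rng.choice((TLEFT, TRIGHT))  # reset after a door opens
        steps += 1
    return steps, returns


print(rollout(25, seed=0))
```

Without the limit, the `while` loop would never exit, which mirrors the default infinite episodes described above. How posggym reports the limit to the caller is not shown here.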

Arguments

observation_prob - the probability of correctly observing the position of the tiger (default = 0.85)
creak_observation_prob - the probability of correctly observing which door, if any, was opened by the other agent (default = 0.9)

Version History

v0: Initial version

References

Gmytrasiewicz, Piotr J., and Prashant Doshi. “A Framework for Sequential Planning in Multi-Agent Settings.” Journal of Artificial Intelligence Research 24 (2005).