Model

posggym.Model

class posggym.POSGModel(*args, **kwds)

A Partially Observable Stochastic Game (POSG) model.

This class defines functions and attributes necessary for a generative POSG model for use in simulation-based planners (e.g. MCTS), and for reinforcement learning.

The main API methods that users of this class need to know are:

get_agents()
sample_initial_state()
sample_initial_obs()
step()
seed()

Additionally, the main API attributes are:

possible_agents - All possible agents that may appear in the environment across all states.
state_space - The space of all environment states. An explicit state space definition is not needed by many simulation-based algorithms (including RL and MCTS) and can be hard to define, so implementing this property should be seen as optional. In cases where it is not implemented it should be None.
action_spaces - The action space for each agent
observation_spaces - The observation space for each agent
reward_ranges - The minimum and maximum possible rewards per step for each agent. The default reward range is set to \((-\infty,+\infty)\).
is_symmetric - Whether the environment is symmetric or asymmetric, that is, whether all agents are identical irrespective of their ID (i.e. same action, observation, and reward spaces and dynamics).
rng - the model’s internal random number generator (RNG).
spec - An environment spec that contains the information used to initialize the environment from posggym.make()

Custom models should inherit from this class and implement the get_agents(), sample_initial_state(), sample_initial_obs(), and step() methods, along with the possible_agents, action_spaces, observation_spaces, is_symmetric, and rng attributes.

Custom models may optionally provide implementations for the sample_agent_initial_state() method and state_space attribute.
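As a rough illustration of the required methods, the following is a minimal sketch of a toy model. It deliberately does not import posggym: CountdownModel and the local Timestep dataclass are hypothetical stand-ins that only mirror the method and field names listed above, and the action_spaces and observation_spaces attributes are omitted because they need Space objects. A real custom model would subclass posggym.POSGModel instead.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Timestep:
    """Local stand-in for posggym.model.JointTimestep (same field names)."""

    state: int
    observations: Dict[str, int]
    rewards: Dict[str, float]
    terminations: Dict[str, bool]
    truncations: Dict[str, bool]
    all_done: bool
    infos: Dict[str, dict]


class CountdownModel:
    """Hypothetical toy model: the state is a counter that both agents observe."""

    possible_agents: Tuple[str, ...] = ("0", "1")
    is_symmetric = True

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def get_agents(self, state: int) -> List[str]:
        # The number of agents is constant, so this never depends on state.
        return list(self.possible_agents)

    def sample_initial_state(self) -> int:
        return self.rng.randint(2, 4)

    def sample_initial_obs(self, state: int) -> Dict[str, int]:
        return {i: state for i in self.get_agents(state)}

    def step(self, state: int, actions: Dict[str, int]) -> Timestep:
        # Each agent picks 0 or 1; the counter drops by the total (at least 1).
        next_state = max(0, state - max(1, sum(actions.values())))
        done = next_state == 0
        agents = self.get_agents(state)
        return Timestep(
            state=next_state,
            observations={i: next_state for i in agents},
            rewards={i: 1.0 if done else 0.0 for i in agents},
            terminations={i: done for i in agents},
            truncations={i: False for i in agents},
            all_done=done,
            infos={i: {} for i in agents},
        )
```

Because step() takes the state as an argument and returns the next state, the model itself holds no episode state, which is what lets planners call it repeatedly from arbitrary states.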

Note

The POSGGym Model API treats all environments as observation first: the environment provides an initial observation before any action is taken (rather than action first, where agents perform an action before any observation is received). Observation-first environments are the standard in reinforcement learning and in most real-world problems, and are becoming the more common model API. It is also trivial to convert an action-first model into an observation-first one by returning a default or dummy initial observation (e.g. the initial observation is always the first observation in the list of possible observations).
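The conversion described in this note could look roughly like the sketch below. The OBSERVATIONS list and the dummy_initial_obs helper are hypothetical names introduced for illustration; in a real model this logic would live in sample_initial_obs().

```python
from typing import Dict, List

# Hypothetical ordered list of possible observations, shared by all agents.
OBSERVATIONS: List[str] = ["nothing", "left", "right"]


def dummy_initial_obs(agents: List[str]) -> Dict[str, str]:
    """Give every agent the first possible observation as a placeholder.

    The placeholder carries no information about the initial state, so an
    action-first problem keeps its semantics under an observation-first API.
    """
    return {agent_id: OBSERVATIONS[0] for agent_id in agents}
```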

Methods

posggym.POSGModel.get_agents(self, state: StateType) → List[str]

Get list of IDs for all agents that are active in given state.

The list of active agents may change depending on state.

For any environment where the number of agents remains constant during AND across episodes, this will be possible_agents, independent of state.

Parameters: state (StateType) – The environment state.
Returns: List[str] – List of IDs for all agents that are active in the given state.
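For environments where agents can drop out, the active list is a function of the state. A minimal sketch, assuming a hypothetical state made of one "alive" flag per possible agent:

```python
from typing import List, Tuple

POSSIBLE_AGENTS: Tuple[str, ...] = ("0", "1", "2")


def get_agents(state: Tuple[bool, ...]) -> List[str]:
    """Return the IDs of agents whose (hypothetical) alive flag is set."""
    return [agent_id for agent_id, alive in zip(POSSIBLE_AGENTS, state) if alive]
```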

posggym.POSGModel.sample_initial_state(self) → StateType

Sample an initial state.

Returns: StateType – An initial state.

posggym.POSGModel.sample_initial_obs(self, state: StateType) → Dict[str, ObsType]

Sample initial agent observations given an initial state.

Parameters: state (StateType) – The initial state.
Returns: Dict[str, ObsType] – A mapping from agent ID to their initial observation.

posggym.POSGModel.step(self, state: StateType, actions: Dict[str, ActType]) → JointTimestep[StateType, ObsType]

Perform generative step.

Given a state and the agents’ actions, the generative step function returns the next state, agent observations, agent rewards, whether the environment has terminated or been truncated for each agent, whether the environment has reached a terminal state for all agents, and information from the environment about the step. See posggym.model.JointTimestep for more details on return values.

For custom environments that have win/loss or success/fail conditions, you are encouraged to include this information in the info property of the returned value. We suggest using the “outcome” key with an instance of the Outcome class for values.

Parameters:

state (StateType) – The state.
actions (Dict[str, ActType]) – a joint action containing one action per active agent in the environment.

Returns:

JointTimestep – joint timestep result of performing actions in given state, including next state, observations, rewards, terminations, truncations, all done, infos.
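To show how the generative methods fit together, here is a sketch of a single simulated episode. CounterModel is a hypothetical stand-in (not a posggym class), SimpleNamespace stands in for JointTimestep, and rollout and policy are illustrative names; only the method and field names follow the API described here.

```python
from types import SimpleNamespace


class CounterModel:
    """Hypothetical stand-in model that terminates after three steps."""

    def get_agents(self, state):
        return ["0", "1"]

    def sample_initial_state(self):
        return 0

    def sample_initial_obs(self, state):
        return {i: state for i in self.get_agents(state)}

    def step(self, state, actions):
        next_state = state + 1
        done = next_state >= 3
        agents = self.get_agents(state)
        # SimpleNamespace stands in for posggym.model.JointTimestep.
        return SimpleNamespace(
            state=next_state,
            observations={i: next_state for i in agents},
            rewards={i: float(actions[i]) for i in agents},
            terminations={i: done for i in agents},
            truncations={i: False for i in agents},
            all_done=done,
            infos={i: {} for i in agents},
        )


def rollout(model, policy, max_steps=100):
    """Simulate one episode and return each agent's undiscounted return."""
    state = model.sample_initial_state()
    obs = model.sample_initial_obs(state)
    returns = {i: 0.0 for i in model.get_agents(state)}
    for _ in range(max_steps):
        # One action per active agent, chosen from that agent's observation.
        actions = {i: policy(i, obs[i]) for i in model.get_agents(state)}
        timestep = model.step(state, actions)
        for i, reward in timestep.rewards.items():
            returns[i] = returns.get(i, 0.0) + reward
        state, obs = timestep.state, timestep.observations
        if timestep.all_done:
            break
    return returns
```

The caller, not the model, carries the state from one step to the next, so the same loop can be restarted from any state during planning.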

posggym.POSGModel.seed(self, seed: int | None = None)

Set the seed for the model random number generator.

Also handles seeding for the action, observation, and (if it exists) state spaces.

Parameters: seed (int, optional) – The seed used to initialize the model’s RNG. If seed=None is passed, the RNG will not be reset. If you pass an integer, the RNG will be reset even if it already exists. Usually, you want to pass a seed when you first initialize the model.
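The seeding behaviour described above can be sketched with the built-in random library. SeededModel is a hypothetical stand-in, not the library's implementation; it leaves out the seeding of the action, observation, and state spaces, and assumes the RNG is created lazily on first access.

```python
import random
from typing import Optional


class SeededModel:
    """Hypothetical stand-in showing only the seed/rng behaviour."""

    def __init__(self) -> None:
        self._rng: Optional[random.Random] = None

    @property
    def rng(self) -> random.Random:
        # Create the RNG with a random seed if it does not exist yet.
        if self._rng is None:
            self._rng = random.Random()
        return self._rng

    def seed(self, seed: Optional[int] = None) -> None:
        if seed is None:
            # seed=None leaves any existing RNG untouched.
            return
        # An integer seed always resets the RNG, even if one already exists.
        self._rng = random.Random(seed)
```

Seeding twice with the same integer reproduces the same sample sequence, which is the property that makes simulated episodes repeatable.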

Attributes

POSGModel.possible_agents: Tuple[str, ...]: Tuple containing the IDs of all possible agents that can be present in the environment.

POSGModel.state_space: spaces.Space | None = None

The Space object corresponding to all valid states. If implemented, all valid states should be contained within this space.

Implementing the state_space attribute is optional as many simulation-based algorithms (including RL and MCTS) don’t require it to function and the state space can be difficult to define for some environments. In cases where it is not implemented it should be None.

POSGModel.action_spaces: Dict[str, spaces.Space]: A mapping from Agent ID to the Space object corresponding to all valid actions for that agent.

POSGModel.observation_spaces: Dict[str, spaces.Space]: A mapping from Agent ID to the Space object corresponding to all valid observations for that agent.

POSGModel.reward_ranges

A mapping from Agent ID to min and max possible rewards for that agent.

Each reward tuple corresponds to the minimum and maximum possible rewards for a given agent over an episode. The default reward range for each agent is set to \((-\infty,+\infty)\).

Returns: Dict[str, Tuple[float, float]]

POSGModel.is_symmetric: bool

Whether the environment is symmetric or not (is asymmetric).

An environment is “symmetric” if the ID of an agent in the environment does not affect the agent in any way (i.e. all agents have the same action and observation spaces, same reward functions, and there are no differences in initial conditions all things considered). Classic examples include Rock-Paper-Scissors, Chess and Poker. In “symmetric” environments the same “policy” should do equally well independent of the ID of the agent the policy is used for.

If an environment is not “symmetric” then it is “asymmetric”, meaning that there are differences in agent properties based on the agent’s ID. In “asymmetric” environments there is no guarantee that the same “policy” will work for different agent IDs. Examples include Pursuit-Evasion games and any environment where the action and/or observation space differs by agent ID.
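One way to make the symmetry condition concrete for a two-agent game is to check that swapping agent IDs in the joint action swaps the rewards in the same way. The sketch below does this for Rock-Paper-Scissors; rps_rewards, swap_ids, and is_swap_invariant are hypothetical helpers, and the check covers only the reward function, not spaces or initial conditions.

```python
from typing import Dict

# Each key beats the action it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def rps_rewards(actions: Dict[str, str]) -> Dict[str, float]:
    """Zero-sum Rock-Paper-Scissors rewards for agents "0" and "1"."""
    a0, a1 = actions["0"], actions["1"]
    if a0 == a1:
        return {"0": 0.0, "1": 0.0}
    r0 = 1.0 if BEATS[a0] == a1 else -1.0
    return {"0": r0, "1": -r0}


def swap_ids(mapping: Dict[str, object]) -> Dict[str, object]:
    """Exchange the entries of agents "0" and "1"."""
    return {"0": mapping["1"], "1": mapping["0"]}


def is_swap_invariant() -> bool:
    """True if relabelling the agents relabels the rewards identically."""
    for a0 in BEATS:
        for a1 in BEATS:
            actions = {"0": a0, "1": a1}
            if rps_rewards(swap_ids(actions)) != swap_ids(rps_rewards(actions)):
                return False
    return True
```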

POSGModel.rng

Return the model’s internal random number generator (RNG).

Initializes RNG with a random seed if not yet initialized.

posggym models and environments support the use of both the python built-in random library and the numpy library, unlike gymnasium which only explicitly supports the numpy library. Support for the built-in library is included as it can be 2-3X faster than the numpy library when drawing single samples, providing a significant speed-up for many environments.

There’s also nothing stopping users from using other RNG libraries so long as they implement the model API. However, explicit support in the form of tests and type hints is only provided for the random and numpy libraries.

Returns: random.Random | numpy.random.Generator – the model’s internal RNG.

POSGModel.spec: 'EnvSpec' | None = None: The EnvSpec of the environment normally set during posggym.make()

Additional Methods

posggym.POSGModel.sample_agent_initial_state(self, agent_id: str, obs: ObsType) → StateType

Sample an initial state for an agent given its initial observation.

It is optional to implement this method but can be helpful in environments that are used for planning and where there are a huge number of possible initial states.

Parameters:

agent_id (str) – The ID of the agent to get initial state for.
obs (ObsType) – The initial observation of the agent.

Returns:

StateType – An initial state for the agent conditioned on their initial observation.

Raises:

NotImplementedError – If this method is not implemented.
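A minimal sketch of what an implementation might do, assuming a hypothetical two-agent problem where the state is a pair of positions and each agent initially observes only its own position. The observed component is fixed and the rest is sampled, rather than drawing full initial states until one matches. All names here are illustrative.

```python
import random
from typing import Tuple

State = Tuple[int, int]  # hypothetical: (position of agent "0", position of agent "1")
NUM_POSITIONS = 5

rng = random.Random(0)


def sample_agent_initial_state(agent_id: str, obs: int) -> State:
    """Sample an initial state consistent with one agent's observation.

    Assumes the agent's initial observation is exactly its own position.
    """
    other = rng.randrange(NUM_POSITIONS)
    return (obs, other) if agent_id == "0" else (other, obs)
```

Sampling directly from the conditional distribution like this avoids the cost of rejection sampling when the set of initial states is very large.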

posggym.POSGFullModel

class posggym.POSGFullModel(*args, **kwds)

A fully defined Partially Observable Stochastic Game (POSG) model.

This class inherits from POSGModel, adding implementations for all individual components of a POSG. It is designed for use by planners which utilize all model components (e.g. dynamic programming and full-width planners).

The methods that need to be implemented (in addition to those in the base POSGModel class) are:

get_initial_belief()
transition_fn()
observation_fn()
reward_fn()

abstract POSGFullModel.get_initial_belief() → Dict[StateType, float]

The initial belief distribution: \(b_{0}\).

The initial belief distribution \(b_{0}\) maps initial states to probabilities.

Returns: Dict[StateType, float] – \(Pr(s_{0}=s)\) the initial probability of each state. If a state is not included in the initial distribution object, it should be assumed to have probability 0.
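As an illustration, a belief over a hypothetical two-state problem can be written as a plain dictionary and sampled with the built-in random library. The state names and the sample_state helper are assumptions made for this sketch.

```python
import random
from typing import Dict


def get_initial_belief() -> Dict[str, float]:
    # Hypothetical two-state problem; states left out have probability 0.
    return {"tiger-left": 0.5, "tiger-right": 0.5}


def sample_state(belief: Dict[str, float], rng: random.Random) -> str:
    """Draw one state according to the belief's probabilities."""
    states = list(belief)
    weights = [belief[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]
```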

abstract POSGFullModel.transition_fn(state: StateType, actions: Dict[str, ActType], next_state: StateType) → float

Transition function \(T(s, a, s')\).

The transition function \(T(s, a, s') \rightarrow [0, 1]\) defines \(Pr(s'|s, a)\), the probability of getting next state s’ given the environment was in state s and joint action a was performed.

Parameters:

state (StateType) – the state the environment was in
actions (Dict[str, ActType]) – the joint action performed
next_state (StateType) – the state of the environment after actions were performed

Returns:

float – \(Pr(s'|s, a)\), the probability of getting next state s’ given the environment was in state s and joint action a was performed.
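A sketch of a transition function for a hypothetical two-state, two-agent problem: the state flips with probability 0.9 when both agents choose "go", and otherwise stays put. The states, actions, and probabilities are invented for illustration.

```python
from typing import Dict

STATES = ("s0", "s1")


def transition_fn(state: str, actions: Dict[str, str], next_state: str) -> float:
    """Return Pr(next_state | state, actions) for the toy problem."""
    if all(action == "go" for action in actions.values()):
        # Joint "go": the state flips with probability 0.9.
        return 0.9 if next_state != state else 0.1
    # Any other joint action leaves the state unchanged.
    return 1.0 if next_state == state else 0.0
```

For every state and joint action, the returned values should sum to 1 over all next states, which is a cheap sanity check for any implementation.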

abstract POSGFullModel.observation_fn(obs: Dict[str, ObsType], next_state: StateType, actions: Dict[str, ActType]) → float

Observation function \(Z(o, s', a)\).

The observation function \(Z(o, s', a) \rightarrow [0, 1]\) defines \(Pr(o|s', a)\), the probability of joint observation o given the joint action a was performed and the environment ended up in state s’.

Parameters:

obs (Dict[str, ObsType]) – the joint observation received
next_state (StateType) – the state of the environment after actions were performed
actions (Dict[str, ActType]) – the joint action performed

Returns:

float – \(Pr(o|s', a)\), the probability of joint observation o given the joint action a was performed and the environment ended up in state s’.
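A typical use of the transition and observation functions together is a Bayesian belief update over the joint observation, \(b'(s') \propto Z(o, s', a) \sum_{s} T(s, a, s') b(s)\). The sketch below assumes a hypothetical static two-state problem in which each agent independently hears the true state with probability 0.85; belief_update and the toy functions are illustrative, not part of posggym.

```python
from typing import Callable, Dict, Iterable

STATES = ("L", "R")


def transition_fn(state: str, actions: Dict[str, str], next_state: str) -> float:
    # Hypothetical static problem: the state never changes.
    return 1.0 if next_state == state else 0.0


def observation_fn(obs: Dict[str, str], next_state: str, actions: Dict[str, str]) -> float:
    # Each agent independently observes the true state with probability 0.85.
    prob = 1.0
    for agent_obs in obs.values():
        prob *= 0.85 if agent_obs == next_state else 0.15
    return prob


def belief_update(
    belief: Dict[str, float],
    actions: Dict[str, str],
    obs: Dict[str, str],
    states: Iterable[str],
    T: Callable[[str, Dict[str, str], str], float],
    Z: Callable[[Dict[str, str], str, Dict[str, str]], float],
) -> Dict[str, float]:
    """Return the posterior belief after a joint action and joint observation."""
    unnormalized = {}
    for next_state in states:
        # Predict: probability of reaching next_state from the current belief.
        predicted = sum(p * T(s, actions, next_state) for s, p in belief.items())
        # Correct: weight by the likelihood of the joint observation.
        weight = Z(obs, next_state, actions) * predicted
        if weight > 0.0:
            unnormalized[next_state] = weight
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("joint observation has zero probability under the model")
    return {s: w / total for s, w in unnormalized.items()}
```

States with zero posterior weight are dropped from the result, matching the convention that states missing from a belief dictionary have probability 0.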

abstract POSGFullModel.reward_fn(state: StateType, actions: Dict[str, ActType]) → Dict[str, float]

Reward function \(R(s, a)\).

The reward function \(R(s, a) \rightarrow \mathbb{R}^n\), where n is the number of agents, defines the reward each agent receives given joint action a was performed in state s.

Parameters:

state (StateType) – the state the environment was in
actions (Dict[str, ActType]) – the joint action performed

Returns:

Dict[str, float] – The reward each agent receives given joint action a was performed in state s.
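Full-width planners often need the expected immediate reward of a joint action under a belief, \(\sum_{s} b(s) R(s, a)\). A minimal sketch, assuming a hypothetical zero-sum problem where agent "0" scores when its action matches the state:

```python
from typing import Dict


def reward_fn(state: str, actions: Dict[str, str]) -> Dict[str, float]:
    # Hypothetical zero-sum reward: agent "0" wins by matching the state.
    r0 = 1.0 if actions["0"] == state else -1.0
    return {"0": r0, "1": -r0}


def expected_rewards(belief: Dict[str, float], actions: Dict[str, str]) -> Dict[str, float]:
    """Average reward_fn over states, weighted by the belief."""
    totals: Dict[str, float] = {}
    for state, prob in belief.items():
        for agent_id, reward in reward_fn(state, actions).items():
            totals[agent_id] = totals.get(agent_id, 0.0) + prob * reward
    return totals
```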

posggym.model.JointTimestep

class posggym.model.JointTimestep

The result of a single step in the model.

Supports iteration.

A dataclass is used instead of a NamedTuple so that generic typing is seamlessly supported.

JointTimestep.state: StateType

JointTimestep.observations: Dict[str, ObsType]

JointTimestep.rewards: Dict[str, float]

JointTimestep.terminations: Dict[str, bool]

JointTimestep.truncations: Dict[str, bool]

JointTimestep.all_done: bool

JointTimestep.infos: Dict[str, Dict]
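The combination of a generic dataclass and iteration support can be sketched as follows. This Timestep class is a local stand-in written for illustration, not the library's source; it assumes iteration yields the fields in the order listed above.

```python
from dataclasses import dataclass, fields
from typing import Dict, Generic, Iterator, TypeVar

StateType = TypeVar("StateType")
ObsType = TypeVar("ObsType")


@dataclass
class Timestep(Generic[StateType, ObsType]):
    """Stand-in for JointTimestep with the same field names."""

    state: StateType
    observations: Dict[str, ObsType]
    rewards: Dict[str, float]
    terminations: Dict[str, bool]
    truncations: Dict[str, bool]
    all_done: bool
    infos: Dict[str, Dict]

    def __iter__(self) -> Iterator:
        # Yield field values in declaration order so the object unpacks
        # like a tuple.
        for f in fields(self):
            yield getattr(self, f.name)
```

With iteration in place, a result can be accessed by attribute (timestep.rewards) or unpacked in one line, for example `state, obs, rewards, terms, truncs, all_done, infos = timestep`.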

posggym.model.Outcome

class posggym.model.Outcome(value)

An enum for the final episode outcome of an agent.

This is supplied for user convenience. For environments where agents can win/lose, this class can be used to supply that information to users in a standard format via the info return value of the POSGModel.step() function.

Has the following possible values:

LOSS = -1
DRAW = 0
WIN = 1
NA = None
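A sketch of how these values might be reported through the per-agent info dictionaries under the suggested "outcome" key. The Outcome enum is redefined locally with the values listed above so the example runs on its own, and outcome_infos is a hypothetical helper that derives outcomes from the sign of a final reward.

```python
from enum import Enum
from typing import Dict


class Outcome(Enum):
    """Local copy of the values listed above, for illustration only."""

    LOSS = -1
    DRAW = 0
    WIN = 1
    NA = None


def outcome_infos(final_rewards: Dict[str, float]) -> Dict[str, dict]:
    """Build per-agent info dicts, assuming reward sign decides the outcome."""
    infos = {}
    for agent_id, reward in final_rewards.items():
        if reward > 0:
            outcome = Outcome.WIN
        elif reward < 0:
            outcome = Outcome.LOSS
        else:
            outcome = Outcome.DRAW
        infos[agent_id] = {"outcome": outcome}
    return infos
```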