Env

posggym.Env

class posggym.Env(*args, **kwds)

The main POSGGym class for implementing POSG environments.

The class encapsulates an environment and a POSG model. The environment maintains an internal state and can be interacted with by multiple agents in parallel through the step() and reset() functions. The POSG model can be accessed via the model attribute. It exposes the model of the environment, which can be used for planning or other purposes (see the posggym.POSGModel class for details).

The implementation is heavily inspired by the Farama Foundation Gymnasium (https://github.com/Farama-Foundation/Gymnasium) and PettingZoo (https://github.com/Farama-Foundation/PettingZoo) APIs. It aims to be consistent with these APIs and easily compatible with the PettingZoo API.

The main API methods that users of this class need to know are:

step()
reset()
render()
close()

And the main attributes:

model - The POSG model of the environment (posggym.POSGModel)
state - The current state of the environment
possible_agents - All agents that may appear in the environment
agents - The agents currently active in the environment
action_spaces - The action space for each agent
observation_spaces - The observation space for each agent
reward_ranges - The minimum and maximum possible rewards each agent may receive for a single step in the environment. The default reward range is set to \((-\infty,+\infty)\).
is_symmetric - Whether the environment is symmetric or asymmetric
spec - An environment spec that contains the information used to initialize the environment from posggym.make()
metadata - The metadata of the environment, i.e. render modes, render fps
render_mode - The current render mode of the environment

Environments have additional methods and attributes that provide more environment information and access:

unwrapped

Methods

posggym.Env.step(self, actions: Dict[str, ActType]) → Tuple[Dict[str, ObsType], Dict[str, float], Dict[str, bool], Dict[str, bool], bool, Dict[str, Dict[str, Any]]]

Run one timestep in the environment using the agents’ actions.

When the end of an episode is reached, the user is responsible for calling reset() to reset this environment’s state.

Parameters:

actions (Dict[str, ActType]) – a joint action containing one action per active agent in the environment.

Returns:

observations (Dict[str, ObsType]) – the joint observation containing one observation per agent.
rewards (Dict[str, float]) – the joint rewards containing one reward per agent.
terminations (Dict[str, bool]) – whether each agent has reached a terminal state in the environment. Contains one value for each agent in the environment. It’s possible, depending on the environment, for only some of the agents to be in a terminal state during a given step.
truncations (Dict[str, bool]) – whether the episode has been truncated for each agent in the environment. Contains one value for each agent in the environment. Truncation for an agent signifies that the episode was ended for that agent (e.g. due to reaching the time limit) before the agent reached a terminal state.
all_done (bool) – whether the episode is finished. Provided for convenience and to handle the case where agents may be added and removed during an episode. For environments where the set of active agents remains constant during each episode, this is equivalent to checking if all agents are either in a terminated or truncated state. If true, the user needs to call reset().
infos (Dict[str, Dict[str, Any]]) – contains auxiliary diagnostic information (helpful for debugging, learning, and logging) for each agent.
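
A minimal sketch of the interaction loop implied by the return values above. To stay runnable without posggym installed, it uses a hypothetical stand-in class, ToyEnv, which is not part of posggym and only mimics the documented reset() and step() return shapes. The helper run_episode is likewise an assumption made for illustration. With a real environment, env would normally come from posggym.make().

```python
import random
from typing import Dict


class ToyEnv:
    """Hypothetical stand-in mimicking the documented reset()/step() shapes.

    Not part of posggym: two agents, each truncated after `max_steps` steps.
    """

    possible_agents = ("0", "1")

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.agents = list(self.possible_agents)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        self.agents = list(self.possible_agents)
        observations = {i: 0 for i in self.agents}
        infos = {i: {} for i in self.agents}
        return observations, infos

    def step(self, actions: Dict[str, int]):
        self._t += 1
        observations = {i: self._t for i in self.agents}
        rewards = {i: float(actions[i]) for i in self.agents}
        terminations = {i: False for i in self.agents}
        truncations = {i: self._t >= self.max_steps for i in self.agents}
        all_done = all(terminations[i] or truncations[i] for i in self.agents)
        infos = {i: {} for i in self.agents}
        return observations, rewards, terminations, truncations, all_done, infos


def run_episode(env, seed: int = 0) -> Dict[str, float]:
    """Run one episode with random actions and return per-agent returns."""
    rng = random.Random(seed)
    observations, infos = env.reset(seed=seed)
    returns = {i: 0.0 for i in env.possible_agents}
    all_done = False
    while not all_done:
        # The joint action holds one action per currently active agent.
        actions = {i: rng.choice([0, 1]) for i in env.agents}
        observations, rewards, terminations, truncations, all_done, infos = env.step(actions)
        for agent_id, reward in rewards.items():
            returns[agent_id] += reward
    return returns
```

The loop stops on all_done rather than on the per-agent flags, which keeps it correct in environments where agents are added or removed during an episode.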

posggym.Env.reset(self, *, seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[Dict[str, ObsType], Dict[str, Dict]]

Resets the environment and returns the initial observations and infos.

This method generates a new starting state, often with some randomness. This randomness can be controlled with the seed parameter. If the environment already has a random number generator (RNG) and reset() is called with seed=None, the RNG is not reset. Note that the RNG is handled by the environment model, rather than the environment class itself.

Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Parameters:

seed (int, optional) – The seed that is used to initialize the environment’s RNG. If seed=None is passed, the RNG will not be reset. If you pass an integer, the RNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.
options (dict, optional) – Additional information to specify how the environment is reset (optional, depending on the specific environment)

Returns:

observations (Dict[str, ObsType]) – The joint observation containing one observation per agent in the environment.
infos (Dict[str, Dict]) – Auxiliary information for each agent. It should be analogous to the info returned by step() and can be empty.
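
The seeding contract described above can be illustrated with a small sketch. SeededToyEnv and start_states are hypothetical names, not posggym classes or functions, and the stand-in keeps its own random.Random instance, whereas in posggym the RNG is handled by the environment model. The sketch only shows the contract: an integer seed re-seeds the RNG, while seed=None leaves the existing RNG untouched.

```python
import random


class SeededToyEnv:
    """Hypothetical stand-in illustrating the documented seed semantics."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            # An integer seed resets the RNG, even if one already exists.
            self._rng = random.Random(seed)
        # With seed=None the existing RNG stream simply continues.
        start_state = self._rng.randint(0, 10**6)
        observations = {"0": start_state}
        infos = {"0": {}}
        return observations, infos


def start_states(seed, n_episodes=3):
    """Seed once right after construction, then reset without a seed."""
    env = SeededToyEnv()
    observations, _ = env.reset(seed=seed)
    states = [observations["0"]]
    for _ in range(n_episodes - 1):
        observations, _ = env.reset()
        states.append(observations["0"])
    return states
```

Seeding once and then calling reset() without a seed makes the whole sequence of episodes reproducible, while still giving each episode a different starting state.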

posggym.Env.render(self) → None | np.ndarray | str | Dict[str, np.ndarray] | Dict[str, str]

Render the environment as specified by the environment’s render_mode.

The render mode attribute render_mode is set during the initialization of the environment, while the environment’s metadata render modes (env.metadata["render_modes"]) should contain the supported render modes.

The set of supported modes varies per environment (some environments do not support rendering at all). By convention, if render_mode is:

None (default): no render is computed.
"human": Environment is rendered to the current display or terminal, usually for human consumption. Returns None.
"rgb_array": Return an np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image of the entire environment, suitable for turning into a video.
"ansi": Return a string (str) or io.StringIO containing a terminal-style text representation for each timestep. The text can include newlines and ANSI escape sequences (e.g. for colors).
"rgb_array_dict" and "ansi_dict": Return a dict mapping agent ID to render frame (RGB or ANSI depending on render mode). Each render frame represents the agent-centric view for the given agent. May also return a render for the entire environment (like the "rgb_array" and "ansi" render modes), which should be mapped to the "env" key in the dictionary by default.

Note

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
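
A minimal sketch of these conventions for a text-only environment. AnsiToyEnv is a hypothetical stand-in, not a posggym class. Rejecting unsupported modes in the constructor and masking other agents with "?" in agent-centric frames are choices made for this sketch rather than documented posggym behaviour.

```python
class AnsiToyEnv:
    """Hypothetical stand-in showing the render_mode conventions."""

    # The supported modes are declared under the "render_modes" key.
    metadata = {"render_modes": ["ansi", "ansi_dict"]}

    def __init__(self, render_mode=None):
        # Assumption of this sketch: fail early on an unsupported mode.
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"Unsupported render_mode: {render_mode!r}")
        self.render_mode = render_mode
        self.positions = {"0": 0, "1": 3}

    def _row(self, viewer=None):
        """Draw a 5-cell row; with a viewer, other agents appear as '?'."""
        cells = ["."] * 5
        for agent_id, pos in self.positions.items():
            cells[pos] = agent_id if viewer in (None, agent_id) else "?"
        return "".join(cells)

    def render(self):
        if self.render_mode is None:
            return None  # no render is computed
        if self.render_mode == "ansi":
            return self._row()
        # "ansi_dict": one agent-centric frame per agent, plus the
        # whole-environment frame under the "env" key.
        frames = {agent_id: self._row(viewer=agent_id) for agent_id in self.positions}
        frames["env"] = self._row()
        return frames
```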

posggym.Env.close(self)

Close environment and perform any necessary cleanup.

Should be overridden in subclasses as necessary.

Attributes

Env.model: POSGModel[StateType, ObsType, ActType]: The underlying POSG model of the environment (posggym.POSGModel)

Env.state

The current state for this environment.

This must be implemented in custom environments.

Returns: StateType

Env.possible_agents

The list of all possible agents that may appear in the environment.

Returns: Tuple[str, ...]

Env.agents

The list of agents active in the environment for the current state.

This will be possible_agents, independent of state, for any environment where the number of agents remains constant during and across episodes.

Returns: List[str]

Env.action_spaces

A mapping from Agent ID to the space of valid actions for that agent.

Returns: Dict[str, spaces.Space]

Env.observation_spaces

A mapping from Agent ID to the space of valid observations for that agent.

Returns: Dict[str, spaces.Space]

Env.reward_ranges

A mapping from Agent ID to the min and max possible rewards for that agent.

Each reward tuple corresponds to the minimum and maximum possible rewards for a given agent for a single step. The default reward range for each agent is set to \((-\infty,+\infty)\).

Returns: Dict[str, Tuple[float, float]]

Env.is_symmetric

Whether the environment is symmetric.

An environment is “symmetric” if the ID of an agent in the environment does not affect the agent in any way (i.e. all agents have the same action and observation spaces, the same reward functions, and there are no differences in initial conditions, all things considered). Classic examples include Rock-Paper-Scissors, Chess, and Poker. In “symmetric” environments the same “policy” should do equally well independent of the ID of the agent the policy is used for.

If an environment is not “symmetric” then it is “asymmetric”, meaning that there are differences in agent properties based on the agent’s ID. In “asymmetric” environments there is no guarantee that the same “policy” will work for different agent IDs. Examples include Pursuit-Evasion games and any environment where the action and/or observation spaces differ by agent ID.

Returns: bool – True if environment is symmetric, False if environment is asymmetric.
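
As a rough illustration of the definition above, the sketch below checks one necessary condition for symmetry: every agent has the same action space and the same observation space. The helper spaces_look_symmetric is hypothetical and not part of posggym. It assumes the space objects support equality comparison, and it does not examine reward functions or initial conditions, so a True result does not establish that an environment is symmetric.

```python
from typing import Any, Dict


def spaces_look_symmetric(
    action_spaces: Dict[str, Any], observation_spaces: Dict[str, Any]
) -> bool:
    """Necessary (not sufficient) symmetry check on per-agent spaces."""

    def all_equal(spaces: Dict[str, Any]) -> bool:
        values = list(spaces.values())
        return all(value == values[0] for value in values[1:])

    return all_equal(action_spaces) and all_equal(observation_spaces)
```

For example, a Pursuit-Evasion style environment in which the evader has a different action space from the pursuer fails this check, which is consistent with it being asymmetric.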

Env.spec: EnvSpec | None = None: The EnvSpec of the environment normally set during posggym.make()

Env.metadata: Dict[str, Any] = {'render_modes': []}: The metadata of the environment containing rendering modes, rendering fps, etc

Env.render_mode: str | None = None: The render mode of the environment determined at initialisation

Additional Methods

property Env.unwrapped: Env

Completely unwrap this env.

Returns: posggym.Env – The base non-wrapped posggym.Env instance

posggym.DefaultEnv

class posggym.DefaultEnv(model: POSGModel, render_mode: str | None = None)

A default environment implementation using an environment model.

This class implements the main environment methods - reset(), step(), state - using the environment model.

Users need only initialize the class with a posggym.POSGModel instance by calling super().__init__(custom_model) in their custom environment class that inherits from this class.

The custom environment needs only (optionally) implement rendering and clean-up methods:

render()
close()