Env
posggym.Env
- class posggym.Env(*args, **kwds)
The main POSGGym class for implementing POSG environments.
The class encapsulates an environment and a POSG model. The environment maintains an internal state and can be interacted with by multiple agents in parallel through the step() and reset() functions. The POSG model can be accessed via the model attribute and exposes the model of the environment, which can be used for planning or for anything else (see the posggym.POSGModel class for details).
The implementation is heavily inspired by the Farama Foundation Gymnasium (https://github.com/Farama-Foundation/Gymnasium) and PettingZoo (https://github.com/Farama-Foundation/PettingZoo) APIs. It aims to be consistent with these APIs and easily compatible with the PettingZoo API.
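The main API methods that users of this class need to know are:
step() - Run one timestep in the environment using the agents’ actions
reset() - Reset the environment and return the initial observations and infos
render() - Render the environment as specified by the environment’s render_mode
close() - Close the environment and perform any necessary cleanup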
And the main attributes:
model - The POSG model of the environment (posggym.POSGModel)
state - The current state of the environment
possible_agents - All agents that may appear in the environment
agents - The agents currently active in the environment
action_spaces - The action space for each agent
observation_spaces - The observation space for each agent
reward_ranges - The minimum and maximum possible rewards each agent may receive for a single step in the environment. The default reward range is set to \((-\infty,+\infty)\).
is_symmetric - Whether the environment is symmetric or asymmetric
spec - An environment spec that contains the information used to initialize the environment from posggym.make()
metadata - The metadata of the environment, i.e. render modes, render fps
render_mode - The current render mode of the environment
Environments have additional methods and attributes that provide more environment information and access:
Methods
- posggym.Env.step(self, actions: Dict[str, ActType]) -> Tuple[Dict[str, ObsType], Dict[str, float], Dict[str, bool], Dict[str, bool], bool, Dict[str, Dict[str, Any]]]
Run one timestep in the environment using the agents’ actions.
When the end of an episode is reached, the user is responsible for calling reset() to reset this environment’s state.
- Parameters:
actions (Dict[str, ActType]) – a joint action containing one action per active agent in the environment.
- Returns:
observations (Dict[str, ObsType]) – the joint observation containing one observation per agent.
rewards (Dict[str, float]) – the joint rewards containing one reward per agent.
terminations (Dict[str, bool]) – whether each agent has reached a terminal state in the environment. Contains one value for each agent in the environment. It’s possible, depending on the environment, for only some of the agents to be in a terminal state during a given step.
truncations (Dict[str, bool]) – whether the episode has been truncated for each agent in the environment. Contains one value for each agent in the environment. Truncation for an agent signifies that the episode was ended for that agent (e.g. due to reaching the time limit) before the agent reached a terminal state.
all_done (bool) – whether the episode is finished. Provided for convenience and to handle the case where agents may be added and removed during an episode. For environments where the set of active agents remains constant during each episode, this is equivalent to checking if all agents are either in a terminated or truncated state. If true, the user needs to call reset().
infos (Dict[str, Dict[str, Any]]) – contains auxiliary diagnostic information (helpful for debugging, learning, and logging) for each agent.
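The interaction loop described above can be sketched as follows. This is a minimal illustration, not posggym code: CoinMatchEnv is a hypothetical stand-in class that only mimics the documented reset() and step() signatures, and the matching-coins rewards and the max_steps time limit are assumptions made up for the example.

```python
import random


class CoinMatchEnv:
    """Hypothetical two-agent stand-in mimicking the documented API shape."""

    possible_agents = ("0", "1")

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.agents = list(self.possible_agents)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        obs = {i: 0 for i in self.agents}
        infos = {i: {} for i in self.agents}
        return obs, infos

    def step(self, actions):
        self._t += 1
        match = actions["0"] == actions["1"]
        obs = {i: int(match) for i in self.agents}
        # Agent "0" wins on a match, agent "1" wins otherwise (zero-sum).
        rewards = {"0": 1.0 if match else -1.0, "1": -1.0 if match else 1.0}
        terminations = {i: False for i in self.agents}
        # The episode is truncated for every agent at the time limit.
        truncations = {i: self._t >= self.max_steps for i in self.agents}
        # With a constant set of agents, all_done is equivalent to every
        # agent being either terminated or truncated.
        all_done = all(terminations[i] or truncations[i] for i in self.agents)
        infos = {i: {} for i in self.agents}
        return obs, rewards, terminations, truncations, all_done, infos


def run_episode(env, rng):
    """Run one episode with random actions; return step count and returns."""
    obs, infos = env.reset(seed=0)
    returns = {i: 0.0 for i in env.possible_agents}
    steps = 0
    all_done = False
    while not all_done:
        # One action per currently active agent.
        actions = {i: rng.choice([0, 1]) for i in env.agents}
        obs, rewards, terminations, truncations, all_done, infos = env.step(actions)
        for i, r in rewards.items():
            returns[i] += r
        steps += 1
    return steps, returns


steps, returns = run_episode(CoinMatchEnv(max_steps=5), random.Random(0))
```

The loop stops on all_done rather than on the per-agent flags, which keeps it valid for environments where agents are added or removed during an episode.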
- posggym.Env.reset(self, *, seed: int | None = None, options: Dict[str, Any] | None = None) -> Tuple[Dict[str, ObsType], Dict[str, Dict]]
Resets the environment and returns the initial observations and infos.
This method generates a new starting state, often with some randomness. This randomness can be controlled with the seed parameter. If the environment already has a random number generator (RNG) and reset() is called with seed=None, the RNG is not reset. Note that the RNG is handled by the environment model, rather than the environment class itself.
Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
- Parameters:
seed (int, optional) – The seed that is used to initialize the environment’s RNG. If seed=None is passed, the RNG will not be reset. If you pass an integer, the RNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.
options (dict, optional) – Additional information to specify how the environment is reset (optional, depending on the specific environment).
- Returns:
observations (Dict[str, ObsType]) – The joint observation containing one observation per agent in the environment.
infos (Dict[str, Dict]) – Auxiliary information for each agent. It should be analogous to the info returned by step() and can be empty.
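The seeding behavior described for reset() can be illustrated with a small stand-in. This is a sketch under stated assumptions: ToyModel and ToyEnv are hypothetical classes, and the method names seed() and sample_initial_state() are chosen for illustration rather than taken from the posggym.POSGModel interface. The only property carried over from the text is that the model owns the RNG and that reset(seed=None) leaves it untouched.

```python
import random


class ToyModel:
    """Hypothetical model that owns the RNG."""

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed):
        # Replace the RNG so later sampling is reproducible.
        self.rng = random.Random(seed)

    def sample_initial_state(self):
        return self.rng.randint(0, 10**6)


class ToyEnv:
    """Hypothetical environment that delegates seeding to its model."""

    def __init__(self):
        self.model = ToyModel()
        self.state = None

    def reset(self, *, seed=None, options=None):
        # Only reseed when an integer is passed; seed=None keeps the RNG.
        if seed is not None:
            self.model.seed(seed)
        self.state = self.model.sample_initial_state()
        return {"0": self.state}, {"0": {}}


def initial_states(seed, n):
    """Seed once right after construction, then reset without a seed."""
    env = ToyEnv()
    env.reset(seed=seed)
    states = [env.state]
    for _ in range(n - 1):
        env.reset()
        states.append(env.state)
    return states


run_a = initial_states(seed=42, n=3)
run_b = initial_states(seed=42, n=3)

# Reseeding on every reset restarts the RNG, so the same state repeats.
reseeded = ToyEnv()
first = reseeded.reset(seed=7)[0]["0"]
second = reseeded.reset(seed=7)[0]["0"]
```

Seeding once makes the whole sequence of episodes reproducible (run_a equals run_b), whereas passing the same seed on every reset repeats a single starting state, which is why the seed is usually passed only once.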
- posggym.Env.render(self) -> None | np.ndarray | str | Dict[str, np.ndarray] | Dict[str, str]
Render the environment as specified by the environment’s render_mode.
The render mode attribute render_mode is set during the initialization of the environment, while the environment’s metadata render modes (env.metadata["render_modes"]) should contain the supported render modes.
The set of supported modes varies per environment (some environments do not support rendering at all). By convention, if render_mode is:
None (default): no render is computed.
"human": Environment is rendered to the current display or terminal, usually for human consumption. Returns None.
"rgb_array": Return an np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image of the entire environment, suitable for turning into a video.
"ansi": Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each timestep. The text can include newlines and ANSI escape sequences (e.g. for colors).
"rgb_array_dict" and "ansi_dict": Return a dict mapping agent ID to render frame (RGB or ANSI depending on render mode). Each render frame represents the agent-centric view for the given agent. May also return a render for the entire environment (like the "rgb_array" and "ansi" render modes), which should be mapped to the "env" key in the dictionary by default.
Note
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
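As an illustration of these conventions, the sketch below shows one way a custom environment might validate render_mode against its metadata and return either a single text frame or a per-agent dictionary. GridRenderEnv, its four-cell layout, and the _draw helper are hypothetical and exist only for this example; whether a real environment validates the mode in its constructor is an assumption.

```python
class GridRenderEnv:
    """Hypothetical stand-in supporting the "ansi" and "ansi_dict" modes."""

    # The supported modes are declared in the class metadata.
    metadata = {"render_modes": ["ansi", "ansi_dict"], "render_fps": 4}

    def __init__(self, render_mode=None):
        supported = self.metadata["render_modes"]
        if render_mode is not None and render_mode not in supported:
            raise ValueError(f"Unsupported render mode: {render_mode!r}")
        self.render_mode = render_mode
        self.agents = ["0", "1"]
        self._positions = {"0": 0, "1": 3}

    def _draw(self, visible_agents):
        """Draw a 4-cell row showing only the given agents."""
        cells = ["."] * 4
        for i in visible_agents:
            cells[self._positions[i]] = i
        return "".join(cells)

    def render(self):
        if self.render_mode is None:
            return None  # no render is computed
        if self.render_mode == "ansi":
            return self._draw(self.agents)
        # "ansi_dict": one agent-centric frame per agent, plus a frame
        # for the whole environment under the "env" key.
        frames = {i: self._draw([i]) for i in self.agents}
        frames["env"] = self._draw(self.agents)
        return frames


text_frame = GridRenderEnv(render_mode="ansi").render()
frame_dict = GridRenderEnv(render_mode="ansi_dict").render()
```

Here text_frame is the string "0..1" and frame_dict maps "0", "1", and "env" to their respective frames.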
- posggym.Env.close(self)
Close environment and perform any necessary cleanup.
Should be overridden in subclasses as necessary.
Attributes
- Env.model: POSGModel[StateType, ObsType, ActType]
The underlying POSG model of the environment (posggym.POSGModel)
- Env.state
The current state for this environment.
This must be implemented in custom environments.
- Returns:
StateType
- Env.possible_agents
The tuple of all possible agents that may appear in the environment.
- Returns:
Tuple[str, …]
- Env.agents
The list of agents active in the environment for the current state.
This will be possible_agents, independent of state, for any environment where the number of agents remains constant during and across episodes.
- Returns:
List[str]
- Env.action_spaces
A mapping from Agent ID to the space of valid actions for that agent.
- Returns:
Dict[str, spaces.Space]
- Env.observation_spaces
A mapping from Agent ID to the space of valid observations for that agent.
- Returns:
Dict[str, spaces.Space]
- Env.reward_ranges
A mapping from Agent ID to the min and max possible rewards for that agent.
Each reward tuple corresponds to the minimum and maximum possible rewards for a given agent for a single step. The default reward range for each agent is set to \((-\infty,+\infty)\).
- Returns:
Dict[str, Tuple[float, float]]
- Env.is_symmetric
Whether the environment is symmetric.
An environment is “symmetric” if the ID of an agent in the environment does not affect the agent in any way (i.e. all agents have the same action and observation spaces, same reward functions, and there are no differences in initial conditions, all things considered). Classic examples include Rock-Paper-Scissors, Chess, and Poker. In “symmetric” environments the same “policy” should do equally well independent of the ID of the agent the policy is used for.
If an environment is not “symmetric” then it is “asymmetric”, meaning that there are differences in agent properties based on the agent’s ID. In “asymmetric” environments there is no guarantee that the same “policy” will work for different agent IDs. Examples include Pursuit-Evasion games, any environments where action and/or observation space differs by agent ID.
- Returns:
bool – True if the environment is symmetric, False if the environment is asymmetric.
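To make the definition concrete, the sketch below checks one part of it, the reward functions, for a stateless two-agent game: swapping the two agents' actions should swap their rewards. This is an illustrative check only, not how posggym determines is_symmetric. The function names and the agent IDs "0" and "1" are assumptions, and the check ignores observation spaces and initial conditions.

```python
from itertools import product

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def rps_rewards(actions):
    """Rock-Paper-Scissors rewards for agents "0" and "1"."""
    a0, a1 = actions["0"], actions["1"]
    if a0 == a1:
        return {"0": 0.0, "1": 0.0}
    if BEATS[a0] == a1:
        return {"0": 1.0, "1": -1.0}
    return {"0": -1.0, "1": 1.0}


def biased_rewards(actions):
    """Same game, but agent "0" always receives a bonus (asymmetric)."""
    rewards = rps_rewards(actions)
    rewards["0"] += 0.5
    return rewards


def rewards_are_symmetric(reward_fn, actions):
    """Check that swapping the agents' actions swaps their rewards."""
    for a, b in product(actions, repeat=2):
        r = reward_fn({"0": a, "1": b})
        r_swapped = reward_fn({"0": b, "1": a})
        if r["0"] != r_swapped["1"] or r["1"] != r_swapped["0"]:
            return False
    return True


rps_symmetric = rewards_are_symmetric(rps_rewards, ACTIONS)
biased_symmetric = rewards_are_symmetric(biased_rewards, ACTIONS)
```

Rock-Paper-Scissors passes the check, while the biased variant fails it because the reward now depends on which ID an agent holds.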
- Env.spec: EnvSpec | None = None
The EnvSpec of the environment, normally set during posggym.make()
- Env.metadata: Dict[str, Any] = {'render_modes': []}
The metadata of the environment containing rendering modes, rendering fps, etc
- Env.render_mode: str | None = None
The render mode of the environment determined at initialisation
Additional Methods
posggym.DefaultEnv
- class posggym.DefaultEnv(model: POSGModel, render_mode: str | None = None)
A default environment implementation using an environment model.
This class implements the main environment methods - reset(), step(), state - using the environment model.
Users need only initialize the class with a posggym.POSGModel instance by calling super().__init__(custom_model) in their custom environment class that inherits from this class.
The custom environment needs only (optionally) implement rendering and clean-up methods:
render()
close()
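The delegation pattern described above can be sketched without posggym installed. Everything in the block is a stand-in: ModelBackedEnv plays the role of a DefaultEnv-like base class, and CounterModel is a toy model whose method names (seed, sample_initial_state, observe, step) and return shapes are assumptions for illustration, not the posggym.POSGModel interface. The point is only the structure: the base class derives reset(), step(), and state from the model, and the subclass supplies the model through super().__init__().

```python
import random


class CounterModel:
    """Hypothetical model: the state counts steps up to a fixed horizon."""

    possible_agents = ("0", "1")

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.rng = random.Random()

    def seed(self, seed):
        self.rng.seed(seed)

    def sample_initial_state(self):
        return 0

    def observe(self, state):
        return {i: state for i in self.possible_agents}

    def step(self, state, actions):
        next_state = state + 1
        rewards = {i: float(actions[i]) for i in self.possible_agents}
        done = next_state >= self.horizon
        return next_state, self.observe(next_state), rewards, done


class ModelBackedEnv:
    """Stand-in base class implementing reset/step/state via a model."""

    def __init__(self, model, render_mode=None):
        self.model = model
        self.render_mode = render_mode
        self._state = None

    @property
    def state(self):
        return self._state

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.model.seed(seed)
        self._state = self.model.sample_initial_state()
        obs = self.model.observe(self._state)
        return obs, {i: {} for i in obs}

    def step(self, actions):
        self._state, obs, rewards, done = self.model.step(self._state, actions)
        terminations = {i: done for i in obs}
        truncations = {i: False for i in obs}
        infos = {i: {} for i in obs}
        return obs, rewards, terminations, truncations, done, infos

    def render(self):
        return None  # optional: override in subclasses

    def close(self):
        pass  # optional: override in subclasses


class CounterEnv(ModelBackedEnv):
    """Custom environment: only passes its model to the base class."""

    def __init__(self, render_mode=None):
        super().__init__(CounterModel(horizon=3), render_mode=render_mode)


env = CounterEnv()
obs, infos = env.reset(seed=0)
history = []
all_done = False
while not all_done:
    actions = {i: 1 for i in obs}
    obs, rewards, terminations, truncations, all_done, infos = env.step(actions)
    history.append(env.state)
env.close()
```

Because all dynamics live in the model, CounterEnv contains no step or reset logic of its own; it would only need to override render() and close() if it had something to draw or release.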