memory_tools_marl
- class xuance.common.memory_tools_marl.BaseBuffer(*args)
Bases: ABC
Base buffer class for MARL algorithms.
- property full
- class xuance.common.memory_tools_marl.IC3Net_OnPolicyBuffer_RNN(agent_keys: List[str], state_space: Dict[str, gymnasium.spaces.Space] | None = None, obs_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, act_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, n_envs: int = 1, buffer_size: int = 1, max_episode_steps: int = 1, use_gae: bool | None = False, use_advnorm: bool | None = False, gamma: float | None = None, gae_lam: float | None = None, **kwargs)
- class xuance.common.memory_tools_marl.MARL_OffPolicyBuffer(agent_keys: List[str], state_space: Dict[str, gymnasium.spaces.Space] | None = None, obs_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, act_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, n_envs: int = 1, buffer_size: int = 1, batch_size: int = 1, **kwargs)
Bases: BaseBuffer
Replay buffer for off-policy MARL algorithms.
- Parameters:
agent_keys (List[str]) – Keys that identify each agent.
state_space (Dict[str, Space]) – Global state space (Discrete or Box).
obs_space (Dict[str, Dict[str, Space]]) – Observation space of each agent (agents in the same group are assumed to share one observation space).
act_space (Dict[str, Dict[str, Space]]) – Action space of each agent (agents in the same group are assumed to share one action space).
n_envs (int) – Number of parallel environments.
buffer_size (int) – Total number of transitions the buffer can hold, summed over all environments.
batch_size (int) – Number of transitions drawn per call to sample().
**kwargs – Other arguments.
Example
>>> import numpy as np
>>> from gymnasium.spaces import Box
>>> from xuance.common.memory_tools_marl import MARL_OffPolicyBuffer
>>> state_space = None
>>> obs_space = {'agent_0': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_1': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_2': Box(-np.inf, np.inf, (18,), np.float32)}
>>> act_space = {'agent_0': Box(0.0, 1.0, (5,), np.float32),
...              'agent_1': Box(0.0, 1.0, (5,), np.float32),
...              'agent_2': Box(0.0, 1.0, (5,), np.float32)}
>>> n_envs = 50
>>> buffer_size = 10000
>>> batch_size = 256
>>> agent_keys = ['agent_0', 'agent_1', 'agent_2']
>>> memory = MARL_OffPolicyBuffer(agent_keys=agent_keys, state_space=state_space, obs_space=obs_space,
...                               act_space=act_space, n_envs=n_envs, buffer_size=buffer_size,
...                               batch_size=batch_size)
- clear()
Clears the memory data in the replay buffer.
Example
An example of the resulting data shapes (n_envs=50, buffer_size=10000, agent_keys=['agent_0', 'agent_1', 'agent_2']; each environment holds buffer_size // n_envs = 200 transitions):
self.data = {
    'obs': {
        'agent_0': shape=[50, 200, 18],
        'agent_1': shape=[50, 200, 18],
        'agent_2': shape=[50, 200, 18],
    },  # dim_obs: 18
    'actions': {
        'agent_0': shape=[50, 200, 5],
        'agent_1': shape=[50, 200, 5],
        'agent_2': shape=[50, 200, 5],
    },  # dim_act: 5
    ...
}
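The shapes above follow from splitting the total capacity evenly across environments. The following is a minimal NumPy sketch of such a layout, assuming one array per agent of shape (n_envs, buffer_size // n_envs, dim) and a single write pointer that wraps around once the buffer is full. `SketchOffPolicyStore` and its `store` method are hypothetical illustrations, not the xuance implementation.

```python
import numpy as np


class SketchOffPolicyStore:
    """Hypothetical per-environment circular storage (not the xuance class)."""

    def __init__(self, agent_keys, dim_obs, dim_act, n_envs, buffer_size):
        self.n_envs = n_envs
        self.n_size = buffer_size // n_envs  # transitions kept per environment
        self.ptr = 0   # next step column to write
        self.size = 0  # number of filled step columns
        self.data = {
            'obs': {k: np.zeros((n_envs, self.n_size, dim_obs), np.float32) for k in agent_keys},
            'actions': {k: np.zeros((n_envs, self.n_size, dim_act), np.float32) for k in agent_keys},
        }

    def store(self, obs, actions):
        # obs[k]: (n_envs, dim_obs), actions[k]: (n_envs, dim_act); one step for all envs at once
        for k in obs:
            self.data['obs'][k][:, self.ptr] = obs[k]
            self.data['actions'][k][:, self.ptr] = actions[k]
        self.ptr = (self.ptr + 1) % self.n_size  # overwrite the oldest column when full
        self.size = min(self.size + 1, self.n_size)


keys = ['agent_0', 'agent_1', 'agent_2']
store = SketchOffPolicyStore(keys, dim_obs=18, dim_act=5, n_envs=50, buffer_size=10000)
print(store.data['obs']['agent_0'].shape)      # (50, 200, 18)
print(store.data['actions']['agent_2'].shape)  # (50, 200, 5)
```

Writing one step for all environments at once keeps each environment's transitions contiguous along the second axis.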
- class xuance.common.memory_tools_marl.MARL_OffPolicyBuffer_RNN(agent_keys: List[str], state_space: Dict[str, gymnasium.spaces.Space] | None = None, obs_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, act_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, n_envs: int = 1, buffer_size: int = 1, batch_size: int = 1, max_episode_steps: int = 1, **kwargs)
Replay buffer for off-policy MARL algorithms with the DRQN trick (data is stored and sampled as whole episodes for recurrent networks).
- Parameters:
agent_keys (List[str]) – Keys that identify each agent.
state_space (Dict[str, Space]) – Global state space (Discrete or Box).
obs_space (Dict[str, Dict[str, Space]]) – Observation space of each agent (agents in the same group are assumed to share one observation space).
act_space (Dict[str, Dict[str, Space]]) – Action space of each agent (agents in the same group are assumed to share one action space).
n_envs (int) – Number of parallel environments.
buffer_size (int) – Capacity of the buffer, in episodes.
batch_size (int) – Number of episodes drawn per call to sample().
max_episode_steps (int) – Maximum number of steps per episode, i.e. the sequence length stored for each episode.
**kwargs – Other arguments.
Example
>>> import numpy as np
>>> from gymnasium.spaces import Box
>>> from xuance.common.memory_tools_marl import MARL_OffPolicyBuffer_RNN
>>> state_space = None
>>> obs_space = {'agent_0': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_1': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_2': Box(-np.inf, np.inf, (18,), np.float32)}
>>> act_space = {'agent_0': Box(0.0, 1.0, (5,), np.float32),
...              'agent_1': Box(0.0, 1.0, (5,), np.float32),
...              'agent_2': Box(0.0, 1.0, (5,), np.float32)}
>>> n_envs = 50
>>> buffer_size = 10000
>>> batch_size = 256
>>> agent_keys = ['agent_0', 'agent_1', 'agent_2']
>>> max_episode_steps = 60
>>> memory = MARL_OffPolicyBuffer_RNN(agent_keys=agent_keys, state_space=state_space,
...                                   obs_space=obs_space, act_space=act_space,
...                                   n_envs=n_envs, buffer_size=buffer_size, batch_size=batch_size,
...                                   max_episode_steps=max_episode_steps)
- clear()
Clears the memory data in the replay buffer.
Example
An example of the resulting data shapes (buffer_size=10000, max_eps_len=max_episode_steps=60, agent_keys=['agent_0', 'agent_1', 'agent_2']):
self.data = {
    'obs': {
        'agent_0': shape=[10000, 61, 18],
        'agent_1': shape=[10000, 61, 18],
        'agent_2': shape=[10000, 61, 18],
    },  # dim_obs: 18; 61 = max_eps_len + 1
    'actions': {
        'agent_0': shape=[10000, 60, 5],
        'agent_1': shape=[10000, 60, 5],
        'agent_2': shape=[10000, 60, 5],
    },  # dim_act: 5
    ...
    'filled': shape=[10000, 60],  # Step mask: True means the current step is not terminated.
}
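Episodes shorter than max_eps_len leave zero padding in the arrays, and the 'filled' mask is what keeps that padding out of a training loss. Here is a minimal sketch of a masked mean-squared TD loss, assuming `filled` has shape (batch, max_eps_len) and the per-step errors share that shape. `masked_mse` is a hypothetical helper, not part of xuance.

```python
import numpy as np


def masked_mse(td_error: np.ndarray, filled: np.ndarray) -> float:
    """Mean squared error over valid steps only.

    td_error: (batch, max_eps_len) per-step errors; padded steps may hold any value.
    filled:   (batch, max_eps_len) boolean mask, True for steps that actually happened.
    """
    mask = filled.astype(td_error.dtype)
    # Zero out padded steps, then divide by the number of valid steps, not the array size
    return float(((td_error * mask) ** 2).sum() / max(mask.sum(), 1.0))


# Two episodes padded to max_eps_len=4; the second one terminated after 2 steps
td = np.array([[1.0, 1.0, 1.0, 1.0],
               [2.0, 2.0, 9.0, 9.0]])
filled = np.array([[True, True, True, True],
                   [True, True, False, False]])
print(masked_mse(td, filled))  # 2.0
```

Dividing by `mask.sum()` rather than `td_error.size` means batches with many short episodes are not down-weighted by their padding.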
- clear_episodes()
Clears an episode of data for multiple environments in the replay buffer.
Example
An example of the resulting data shapes (n_envs=16, max_eps_len=60, agent_keys=['agent_0', 'agent_1', 'agent_2']):
self.data = {
    'obs': {
        'agent_0': shape=[16, 61, 18],
        'agent_1': shape=[16, 61, 18],
        'agent_2': shape=[16, 61, 18],
    },  # dim_obs: 18
    'actions': {
        'agent_0': shape=[16, 60, 5],
        'agent_1': shape=[16, 60, 5],
        'agent_2': shape=[16, 60, 5],
    },  # dim_act: 5
    ...
    'filled': shape=[16, 60],  # Step mask: True means the current step is not terminated.
}
- finish_path(i_env, **terminal_data)
Handles the terminal state of an episode, including storing the terminal observations, available actions, and other terminal data.
- Parameters:
i_env (int) – Index of the environment whose episode has ended.
terminal_data (dict) – The terminal-state data.
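Once an environment's episode is complete, its per-episode arrays (shaped like the clear_episodes() example) have to end up in the main episode buffer. The sketch below shows one plausible scheme, assuming a circular pointer over buffer_size episode slots; `commit_episode` and the dict layout are hypothetical illustrations, not xuance's code.

```python
import numpy as np


def commit_episode(buffer, episode, i_env, ptr, size, buffer_size):
    """Copy the episode of environment i_env into slot `ptr` of the main buffer.

    buffer[key]:  (buffer_size, T, ...) arrays; episode[key]: (n_envs, T, ...) arrays.
    Returns the updated (ptr, size).
    """
    for key, arr in episode.items():
        buffer[key][ptr] = arr[i_env]
        arr[i_env] = 0  # reset this environment's slot before its next episode
    return (ptr + 1) % buffer_size, min(size + 1, buffer_size)


buffer_size, n_envs, max_eps_len = 3, 2, 4
buffer = {'filled': np.zeros((buffer_size, max_eps_len), dtype=bool)}
episode = {'filled': np.zeros((n_envs, max_eps_len), dtype=bool)}
episode['filled'][1, :2] = True  # env 1 ran for 2 steps
ptr, size = commit_episode(buffer, episode, i_env=1, ptr=0, size=0, buffer_size=buffer_size)
print(ptr, size, buffer['filled'][0].tolist())  # 1 1 [True, True, False, False]
```

Because only environment `i_env` is copied and reset, the other environments can keep filling their own episode slots.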
- sample(batch_size=None)
Samples a batch of data for model training.
- Parameters:
batch_size (int) – Size of the data batch. Defaults to self.batch_size (recommended).
- Returns:
A dict of sampled data.
- Return type:
samples_dict (dict)
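Since this buffer stores whole episodes, sampling draws episode indices rather than single transitions. The following is a minimal sketch that assumes the buffer tracks how many episode slots are filled (`size`) and samples uniformly with replacement. `sample_episodes` is hypothetical, not the library's method.

```python
import numpy as np


def sample_episodes(data, size, batch_size, rng=None):
    """Draw batch_size episode indices uniformly from the first `size` slots (with replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, size, size=batch_size)

    # Recursively index nested dicts such as data['obs'][agent_key]
    def take(x):
        return {k: take(v) for k, v in x.items()} if isinstance(x, dict) else x[idx]

    return take(data)


data = {'obs': {'agent_0': np.zeros((1000, 61, 18), np.float32)},
        'filled': np.zeros((1000, 60), dtype=bool)}
batch = sample_episodes(data, size=500, batch_size=256, rng=np.random.default_rng(0))
print(batch['obs']['agent_0'].shape, batch['filled'].shape)  # (256, 61, 18) (256, 60)
```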
- class xuance.common.memory_tools_marl.MARL_OnPolicyBuffer(agent_keys: List[str], state_space: Dict[str, gymnasium.spaces.Space] | None = None, obs_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, act_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, n_envs: int = 1, buffer_size: int = 1, use_gae: bool | None = False, use_advnorm: bool | None = False, gamma: float | None = None, gae_lam: float | None = None, **kwargs)
Bases: BaseBuffer
Replay buffer for on-policy MARL algorithms.
- Parameters:
agent_keys (List[str]) – Keys that identify each agent.
state_space (Dict[str, Space]) – Global state space (Discrete or Box).
obs_space (Dict[str, Dict[str, Space]]) – Observation space of each agent (agents in the same group are assumed to share one observation space).
act_space (Dict[str, Dict[str, Space]]) – Action space of each agent (agents in the same group are assumed to share one action space).
n_envs (int) – Number of parallel environments.
buffer_size (int) – Total number of transitions the buffer can hold, summed over all environments.
use_gae (bool) – Whether to use Generalized Advantage Estimation (GAE).
use_advnorm (bool) – Whether to normalize advantages.
gamma (float) – Discount factor.
gae_lam (float) – The GAE lambda parameter.
**kwargs – Other arguments.
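With use_advnorm=True, advantages are standardized before being used in an update. The sketch below illustrates that standard trick. Whether xuance normalizes per agent or over the whole batch is not stated here, so the hypothetical `normalize_advantages` simply standardizes whatever array it is given.

```python
import numpy as np


def normalize_advantages(adv: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    return (adv - adv.mean()) / (adv.std() + eps)


adv = np.array([1.0, 2.0, 3.0, 4.0])
norm = normalize_advantages(adv)
print(norm)
```

The small `eps` guards against division by zero when every advantage in the batch is identical.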
Example
>>> import numpy as np
>>> from gymnasium.spaces import Box
>>> from xuance.common.memory_tools_marl import MARL_OnPolicyBuffer
>>> state_space = None
>>> obs_space = {'agent_0': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_1': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_2': Box(-np.inf, np.inf, (18,), np.float32)}
>>> act_space = {'agent_0': Box(0.0, 1.0, (5,), np.float32),
...              'agent_1': Box(0.0, 1.0, (5,), np.float32),
...              'agent_2': Box(0.0, 1.0, (5,), np.float32)}
>>> n_envs = 16
>>> buffer_size = 1600
>>> agent_keys = ['agent_0', 'agent_1', 'agent_2']
>>> memory = MARL_OnPolicyBuffer(agent_keys=agent_keys, state_space=state_space, obs_space=obs_space,
...                              act_space=act_space, n_envs=n_envs, buffer_size=buffer_size,
...                              use_gae=False, use_advnorm=False, gamma=0.99, gae_lam=0.95)
- clear()
Clears the memory data in the replay buffer.
Example
An example of the resulting data shapes (n_envs=16, buffer_size=1600, agent_keys=['agent_0', 'agent_1', 'agent_2']; each environment holds buffer_size // n_envs = 100 transitions):
self.data = {
    'obs': {
        'agent_0': shape=[16, 100, 18],
        'agent_1': shape=[16, 100, 18],
        'agent_2': shape=[16, 100, 18],
    },  # dim_obs: 18
    'actions': {
        'agent_0': shape=[16, 100, 5],
        'agent_1': shape=[16, 100, 5],
        'agent_2': shape=[16, 100, 5],
    },  # dim_act: 5
    ...
}
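In this layout, the 1600 transitions are arranged as 16 environments × 100 steps. If samples are addressed by flat indexes, one natural way to map them back to (environment, step) pairs is divmod by the per-environment length. The sketch assumes that row-major layout and is illustrative only.

```python
import numpy as np

n_envs, buffer_size = 16, 1600
n_size = buffer_size // n_envs  # 100 steps per environment
obs = np.random.default_rng(0).standard_normal((n_envs, n_size, 18)).astype(np.float32)

indexes = np.array([0, 99, 100, 1599])
env_idx, step_idx = np.divmod(indexes, n_size)  # row-major: index = env * n_size + step
batch = obs[env_idx, step_idx]                  # shape (len(indexes), 18)
print(env_idx.tolist(), step_idx.tolist())      # [0, 0, 1, 15] [0, 99, 0, 99]
# Same result as flattening the first two axes and indexing directly
assert np.array_equal(batch, obs.reshape(-1, 18)[indexes])
```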
- finish_path(i_env: int | None = None, value_next: dict | None = None, value_normalizer=None)
Calculates and stores the returns and advantages when an episode is finished.
- Parameters:
i_env (int) – The index of environment.
value_next (dict) – The critic values of the terminal state.
value_normalizer – The value normalizer method, default is None.
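When use_gae=True, the advantages are computed with Generalized Advantage Estimation, bootstrapping from the critic value of the state after the last stored step (the role value_next plays here). The following is a minimal single-environment, single-agent sketch of the standard GAE recursion, assuming 1-D arrays of rewards, values, and terminal flags. `compute_gae` is hypothetical and omits any value normalizer.

```python
import numpy as np


def compute_gae(rewards, values, dones, value_next, gamma=0.99, gae_lam=0.95):
    """Return (advantages, returns) for one trajectory segment of length T.

    rewards, values, dones: shape (T,); value_next: critic value of the state after the last step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, last_adv = value_next, 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error; the bootstrap is cut off at terminal steps
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors
        last_adv = delta + gamma * gae_lam * not_done * last_adv
        advantages[t] = last_adv
        next_value = values[t]
    return advantages, advantages + values


adv, ret = compute_gae(np.array([1.0, 1.0, 1.0]), np.zeros(3), np.zeros(3),
                       value_next=0.0, gamma=1.0, gae_lam=1.0)
print(adv.tolist())  # [3.0, 2.0, 1.0]
```

With gae_lam=1 the advantages reduce to Monte Carlo returns minus values, and with gae_lam=0 they reduce to the one-step TD error.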
- class xuance.common.memory_tools_marl.MARL_OnPolicyBuffer_RNN(agent_keys: List[str], state_space: Dict[str, gymnasium.spaces.Space] | None = None, obs_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, act_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, n_envs: int = 1, buffer_size: int = 1, max_episode_steps: int = 1, use_gae: bool | None = False, use_advnorm: bool | None = False, gamma: float | None = None, gae_lam: float | None = None, **kwargs)
Replay buffer for on-policy MARL algorithms with the DRQN trick.
- Parameters:
agent_keys (List[str]) – Keys that identify each agent.
state_space (Dict[str, Space]) – Global state space (Discrete or Box).
obs_space (Dict[str, Dict[str, Space]]) – Observation space of each agent (agents in the same group are assumed to share one observation space).
act_space (Dict[str, Dict[str, Space]]) – Action space of each agent (agents in the same group are assumed to share one action space).
n_envs (int) – Number of parallel environments.
buffer_size (int) – Total capacity of the buffer.
max_episode_steps (int) – Maximum number of steps per episode, i.e. the sequence length stored for each episode.
use_gae (bool) – Whether to use Generalized Advantage Estimation (GAE).
use_advnorm (bool) – Whether to normalize advantages.
gamma (float) – Discount factor.
gae_lam (float) – The GAE lambda parameter.
**kwargs – Other arguments.
Example
>>> import numpy as np
>>> from gymnasium.spaces import Box
>>> from xuance.common.memory_tools_marl import MARL_OnPolicyBuffer_RNN
>>> state_space = None
>>> obs_space = {'agent_0': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_1': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_2': Box(-np.inf, np.inf, (18,), np.float32)}
>>> act_space = {'agent_0': Box(0.0, 1.0, (5,), np.float32),
...              'agent_1': Box(0.0, 1.0, (5,), np.float32),
...              'agent_2': Box(0.0, 1.0, (5,), np.float32)}
>>> n_envs = 16
>>> buffer_size = 1600
>>> agent_keys = ['agent_0', 'agent_1', 'agent_2']
>>> max_episode_steps = 100
>>> memory = MARL_OnPolicyBuffer_RNN(agent_keys=agent_keys, state_space=state_space, obs_space=obs_space,
...                                  act_space=act_space, n_envs=n_envs, buffer_size=buffer_size,
...                                  max_episode_steps=max_episode_steps,
...                                  use_gae=False, use_advnorm=False, gamma=0.99, gae_lam=0.95)
- clear()
Clears all data in the on-policy replay buffer.
This method resets all stored observations, actions, rewards, values, and other related fields to zero.
- Parameters:
None
- Returns:
None
- finish_path(i_env: int | None = None, i_step: int | None = None, value_next: dict | None = None, value_normalizer: dict | None = None)
Calculates and stores the returns and advantages when an episode is finished.
- Parameters:
i_env (int) – The index of the environment.
i_step (int) – The index of the current step in that environment.
value_next (Optional[dict]) – The critic values of the terminal state.
value_normalizer (Optional[dict]) – The value normalizer method, default is None.
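When use_gae=False, a common alternative to GAE is plain discounted returns bootstrapped from the terminal-state value, with advantages taken as returns minus values. The sketch below restricts that computation to the first i_step steps of one episode, treating later steps as padding. The helper `discounted_returns` and that interpretation of i_step are assumptions for illustration.

```python
import numpy as np


def discounted_returns(rewards, values, dones, value_next, i_step, gamma=0.99):
    """Discounted returns and advantages for steps [0, i_step) of one episode.

    Steps at or after i_step are treated as padding and left at zero.
    """
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float64)
    running = value_next  # bootstrap from the critic value after the last valid step
    for t in reversed(range(i_step)):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    advantages = np.zeros(T, dtype=np.float64)
    advantages[:i_step] = returns[:i_step] - values[:i_step]
    return returns, advantages


ret, adv = discounted_returns(np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.5, 0.5, 0.0, 0.0]),
                              np.zeros(4), value_next=2.0, i_step=2, gamma=0.5)
print(ret.tolist(), adv.tolist())  # [2.0, 2.0, 0.0, 0.0] [1.5, 1.5, 0.0, 0.0]
```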
- property full
- sample(indexes: ndarray | None = None)
Samples a batch of data from the replay buffer.
- Parameters:
indexes (np.ndarray) – The indexes of the data in the buffer to be sampled.
- Returns:
The sampled data.
- Return type:
samples_dict (dict)
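Because sample() receives its indexes from the caller, an on-policy training loop usually shuffles all storage indexes once per epoch and slices them into minibatches. The following is a minimal sketch of that loop, assuming the indexes are integers in [0, buffer_size). `minibatch_indexes` is hypothetical.

```python
import numpy as np


def minibatch_indexes(buffer_size, n_minibatch, rng=None):
    """Yield disjoint index arrays that together cover range(buffer_size) exactly once."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(buffer_size)
    for chunk in np.array_split(perm, n_minibatch):
        yield chunk


batches = list(minibatch_indexes(16, n_minibatch=4, rng=np.random.default_rng(0)))
print([len(b) for b in batches])  # [4, 4, 4, 4]
# Each chunk would then be passed on as memory.sample(indexes=chunk)
```

Covering every index once per epoch means each stored transition contributes to exactly one gradient step per pass.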
- class xuance.common.memory_tools_marl.MeanField_OffPolicyBuffer(agent_keys: List[str], state_space: Dict[str, gymnasium.spaces.Space] | None = None, obs_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, act_space: Dict[str, Dict[str, gymnasium.spaces.Space]] | None = None, n_envs: int = 1, buffer_size: int = 1, batch_size: int = 1, **kwargs)
Replay buffer for off-policy Mean-Field MARL algorithms (Mean-Field Q-Learning).
- Parameters:
agent_keys (List[str]) – Keys that identify each agent.
state_space (Dict[str, Space]) – Global state space (Discrete or Box).
obs_space (Dict[str, Dict[str, Space]]) – Observation space of each agent (agents in the same group are assumed to share one observation space).
act_space (Dict[str, Dict[str, Space]]) – Action space of each agent (agents in the same group are assumed to share one action space).
n_envs (int) – Number of parallel environments.
buffer_size (int) – Total number of transitions the buffer can hold, summed over all environments.
batch_size (int) – Number of transitions drawn per call to sample().
**kwargs – Other arguments.
Example
>>> import numpy as np
>>> from gymnasium.spaces import Box
>>> from xuance.common.memory_tools_marl import MeanField_OffPolicyBuffer
>>> state_space = None
>>> obs_space = {'agent_0': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_1': Box(-np.inf, np.inf, (18,), np.float32),
...              'agent_2': Box(-np.inf, np.inf, (18,), np.float32)}
>>> act_space = {'agent_0': Box(0.0, 1.0, (5,), np.float32),
...              'agent_1': Box(0.0, 1.0, (5,), np.float32),
...              'agent_2': Box(0.0, 1.0, (5,), np.float32)}
>>> n_envs = 50
>>> buffer_size = 10000
>>> batch_size = 256
>>> agent_keys = ['agent_0', 'agent_1', 'agent_2']
>>> memory = MeanField_OffPolicyBuffer(agent_keys=agent_keys, state_space=state_space, obs_space=obs_space,
...                                    act_space=act_space, n_envs=n_envs, buffer_size=buffer_size,
...                                    batch_size=batch_size)
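Mean-Field Q-Learning approximates the joint action of an agent's neighbours by their mean action, so a mean-field buffer is expected to keep that quantity alongside each transition. That is an assumption here, because the parameter list above does not name the field. The sketch below computes, for every agent, the average one-hot action of all other agents. Treating "all other agents" as the neighbourhood, using discrete actions, and the name `mean_actions` are illustrative assumptions.

```python
import numpy as np


def mean_actions(actions: np.ndarray, n_actions: int) -> np.ndarray:
    """For each of N agents, average the one-hot actions of the other N - 1 agents.

    actions: (N,) integer discrete actions. Returns an (N, n_actions) array.
    """
    one_hot = np.eye(n_actions)[actions]        # (N, n_actions)
    n_agents = one_hot.shape[0]
    total = one_hot.sum(axis=0, keepdims=True)  # (1, n_actions)
    return (total - one_hot) / (n_agents - 1)   # exclude each agent's own action


acts = np.array([0, 1, 1])
print(mean_actions(acts, n_actions=2).tolist())  # [[0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
```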
- clear()
Clears the memory data in the replay buffer.
Example
An example of the resulting data shapes (n_envs=50, buffer_size=10000, agent_keys=['agent_0', 'agent_1', 'agent_2']):
self.data = {
    'obs': {
        'agent_0': shape=[50, 200, 18],
        'agent_1': shape=[50, 200, 18],
        'agent_2': shape=[50, 200, 18],
    },  # dim_obs: 18
    'actions': {
        'agent_0': shape=[50, 200, 5],
        'agent_1': shape=[50, 200, 5],
        'agent_2': shape=[50, 200, 5],
    },  # dim_act: 5
    ...
}
- class xuance.common.memory_tools_marl.MeanField_OffPolicyBuffer_RNN(*args, **kwargs)
- clear()
Clears all data in the replay buffer.
This method resets all stored observations, actions, rewards, and other related fields to zero.
- Parameters:
None
- Returns:
None
- clear_episodes()
Clears an episode of data for multiple environments in the replay buffer.
Example
An example of the resulting data shapes (n_envs=16, max_eps_len=60, agent_keys=['agent_0', 'agent_1', 'agent_2']):
self.data = {
    'obs': {
        'agent_0': shape=[16, 61, 18],
        'agent_1': shape=[16, 61, 18],
        'agent_2': shape=[16, 61, 18],
    },  # dim_obs: 18
    'actions': {
        'agent_0': shape=[16, 60, 5],
        'agent_1': shape=[16, 60, 5],
        'agent_2': shape=[16, 60, 5],
    },  # dim_act: 5
    ...
    'filled': shape=[16, 60],  # Step mask: True means the current step is not terminated.
}
- class xuance.common.memory_tools_marl.MeanField_OnPolicyBuffer(*args, **kwargs)
Replay buffer for the Mean-Field Actor-Critic algorithm.
- clear()
Clears the memory data in the replay buffer.
Example
An example of the resulting data shapes (n_envs=16, buffer_size=1600, agent_keys=['agent_0', 'agent_1', 'agent_2']):
self.data = {
    'obs': {
        'agent_0': shape=[16, 100, 18],
        'agent_1': shape=[16, 100, 18],
        'agent_2': shape=[16, 100, 18],
    },  # dim_obs: 18
    'actions': {
        'agent_0': shape=[16, 100, 5],
        'agent_1': shape=[16, 100, 5],
        'agent_2': shape=[16, 100, 5],
    },  # dim_act: 5
    ...
}