memory_tools

class xuance.common.memory_tools.Buffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_info_shape: dict | None)[source]

Bases: ABC

Basic buffer for single-agent DRL algorithms.

Parameters:
  • observation_space – the space for observation data.

  • action_space – the space for action data.

  • auxiliary_info_shape – the shape for auxiliary data if needed.

abstract clear(*args)[source]
finish_path(*args)[source]
full()[source]
abstract sample(*args)[source]
abstract store(*args)[source]
class xuance.common.memory_tools.DummyOffPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int)[source]

Bases: Buffer

Replay buffer for off-policy DRL algorithms.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • buffer_size – the total size of the replay buffer.

  • batch_size – number of transitions in each sampled batch.

clear()[source]
sample(batch_size=None)[source]
store(obs, acts, rews, terminals, next_obs)[source]
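The store/sample cycle of an off-policy replay buffer can be illustrated with a small NumPy ring buffer. This is only a sketch under stated assumptions, not the xuance implementation: the class name `RingReplaySketch`, the `[n_envs, n_size, ...]` layout, the even split of `buffer_size` across environments, and the extra `rng` argument are all hypothetical.

```python
import numpy as np


class RingReplaySketch:
    """Hypothetical stand-in for an off-policy replay buffer (not xuance code)."""

    def __init__(self, obs_shape, n_envs, buffer_size, batch_size):
        self.n_envs = n_envs
        self.n_size = buffer_size // n_envs  # assumption: capacity is split across envs
        self.batch_size = batch_size
        self.ptr, self.size = 0, 0
        self.obs = np.zeros((n_envs, self.n_size) + tuple(obs_shape), dtype=np.float32)
        self.next_obs = np.zeros_like(self.obs)
        self.acts = np.zeros((n_envs, self.n_size), dtype=np.int64)
        self.rews = np.zeros((n_envs, self.n_size), dtype=np.float32)
        self.terminals = np.zeros((n_envs, self.n_size), dtype=bool)

    def store(self, obs, acts, rews, terminals, next_obs):
        # One call writes one step for every parallel environment.
        self.obs[:, self.ptr] = obs
        self.acts[:, self.ptr] = acts
        self.rews[:, self.ptr] = rews
        self.terminals[:, self.ptr] = terminals
        self.next_obs[:, self.ptr] = next_obs
        self.ptr = (self.ptr + 1) % self.n_size      # wrap around, overwriting the oldest step
        self.size = min(self.size + 1, self.n_size)

    def sample(self, batch_size=None, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        bs = self.batch_size if batch_size is None else batch_size
        env_idx = rng.integers(0, self.n_envs, size=bs)   # which environment
        step_idx = rng.integers(0, self.size, size=bs)    # which stored step
        return {
            "obs": self.obs[env_idx, step_idx],
            "acts": self.acts[env_idx, step_idx],
            "rews": self.rews[env_idx, step_idx],
            "terminals": self.terminals[env_idx, step_idx],
            "next_obs": self.next_obs[env_idx, step_idx],
        }


buf = RingReplaySketch(obs_shape=(4,), n_envs=2, buffer_size=8, batch_size=3)
for t in range(6):
    buf.store(np.full((2, 4), t), np.array([0, 1]), np.ones(2),
              np.zeros(2, dtype=bool), np.full((2, 4), t + 1))
batch = buf.sample(rng=np.random.default_rng(0))
```

After six stores into four slots per environment, the two oldest steps have been overwritten, which is the behavior a fixed-size replay buffer is expected to have.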
class xuance.common.memory_tools.DummyOffPolicyBuffer_Atari(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int)[source]

Bases: DummyOffPolicyBuffer

Replay buffer for off-policy DRL algorithms and Atari tasks.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • buffer_size – the total size of the replay buffer.

  • batch_size – number of transitions in each sampled batch.

clear()[source]
class xuance.common.memory_tools.DummyOnPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, horizon_size: int, use_gae: bool = True, use_advnorm: bool = True, gamma: float = 0.99, gae_lam: float = 0.95)[source]

Bases: Buffer

Replay buffer for on-policy DRL algorithms.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • horizon_size – maximum number of steps to store for each environment.

  • use_gae – whether to use generalized advantage estimation (GAE).

  • use_advnorm – whether to normalize the advantages.

  • gamma – discount factor.

  • gae_lam – the lambda coefficient for GAE.

clear()[source]
finish_path(val, i)[source]
property full
sample(indexes)[source]
store(obs, acts, rews, value, terminals, aux_info=None)[source]
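The `use_gae`, `use_advnorm`, `gamma`, and `gae_lam` parameters point to generalized advantage estimation. The following standalone sketch shows that computation for a single path segment. The function names `gae_advantages` and `normalize_advantages` are hypothetical, and the sketch assumes that no terminal state occurs inside the segment and that `last_value` is the bootstrap value for the state following the last stored step; how `finish_path` actually organizes this in xuance is not shown here.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, gae_lam=0.95):
    """Return (advantages, returns) for one path segment using GAE."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values_ext = np.append(np.asarray(values, dtype=np.float64), last_value)
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the exponentially weighted backward sum.
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        running = delta + gamma * gae_lam * running
        advantages[t] = running
    returns = advantages + values_ext[:-1]
    return advantages, returns


def normalize_advantages(advantages, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)


adv, ret = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0,
                          gamma=0.5, gae_lam=1.0)
adv_norm = normalize_advantages(adv)
```

With zero value estimates and `gae_lam=1.0`, the advantages reduce to discounted reward-to-go, which makes the result easy to check by hand.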
class xuance.common.memory_tools.DummyOnPolicyBuffer_Atari(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, horizon_size: int, use_gae: bool = True, use_advnorm: bool = True, gamma: float = 0.99, gae_lam: float = 0.95)[source]

Bases: DummyOnPolicyBuffer

Replay buffer for on-policy DRL algorithms and Atari tasks.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • horizon_size – maximum number of steps to store for each environment.

  • use_gae – whether to use generalized advantage estimation (GAE).

  • use_advnorm – whether to normalize the advantages.

  • gamma – discount factor.

  • gae_lam – the lambda coefficient for GAE.

clear()[source]
class xuance.common.memory_tools.EpisodeBuffer[source]

Bases: object

Episode buffer for DRQN agent.

put(transition)[source]
sample(lookup_step=None, idx=None) → Dict[str, ndarray][source]
class xuance.common.memory_tools.PerOffPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int, alpha: float = 0.6)[source]

Bases: Buffer

Prioritized Replay Buffer.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • buffer_size – the total size of the replay buffer.

  • batch_size – number of transitions in each sampled batch.

  • alpha – prioritization exponent; larger values make sampling depend more strongly on priorities.

clear()[source]
sample(beta)[source]
store(obs, acts, rews, terminals, next_obs)[source]
update_priorities(idxes, priorities)[source]
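The roles of `alpha` and `beta` in prioritized replay can be sketched with plain NumPy arrays. This is a linear-time illustration of the standard prioritized sampling rule, not the data structure used by `PerOffPolicyBuffer`; the function name `per_sample` and its `rng` argument are hypothetical, and all priorities are assumed to be strictly positive.

```python
import numpy as np


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw indexes proportionally to priority**alpha and return importance weights."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = scaled / scaled.sum()
    idxes = rng.choice(len(probs), size=batch_size, p=probs)
    n = len(probs)
    # Importance-sampling weights correct the bias of non-uniform sampling.
    weights = (n * probs[idxes]) ** (-beta)
    # Normalize by the largest possible weight so every weight is at most 1.
    weights /= (n * probs.min()) ** (-beta)
    return idxes, weights


idxes, weights = per_sample([4.0, 1.0, 1.0, 1.0], batch_size=16,
                            rng=np.random.default_rng(0))
```

Setting `alpha=0` makes the sampling uniform regardless of the priorities, while `beta=1` fully compensates for the non-uniform sampling in the weights.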
class xuance.common.memory_tools.RecurrentOffPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int, episode_length: int, lookup_length: int)[source]

Bases: Buffer

Replay buffer for DRQN-based algorithms.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • buffer_size – the size of replay buffer that stores episodes of data.

  • batch_size – number of transitions in each sampled batch.

  • episode_length – data length for an episode.

  • lookup_length – the length of history data.

can_sample()[source]
clear(*args)[source]
property full
sample()[source]
store(episode)[source]
class xuance.common.memory_tools.SequentialReplayBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int)[source]

Bases: Buffer

Sequential replay buffer for DreamerV3.

Parameters:
  • observation_space – the observation space of the environment.

  • action_space – the action space of the environment.

  • auxiliary_shape – data shape of auxiliary information (if exists).

  • n_envs – number of parallel environments.

  • buffer_size – the total size of the replay buffer.

  • batch_size – number of transitions in each sampled batch.

clear()[source]
sample(seq_len: int)[source]

Sample elements from the replay buffer in a sequential manner, without considering the episode boundaries.

Parameters:
  • seq_len (int) – the length of each sampled sequence.

Returns:

the sampled dictionary with a shape of [envs, sequence_length, batch_size, …].

Return type:

Dict[str, np.ndarray]
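Sequential sampling that ignores episode boundaries, with the output shape described above, can be sketched with NumPy fancy indexing. The function name `sample_sequences` is hypothetical, and the sketch assumes a single array laid out as `[n_envs, n_size, ...]`, independent start positions per environment, and no wrap-around of the write pointer; the actual buffer may handle these details differently.

```python
import numpy as np


def sample_sequences(memory, size, seq_len, batch_size, rng=None):
    """Gather consecutive steps; output shape is [n_envs, seq_len, batch_size, ...]."""
    rng = np.random.default_rng() if rng is None else rng
    n_envs = memory.shape[0]
    if size < seq_len:
        raise ValueError("not enough stored steps for the requested sequence length")
    # One random start per (environment, batch element); episode ends are ignored.
    starts = rng.integers(0, size - seq_len + 1, size=(n_envs, batch_size))
    # steps[e, t, b] = starts[e, b] + t
    steps = starts[:, None, :] + np.arange(seq_len)[None, :, None]
    env_idx = np.arange(n_envs)[:, None, None]
    return memory[env_idx, steps]


# Toy memory where memory[e, s, 0] == 10 * e + s, so consecutive steps differ by 1.
memory = np.arange(20, dtype=np.float32).reshape(2, 10, 1)
out = sample_sequences(memory, size=10, seq_len=4, batch_size=3,
                       rng=np.random.default_rng(0))
```

Applying the same indexing to every array in a dictionary of memories would yield a dictionary of sequences with matching time steps.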

store(obs, acts, rews, terms, truncs, is_first)[source]
Parameters:
  • all arguments – numpy arrays with shape [envs, ~] if ~ != 1 else [envs, ].

xuance.common.memory_tools.create_memory(shape: tuple | dict | None, n_envs: int, n_size: int, dtype: type = numpy.float32)[source]

Create a numpy array for memory data.

Parameters:
  • shape – data shape.

  • n_envs – number of parallel environments.

  • n_size – length of data sequence for each environment.

  • dtype – numpy data type.

Returns:

An empty memory space to store data (initialized with numpy.zeros()).
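A helper with this signature could look roughly like the sketch below. The name `create_memory_sketch` is hypothetical, and the `[n_envs, n_size, *shape]` layout and the treatment of `None` and `dict` shapes are assumptions inferred from the type hints, not confirmed behavior of `create_memory`.

```python
import numpy as np


def create_memory_sketch(shape, n_envs, n_size, dtype=np.float32):
    """Allocate zero-filled storage for n_size steps in each of n_envs environments."""
    if shape is None:
        return None  # assumption: nothing to store for this field
    if isinstance(shape, dict):
        # assumption: a dict of shapes yields a dict of arrays with the same keys
        return {key: create_memory_sketch(value, n_envs, n_size, dtype)
                for key, value in shape.items()}
    return np.zeros((n_envs, n_size) + tuple(shape), dtype=dtype)


obs_memory = create_memory_sketch((4,), n_envs=2, n_size=5)
aux_memory = create_memory_sketch({"log_pi": (), "hidden": (3,)}, n_envs=2, n_size=5)
```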

xuance.common.memory_tools.sample_batch(memory: ndarray | dict | None, index: ndarray | tuple | None)[source]

Sample a batch of data from the selected memory.

Parameters:
  • memory – memory that contains experience data.

  • index – pointer to the location for the selected data.

Returns:

A batch of data.
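Indexing a memory with a tuple of index arrays can be sketched as follows. The name `sample_batch_sketch` is hypothetical, and returning `None` when either argument is `None`, as well as recursing into dictionaries, are assumptions based on the type hints rather than documented behavior.

```python
import numpy as np


def sample_batch_sketch(memory, index):
    """Select entries from an array, or from every array in a dict, at the given index."""
    if memory is None or index is None:
        return None  # assumption: missing memory or index yields no batch
    if isinstance(memory, dict):
        return {key: sample_batch_sketch(value, index) for key, value in memory.items()}
    return memory[index]


memory = np.arange(12).reshape(3, 4)             # [n_envs=3, n_size=4]
index = (np.array([0, 2]), np.array([1, 3]))     # (environment indexes, step indexes)
batch = sample_batch_sketch(memory, index)
```

Here the tuple pairs each environment index with a step index, so the batch contains `memory[0, 1]` and `memory[2, 3]`.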

xuance.common.memory_tools.store_element(data: ndarray | dict | float | None, memory: dict | ndarray, ptr: int)[source]

Insert a step of data into current memory.

Parameters:
  • data – the data to be stored.

  • memory – the memory where data will be stored.

  • ptr – pointer to the location for the data.
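Writing one step at position `ptr` can be sketched as an in-place assignment along the step axis. The name `store_element_sketch` is hypothetical, and the `[n_envs, n_size, ...]` layout, the in-place update, and the handling of `None` and `dict` inputs are assumptions inferred from the type hints.

```python
import numpy as np


def store_element_sketch(data, memory, ptr):
    """Write one step of data for all environments at position ptr, in place."""
    if data is None:
        return  # assumption: nothing to store
    if isinstance(data, dict):
        for key, value in data.items():
            memory[key][:, ptr] = value
    else:
        memory[:, ptr] = data  # scalars broadcast across environments


rewards = np.zeros((2, 3), dtype=np.float32)     # [n_envs=2, n_size=3]
store_element_sketch(np.array([1.0, 2.0]), rewards, ptr=1)
store_element_sketch(5.0, rewards, ptr=2)

aux = {"log_pi": np.zeros((2, 3), dtype=np.float32)}
store_element_sketch({"log_pi": np.array([-0.5, -1.5])}, aux, ptr=0)
```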