memory_tools

class xuance.common.memory_tools.Buffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_info_shape: dict | None, num_envs: int, buffer_size: int)[source]

Bases: ABC

Basic buffer for single-agent DRL algorithms.

Parameters:

observation_space – the space for observation data.
action_space – the space for action data.
auxiliary_info_shape – the shape of auxiliary data, if needed.
num_envs – number of parallel environments.
buffer_size – the total size of the buffer.

abstract clear(*args)[source]

finish_path(*args)[source]

property full

abstract sample(*args)[source]

abstract store(*args)[source]

class xuance.common.memory_tools.DummyOffPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int)[source]

Bases: Buffer

Replay buffer for off-policy DRL algorithms.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
buffer_size – the total size of the replay buffer.
batch_size – number of transitions in each sampled batch.

clear()[source]

sample(batch_size=None)[source]

store(obs, acts, rews, terminals, next_obs)[source]
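The storage pattern behind an off-policy replay buffer can be illustrated with a small ring buffer. The sketch below is not XuanCe's implementation: `TinyReplayBuffer`, its fields, and the returned dictionary keys are hypothetical names, and it assumes flat vector observations and scalar actions. It only mirrors the documented `clear`/`store`/`sample` interface, keeping `buffer_size // n_envs` steps per environment and overwriting the oldest step once full.

```python
import numpy as np


class TinyReplayBuffer:
    """Hypothetical ring buffer; not the XuanCe implementation."""

    def __init__(self, obs_dim, n_envs, buffer_size, batch_size, seed=0):
        self.obs_dim = obs_dim
        self.n_envs = n_envs
        self.n_size = buffer_size // n_envs  # steps kept per environment
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self.clear()

    def clear(self):
        shape = (self.n_envs, self.n_size)
        self.obs = np.zeros(shape + (self.obs_dim,), dtype=np.float32)
        self.next_obs = np.zeros(shape + (self.obs_dim,), dtype=np.float32)
        self.acts = np.zeros(shape, dtype=np.float32)
        self.rews = np.zeros(shape, dtype=np.float32)
        self.terminals = np.zeros(shape, dtype=bool)
        self.ptr, self.size = 0, 0

    def store(self, obs, acts, rews, terminals, next_obs):
        # Each argument carries one step for every environment.
        self.obs[:, self.ptr] = obs
        self.acts[:, self.ptr] = acts
        self.rews[:, self.ptr] = rews
        self.terminals[:, self.ptr] = terminals
        self.next_obs[:, self.ptr] = next_obs
        self.ptr = (self.ptr + 1) % self.n_size  # wrap around when full
        self.size = min(self.size + 1, self.n_size)

    def sample(self, batch_size=None):
        batch_size = self.batch_size if batch_size is None else batch_size
        env_idx = self.rng.integers(0, self.n_envs, size=batch_size)
        step_idx = self.rng.integers(0, self.size, size=batch_size)
        return {
            "obs": self.obs[env_idx, step_idx],
            "actions": self.acts[env_idx, step_idx],
            "rewards": self.rews[env_idx, step_idx],
            "terminals": self.terminals[env_idx, step_idx],
            "obs_next": self.next_obs[env_idx, step_idx],
        }


buffer = TinyReplayBuffer(obs_dim=3, n_envs=2, buffer_size=8, batch_size=4)
for t in range(6):  # 6 steps into 4 slots per env: the oldest 2 are overwritten
    step_obs = np.full((2, 3), t, dtype=np.float32)
    buffer.store(step_obs, np.zeros(2), np.ones(2), np.zeros(2, dtype=bool), step_obs + 1)
batch = buffer.sample()
```

Passing `batch_size=None` falls back to the size given at construction, which matches the default shown in the documented `sample` signature.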

class xuance.common.memory_tools.DummyOffPolicyBuffer_Atari(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int)[source]

Bases: DummyOffPolicyBuffer

Replay buffer for off-policy DRL algorithms and Atari tasks.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
buffer_size – the total size of the replay buffer.
batch_size – number of transitions in each sampled batch.

clear()[source]

class xuance.common.memory_tools.DummyOnPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, horizon_size: int, use_gae: bool = True, use_advnorm: bool = True, gamma: float = 0.99, gae_lam: float = 0.95)[source]

Bases: Buffer

Replay buffer for on-policy DRL algorithms.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
horizon_size – maximum number of steps to store for each environment.
use_gae – whether to use generalized advantage estimation (GAE).
use_advnorm – whether to normalize the advantages.
gamma – discount factor.
gae_lam – the lambda parameter of GAE.

clear()[source]

finish_path(val, i)[source]

property full

sample(indexes)[source]

store(obs, acts, rews, value, terminals, aux_info=None)[source]
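The `use_gae`, `gamma`, and `gae_lam` parameters point to generalized advantage estimation being applied to the stored rewards and values. As a minimal sketch of that standard technique (not XuanCe's code; `gae_advantages` is a hypothetical helper that handles one environment's path and assumes no terminal step inside it), the recursion is delta_t = r_t + gamma * V_{t+1} - V_t followed by A_t = delta_t + gamma * gae_lam * A_{t+1}:

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, gae_lam=0.95):
    """Hypothetical helper: GAE for one uninterrupted path."""
    rewards = np.asarray(rewards, dtype=np.float64)
    # Append the bootstrap value of the state that follows the last step.
    values = np.append(np.asarray(values, dtype=np.float64), last_value)
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * gae_lam * running
        advantages[t] = running
    returns = advantages + values[:-1]
    return advantages, returns


adv, ret = gae_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], last_value=0.0,
                          gamma=1.0, gae_lam=1.0)
# With gamma = gae_lam = 1 the returns reduce to the undiscounted
# reward-to-go: [3.0, 2.0, 1.0].
```

When advantage normalization is wanted, a common follow-up is `(adv - adv.mean()) / (adv.std() + 1e-8)`; whether `use_advnorm` does exactly this is an assumption.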

class xuance.common.memory_tools.DummyOnPolicyBuffer_Atari(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, horizon_size: int, use_gae: bool = True, use_advnorm: bool = True, gamma: float = 0.99, gae_lam: float = 0.95)[source]

Bases: DummyOnPolicyBuffer

Replay buffer for on-policy DRL algorithms and Atari tasks.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
horizon_size – maximum number of steps to store for each environment.
use_gae – whether to use generalized advantage estimation (GAE).
use_advnorm – whether to normalize the advantages.
gamma – discount factor.
gae_lam – the lambda parameter of GAE.

clear()[source]

class xuance.common.memory_tools.EpisodeBuffer[source]

Bases: object

Episode buffer for DRQN agent.

put(transition)[source]

sample(lookup_step=None, idx=None) → Dict[str, ndarray][source]

class xuance.common.memory_tools.PerOffPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int, alpha: float = 0.6)[source]

Bases: Buffer

Prioritized replay buffer.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
buffer_size – the total size of the replay buffer.
batch_size – number of transitions in each sampled batch.
alpha – prioritization exponent (0 corresponds to uniform sampling).

clear()[source]

sample(beta)[source]

store(obs, acts, rews, terminals, next_obs)[source]

update_priorities(idxes, priorities)[source]
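The roles of `alpha` in the constructor and `beta` in `sample` can be illustrated with proportional prioritized replay: transition i is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha, and its importance-sampling weight is (N * P(i))^(-beta). The sketch below is a plain NumPy illustration of that scheme, not XuanCe's implementation; `per_sample` is a hypothetical helper, and normalizing by the largest weight in the sampled batch is one of several conventions.

```python
import numpy as np


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Hypothetical helper: proportional prioritized sampling."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = scaled / scaled.sum()
    idxes = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias of non-uniform sampling.
    weights = (len(probs) * probs[idxes]) ** (-beta)
    weights /= weights.max()  # normalize by the largest sampled weight
    return idxes, weights, probs


priorities = np.array([1.0, 1.0, 2.0, 4.0])
idxes, weights, probs = per_sample(priorities, batch_size=3, alpha=1.0, beta=1.0,
                                   rng=np.random.default_rng(0))
```

After a learning step, the new absolute TD errors would typically be written back for the sampled `idxes`, which is the role a method like `update_priorities(idxes, priorities)` plays.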

class xuance.common.memory_tools.RecurrentOffPolicyBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int, episode_length: int, lookup_length: int)[source]

Bases: Buffer

Replay buffer for DRQN-based algorithms.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
buffer_size – the size of the replay buffer, which stores whole episodes of data.
batch_size – number of samples in each batch.
episode_length – data length for an episode.
lookup_length – the length of history data.

can_sample()[source]

clear(*args)[source]

property full

sample()[source]

store(episode)[source]
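The `episode_length` and `lookup_length` parameters suggest that training data for a recurrent agent is drawn as fixed-length windows of consecutive steps from stored episodes. The following sketch shows that idea only; `sample_window` is a hypothetical helper, and the actual windowing rule used by this class is not stated in the documentation above.

```python
import numpy as np


def sample_window(episode_obs, lookup_length, rng):
    """Hypothetical helper: pick `lookup_length` consecutive steps."""
    episode_obs = np.asarray(episode_obs)
    max_start = len(episode_obs) - lookup_length
    if max_start < 0:
        raise ValueError("episode is shorter than lookup_length")
    start = int(rng.integers(0, max_start + 1))  # inclusive of max_start
    return start, episode_obs[start:start + lookup_length]


rng = np.random.default_rng(0)
episode = np.arange(10)  # stand-in for 10 steps of observations
start, window = sample_window(episode, lookup_length=4, rng=rng)
```

Keeping the steps contiguous is what lets a recurrent network unroll its hidden state across the window.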

class xuance.common.memory_tools.SequentialReplayBuffer(observation_space: gymnasium.Space, action_space: gymnasium.Space, auxiliary_shape: dict | None, n_envs: int, buffer_size: int, batch_size: int)[source]

Bases: Buffer

Sequential replay buffer for DreamerV3.

Parameters:

observation_space – the observation space of the environment.
action_space – the action space of the environment.
auxiliary_shape – data shape of auxiliary information (if any).
n_envs – number of parallel environments.
buffer_size – the total size of the replay buffer.
batch_size – number of transitions in each sampled batch.

clear()[source]

sample(seq_len: int)[source]

Sample elements from the replay buffer in a sequential manner, without considering the episode boundaries.

Parameters:

seq_len (int) – the length of each sampled sequence.

Returns:

The sampled dictionary, with each array of shape [envs, sequence_length, batch_size, …].

Return type:

Dict[str, np.ndarray]
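To make the documented output layout [envs, sequence_length, batch_size, …] concrete, the sketch below gathers consecutive windows from an array laid out as [envs, steps, …] using NumPy advanced indexing. It is an illustration under assumptions, not the library's code: `sample_sequences` is a hypothetical helper, and sharing the same start indices across environments is a simplification.

```python
import numpy as np


def sample_sequences(memory, seq_len, batch_size, rng):
    """Hypothetical helper: `memory` has shape [envs, steps, ...]."""
    n_steps = memory.shape[1]
    # Start points are drawn freely, so a window may span two episodes.
    starts = rng.integers(0, n_steps - seq_len + 1, size=batch_size)
    # steps[s, b] is the time index of element s in sequence b.
    steps = starts[None, :] + np.arange(seq_len)[:, None]
    return memory[:, steps]  # -> [envs, seq_len, batch_size, ...]


rng = np.random.default_rng(0)
# Two environments, 20 steps, 3 features; every entry equals its time index.
memory = np.tile(np.arange(20, dtype=np.float32)[None, :, None], (2, 1, 3))
batch = sample_sequences(memory, seq_len=5, batch_size=4, rng=rng)
```

Because each entry of `memory` stores its own time index, consecutive elements along axis 1 of `batch` differ by exactly one, which confirms the windows are contiguous.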

store(obs, acts, rews, terms, truncs, is_first)[source]

Parameters:

All arguments are numpy arrays with shape [envs, ~] if ~ != 1, else [envs, ].

xuance.common.memory_tools.create_memory(shape: tuple | dict | None, n_envs: int, n_size: int, dtype: type = <class 'numpy.float32'>)[source]

Create a numpy array for memory data.

Parameters:

shape – data shape.
n_envs – number of parallel environments.
n_size – length of data sequence for each environment.
dtype – numpy data type.

Returns:

An empty memory space to store data (initialized with numpy.zeros()).
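The signature accepts a tuple, a dict, or None for `shape`, so the function presumably dispatches on the type. The following is a hedged re-creation of that behavior, not the library source: `create_memory_sketch` is a hypothetical name, and the [n_envs, n_size, *shape] layout is an assumption inferred from the parameter descriptions.

```python
import numpy as np


def create_memory_sketch(shape, n_envs, n_size, dtype=np.float32):
    """Hypothetical re-creation of the documented behavior."""
    if shape is None:
        return None  # nothing to allocate
    if isinstance(shape, dict):
        # One array per key, each with its own trailing shape.
        return {key: create_memory_sketch(sub, n_envs, n_size, dtype)
                for key, sub in shape.items()}
    return np.zeros((n_envs, n_size) + tuple(shape), dtype=dtype)


obs_memory = create_memory_sketch((4,), n_envs=2, n_size=8)
aux_memory = create_memory_sketch({"log_pi": ()}, n_envs=2, n_size=8)
```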

xuance.common.memory_tools.sample_batch(memory: ndarray | dict | None, index: ndarray | tuple | None)[source]

Sample a batch of data from the selected memory.

Parameters:

memory – memory that contains experience data.
index – pointer to the location for the selected data.

Returns:

A batch of data.
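Since `memory` may be an array or a dict of arrays, batch selection amounts to applying the same index to every array. The sketch below shows one way this could look; `sample_batch_sketch` is a hypothetical name, and treating `index` as a pair of (environment ids, step ids) is an assumption.

```python
import numpy as np


def sample_batch_sketch(memory, index):
    """Hypothetical re-creation: index into an array or a dict of arrays."""
    if memory is None:
        return None
    if isinstance(memory, dict):
        return {key: sample_batch_sketch(value, index)
                for key, value in memory.items()}
    return memory[index]


memory = {"obs": np.arange(24).reshape(2, 4, 3), "rew": np.arange(8).reshape(2, 4)}
index = (np.array([0, 1]), np.array([3, 0]))  # (environment ids, step ids)
batch = sample_batch_sketch(memory, index)
# batch["obs"] holds step 3 of env 0 and step 0 of env 1.
```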

xuance.common.memory_tools.store_element(data: ndarray | dict | float | None, memory: dict | ndarray, ptr: int)[source]

Insert a step of data into the current memory.

Parameters:

data – the data to be stored.
memory – the memory where data will be stored.
ptr – pointer to the location for the data.
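Writing one step into the memory can be pictured as assigning to a single time column for all environments at once. The following sketch assumes an [n_envs, n_size, …] layout and is not the library source; `store_element_sketch` is a hypothetical name.

```python
import numpy as np


def store_element_sketch(data, memory, ptr):
    """Hypothetical re-creation: write one step at column `ptr`."""
    if data is None:
        return
    if isinstance(memory, dict):
        for key, value in data.items():
            memory[key][:, ptr] = value
    else:
        memory[:, ptr] = data


memory = np.zeros((2, 4), dtype=np.float32)  # [n_envs, n_size]
store_element_sketch(np.array([1.0, 2.0]), memory, ptr=1)
store_element_sketch(0.5, memory, ptr=2)  # a float is broadcast to every env
```

The write happens in place, so the caller keeps using the same `memory` object after each call.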