Configuration Examples

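Taking the DQN algorithm in the Atari environment as an example: in addition to the basic parameter configuration, the algorithm-specific parameters are stored in the “xuance/configs/dqn/atari.yaml” file.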

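The Atari environment contains more than 60 different scenarios. They differ mainly in the task rather than in the environment structure, so a single default configuration file covers most cases.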

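For environments whose scenarios differ substantially, one shared file is not enough. In the “Box2D” environment, for example, the “CarRacing-v2” scenario takes a 96×96×3 RGB image as its state input, whereas the “LunarLander” scenario takes an 8-dimensional vector. The DQN parameter configurations for these two scenarios are therefore stored in two separate files: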

  • xuance/configs/dqn/box2d/CarRacing-v2.yaml

  • xuance/configs/dqn/box2d/LunarLander-v2.yaml
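The file layout described above can be sketched as a small lookup. This is only an illustration of the naming convention, not XuanCe's actual loader: the helper `resolve_config_path` and the set `PER_SCENARIO_ENVS` are hypothetical names introduced here, and the rule "one file per environment unless the environment is listed as per-scenario" is an assumption inferred from the Atari and Box2D examples.

```python
from pathlib import PurePosixPath

# Hypothetical: environment families whose scenarios each need their own file.
PER_SCENARIO_ENVS = {"box2d"}


def resolve_config_path(algo: str, env_name: str, env_id: str,
                        root: str = "xuance/configs") -> str:
    """Return the assumed config path for an algorithm/environment pair."""
    base = PurePosixPath(root) / algo.lower()
    env_key = env_name.lower()
    if env_key in PER_SCENARIO_ENVS:
        # Scenarios differ in structure: one YAML file per scenario.
        return str(base / env_key / f"{env_id}.yaml")
    # Scenarios share a structure: one default YAML file per environment.
    return str(base / f"{env_key}.yaml")


print(resolve_config_path("dqn", "Atari", "ALE/Breakout-v5"))
print(resolve_config_path("dqn", "Box2D", "CarRacing-v2"))
```

Under these assumptions, the first call yields the shared Atari file and the second yields the CarRacing-v2 file listed above.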

In the following, we provide the preset arguments for each implementation. Each preset can be run by following the steps in Quick Start.

Value-based Algorithms

agent: "DQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DQN_Learner"
runner: "DRL"

representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'

seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99

start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 50
training_frequency: 1
running_steps: 200000  # 200k
start_training: 1000

use_grad_clip: False  # whether to apply gradient clipping
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5

test_steps: 10000
eval_interval: 20000
test_episode: 1
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
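To make the exploration parameters in the DQN preset concrete, the sketch below shows how `start_greedy`, `end_greedy`, and `decay_step_greedy` could define an epsilon schedule. It assumes a linear decay measured in environment steps; the document does not state the exact schedule the library uses, and `epsilon_at` is a hypothetical helper, not a XuanCe function.

```python
def epsilon_at(step: int,
               start_greedy: float = 0.5,
               end_greedy: float = 0.01,
               decay_step_greedy: int = 100000) -> float:
    """Linearly interpolate epsilon from start_greedy to end_greedy.

    Assumption: the decay is linear and finishes after decay_step_greedy steps.
    """
    # Fraction of the decay period that has elapsed, clamped to [0, 1].
    fraction = min(max(step, 0) / decay_step_greedy, 1.0)
    return start_greedy + fraction * (end_greedy - start_greedy)


for step in (0, 50000, 100000, 200000):
    print(step, f"{epsilon_at(step):.3f}")
```

With the preset values, exploration would reach its floor of `end_greedy` halfway through the 200k `running_steps`, leaving the second half of training nearly greedy.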

Policy-based Algorithms

agent: "PG"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
representation: "Basic_MLP"
vectorize: "DummyVecEnv"
policy: "Categorical_Actor"
learner: "PG_Learner"
runner: "DRL"

representation_hidden_size: [128,]
actor_hidden_size: [128,]
activation: 'relu'
activation_action: 'tanh'

seed: 1
parallels: 10
running_steps: 300000
horizon_size: 128  # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0004

ent_coef: 0.01
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False

use_grad_clip: True  # whether to apply gradient clipping
clip_type: 1  # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5

test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
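The comment on `horizon_size` in the PG preset gives the relation `buffer_size = horizon_size * parallels`. The sketch below works that arithmetic through. The minibatch size and the number of gradient updates per filled buffer are assumptions about how `n_minibatch` and `n_epochs` are typically used in on-policy training, and `rollout_sizes` is a hypothetical helper.

```python
def rollout_sizes(horizon_size: int, parallels: int,
                  n_epochs: int, n_minibatch: int) -> dict:
    """Derive on-policy buffer quantities from the preset values."""
    # Stated in the preset comment: one horizon of transitions per parallel env.
    buffer_size = horizon_size * parallels
    # Assumption: the buffer is split evenly into n_minibatch parts.
    minibatch_size = buffer_size // n_minibatch
    # Assumption: each epoch makes one pass over all minibatches.
    updates_per_buffer = n_epochs * n_minibatch
    return {
        "buffer_size": buffer_size,
        "minibatch_size": minibatch_size,
        "updates_per_buffer": updates_per_buffer,
    }


sizes = rollout_sizes(horizon_size=128, parallels=10, n_epochs=1, n_minibatch=1)
print(sizes)
```

With `n_epochs: 1` and `n_minibatch: 1`, each filled buffer of 1280 transitions would feed a single gradient update, which matches the plain policy-gradient setting.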

MARL Algorithms

agent: "IQL"  # The MARL algorithm to use.
env_name: "mpe"  # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False  # Continuous action space or not.
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL"  # Runner

use_rnn: False  # Whether to use recurrent neural networks.
rnn: "GRU"  # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1  # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0  # dropout should be a number in range [0, 1], the probability of an element being zeroed.

representation_hidden_size: [64, ]
q_hidden_size: [64, ]  # the units for each hidden layer
activation: "relu"  # The activation function of each hidden layer.

seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99  # discount factor
double_q: True  # use double q learning

start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000  # start training after n steps
running_steps: 10000000  # 10M
training_frequency: 25
sync_frequency: 100

use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False

eval_interval: 100000
test_episode: 5
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
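A preset such as the IQL one above is usually a starting point that gets adjusted for a particular run. The sketch below shows one way to apply overrides to a preset while rejecting misspelled keys. It is an illustration only: `apply_overrides` is a hypothetical helper, the dictionary holds a hand-copied subset of the preset, and the override values are arbitrary examples rather than recommended settings.

```python
def apply_overrides(preset: dict, overrides: dict) -> dict:
    """Return a copy of preset with overrides applied; reject unknown keys."""
    unknown = sorted(set(overrides) - set(preset))
    if unknown:
        # A typo in a key would otherwise be silently ignored by the run.
        raise KeyError(f"unknown config keys: {unknown}")
    merged = dict(preset)  # shallow copy so the preset itself stays untouched
    merged.update(overrides)
    return merged


# Subset of the IQL preset listed above.
iql_preset = {
    "agent": "IQL",
    "env_id": "simple_spread_v3",
    "parallels": 16,
    "running_steps": 10000000,
    "eval_interval": 100000,
    "test_episode": 5,
}

# Example: a shorter run with fewer parallel environments.
debug_config = apply_overrides(iql_preset, {"parallels": 4, "running_steps": 200000})
print(debug_config["parallels"], debug_config["running_steps"])
```

Failing on unknown keys is a deliberate choice in this sketch: the presets contain many similarly named parameters, so a silent typo is easy to miss.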