Configuration Examples
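Taking the DQN algorithm in the Atari environment as an example, in addition to the basic parameter configuration, the algorithm-specific parameters are stored in the file "xuance/configs/dqn/atari.yaml".
The Atari environment contains more than 60 different scenes, and they differ mainly in the task rather than in the structure of the environment, so a single default configuration file is sufficient for most cases.
For environments whose scenes differ substantially (for example, the "CarRacing-v2" and "LunarLander-v2" scenes in the "Box2D" environment), the former takes a 96×96×3 RGB image as its state input, whereas the latter takes an 8-dimensional vector. The DQN parameter configurations for these two scenes are therefore stored in the following two files: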
xuance/configs/dqn/box2d/CarRacing-v2.yaml
xuance/configs/dqn/box2d/LunarLander-v2.yaml
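The file layout described above can be illustrated with a small sketch. The helper `config_path` and the set `SHARED_CONFIG_ENVS` are hypothetical names introduced only for this example; they are not part of XuanCe, and the sketch simply reproduces the three paths mentioned in the text.

```python
from pathlib import PurePosixPath

# Hypothetical helper (not part of XuanCe). It mirrors the layout described
# in the text: Atari shares one file, Box2D has one file per scene.
SHARED_CONFIG_ENVS = {"atari"}  # assumption: environments with a single default file


def config_path(algo: str, env_name: str, env_id: str,
                root: str = "xuance/configs") -> str:
    base = PurePosixPath(root) / algo
    env_key = env_name.lower()
    if env_key in SHARED_CONFIG_ENVS:
        # One default file covers every scene of the environment.
        return str(base / f"{env_key}.yaml")
    # Otherwise each scene has its own file in a per-environment folder.
    return str(base / env_key / f"{env_id}.yaml")


print(config_path("dqn", "Atari", "ALE/Breakout-v5"))
print(config_path("dqn", "Box2D", "CarRacing-v2"))
print(config_path("dqn", "Box2D", "LunarLander-v2"))
```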
In the following content, we provide the preset arguments for each implementation, which can be run by following the steps in Quick Start.
Value-based Algorithms
agent: "DQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 50
training_frequency: 1
running_steps: 200000 # 200k
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 20000
test_episode: 1
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
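To make the exploration arguments in the configuration above more concrete, here is a minimal sketch of how `start_greedy`, `end_greedy`, and `decay_step_greedy` could define an epsilon schedule. It assumes a linear decay measured in environment steps; the schedule actually used by the library, and how steps are counted across `parallels` environments, may differ. The function name `epsilon_at` is hypothetical.

```python
def epsilon_at(step, start_greedy=0.5, end_greedy=0.01, decay_step_greedy=100_000):
    """Linearly anneal epsilon from start_greedy to end_greedy.

    Assumption: linear decay that stops after decay_step_greedy steps.
    """
    fraction = min(max(step, 0) / decay_step_greedy, 1.0)
    return start_greedy + fraction * (end_greedy - start_greedy)


for step in (0, 50_000, 100_000, 200_000):
    print(step, round(epsilon_at(step), 4))
```

Under this assumption, exploration reaches its floor halfway through the 200k `running_steps` of the CartPole-v1 configuration.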
agent: "DQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 50
training_frequency: 1
running_steps: 200000 # 200k
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 20000
test_episode: 1
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
agent: "DQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DQN_Learner"
runner: "DRL"
representation_hidden_size: [256, ]
q_hidden_size: [256, ]
activation: 'leaky_relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.1
gamma: 0.99
start_greedy: 1.0
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 200
training_frequency: 2
running_steps: 2000000 # 2M
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
agent: "DQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_CNN"
learner: "DQN_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 500000
sync_frequency: 500
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 1
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
agent: "DQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 50
training_frequency: 1
running_steps: 200000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
agent: "DQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
policy: "Basic_Q_network"
representation: "Basic_CNN"
learner: "DQN_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 5
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
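The `kernels` and `strides` arguments determine how quickly the image shrinks inside the "Basic_CNN" representation. The sketch below computes the spatial size after each convolution, assuming unpadded convolutions; whether the library pads, pools, or flattens the final feature map is not stated here, so treat the numbers as illustrative only. The function name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernels, strides):
    """Side length of a square feature map after a stack of unpadded convolutions."""
    for kernel, stride in zip(kernels, strides):
        size = (size - kernel) // stride + 1
    return size


# Atari: 84 x 84 input with kernels [8, 4, 3] and strides [4, 2, 1].
atari_side = conv_output_size(84, [8, 4, 3], [4, 2, 1])
# CarRacing-v2: 96 x 96 input with the same kernels and strides.
car_side = conv_output_size(96, [8, 4, 3], [4, 2, 1])

# Multiply by the last entry of `filters` to get the flattened feature count.
print(atari_side, atari_side * atari_side * 64)
print(car_side, car_side * car_side * 32)
```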
agent: "C51DQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "C51_Learner"
policy: "C51_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
v_min: 0
v_max: 200
atom_num: 51
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 200000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/c51/"
model_dir: "./models/c51/"
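The C51-specific arguments `v_min`, `v_max`, and `atom_num` define the fixed support on which the return distribution is represented. As a minimal sketch of the standard C51 definition (the helper names `c51_support` and `expected_value` are hypothetical and not taken from the library), the atoms are evenly spaced between `v_min` and `v_max`, and the Q-value is the expectation of the distribution over them:

```python
def c51_support(v_min=0.0, v_max=200.0, atom_num=51):
    """Return the evenly spaced atoms and the spacing between them."""
    delta_z = (v_max - v_min) / (atom_num - 1)
    atoms = [v_min + i * delta_z for i in range(atom_num)]
    return atoms, delta_z


def expected_value(probs, atoms):
    """Q-value as the mean of the categorical return distribution."""
    return sum(p * z for p, z in zip(probs, atoms))


atoms, delta_z = c51_support()
uniform = [1.0 / len(atoms)] * len(atoms)
print(delta_z, atoms[0], atoms[-1])
print(expected_value(uniform, atoms))
```

With the preset values the atoms are spaced 4 apart, so returns are resolved only to that granularity and values outside [0, 200] cannot be represented exactly.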
agent: "C51DQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "C51_Learner"
policy: "C51_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
v_min: 0
v_max: 200
atom_num: 51
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/c51/"
model_dir: "./models/c51/"
agent: "C51DQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "C51_Learner"
policy: "C51_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
v_min: 0
v_max: 200
atom_num: 51
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 200000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/c51/"
model_dir: "./models/c51/"
agent: "C51DQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "C51_Learner"
policy: "C51_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
v_min: 0
v_max: 200
atom_num: 51
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 50000
sync_frequency: 500
training_frequency: 1
running_steps: 200000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 1
log_dir: "./logs/c51/"
model_dir: "./models/c51/"
agent: "C51DQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "C51_Learner"
policy: "C51_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
v_min: 0
v_max: 200
atom_num: 51
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 200000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/c51/"
model_dir: "./models/c51/"
agent: "C51DQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
learner: "C51_Learner"
policy: "C51_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate: 0.0001
gamma: 0.99
v_min: 0
v_max: 200
atom_num: 51
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 3
log_dir: "./logs/c51/"
model_dir: "./models/c51/"
agent: "DDQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DDQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ddqn/"
model_dir: "./models/ddqn/"
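The DDQN configuration uses the same arguments as DQN; the difference lies in how the learner forms the bootstrap target. The sketch below shows the standard Double DQN rule, in which the online network selects the next action and the target network (refreshed every `sync_frequency` updates) evaluates it. It is a plain-Python illustration with a hypothetical function name, not the library's "DDQN_Learner".

```python
def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: select with the online net, evaluate with the target net."""
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    bootstrap = 0.0 if done else gamma * q_target_next[best_action]
    return reward + bootstrap


q_online_next = [1.0, 3.0]   # online network prefers action 1
q_target_next = [2.0, 0.5]   # target network values for the same state

print(ddqn_target(1.0, False, q_online_next, q_target_next))
# For comparison, the vanilla DQN target takes the max over the target network.
print(1.0 + 0.99 * max(q_target_next))
```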
agent: "DDQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DDQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ddqn/"
model_dir: "./models/ddqn/"
agent: "DDQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DDQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ddqn/"
model_dir: "./models/ddqn/"
agent: "DDQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_CNN"
learner: "DDQN_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 500000
sync_frequency: 500
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 1
log_dir: "./logs/ddqn/"
model_dir: "./models/ddqn/"
agent: "DDQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Basic_Q_network"
representation: "Basic_MLP"
learner: "DDQN_Learner"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 50
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ddqn/"
model_dir: "./models/ddqn/"
agent: "DDQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
policy: "Basic_Q_network"
representation: "Basic_CNN"
learner: "DDQN_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 3
log_dir: "./logs/ddqn/"
model_dir: "./models/ddqn/"
agent: "Duel_DQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Duel_Q_network"
representation: "Basic_MLP"
learner: "DuelDQN_Learner"
runner: "DRL"
representation_hidden_size: [128, ]
q_hidden_size: [128, ]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 200000
sync_frequency: 50
training_frequency: 1
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/dueldqn/"
model_dir: "./models/dueldqn/"
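The "Duel_Q_network" policy named in this configuration refers to the dueling architecture, which splits the Q-function into a state value and per-action advantages. A minimal sketch of the usual aggregation step follows; it assumes the mean-subtracted form from the dueling DQN literature, and the function name `dueling_q` is hypothetical.

```python
def dueling_q(value, advantages):
    """Combine a state value and advantages: Q(s, a) = V(s) + A(s, a) - mean(A)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


print(dueling_q(2.0, [1.0, -1.0]))
print(dueling_q(2.0, [3.0, 1.0]))  # shifting all advantages leaves Q unchanged
```

Subtracting the mean makes the decomposition identifiable: adding a constant to every advantage does not change the resulting Q-values.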
agent: "Duel_DQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Duel_Q_network"
representation: "Basic_MLP"
learner: "DuelDQN_Learner"
runner: "DRL"
representation_hidden_size: [128, ]
q_hidden_size: [128, ]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 200000
sync_frequency: 50
training_frequency: 1
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/dueldqn/"
model_dir: "./models/dueldqn/"
agent: "Duel_DQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Duel_Q_network"
representation: "Basic_MLP"
learner: "DuelDQN_Learner"
runner: "DRL"
representation_hidden_size: [128, ]
q_hidden_size: [128, ]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 200000
sync_frequency: 50
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/dueldqn/"
model_dir: "./models/dueldqn/"
agent: "Duel_DQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Duel_Q_network"
representation: "Basic_CNN"
learner: "DuelDQN_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 500000
sync_frequency: 500
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 1
log_dir: "./logs/dueldqn/"
model_dir: "./models/dueldqn/"
agent: "Duel_DQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Duel_Q_network"
representation: "Basic_MLP"
learner: "DuelDQN_Learner"
runner: "DRL"
representation_hidden_size: [128, ]
q_hidden_size: [128, ]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 200000
sync_frequency: 50
training_frequency: 1
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/dueldqn/"
model_dir: "./models/dueldqn/"
agent: "Duel_DQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
policy: "Duel_Q_network"
representation: "Basic_CNN"
learner: "DuelDQN_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
save_model_frequency: 500000
test_episode: 1
log_dir: "./logs/dueldqn/"
model_dir: "./models/dueldqn/"
agent: "NoisyDQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "DQN_Learner"
policy: "Noisy_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_noise: 0.05
end_noise: 0.0
decay_step_noise: 500000
sync_frequency: 100
training_frequency: 2
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/noisy_dqn/"
model_dir: "./models/noisy_dqn/"
agent: "NoisyDQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "DQN_Learner"
policy: "Noisy_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_noise: 0.05
end_noise: 0.0
decay_step_noise: 500000
sync_frequency: 100
training_frequency: 2
running_steps: 500000
start_training: 1000
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/noisy_dqn/"
model_dir: "./models/noisy_dqn/"
agent: "NoisyDQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "DQN_Learner"
policy: "Noisy_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_noise: 0.05
end_noise: 0.0
decay_step_noise: 500000
sync_frequency: 100
training_frequency: 2
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/noisy_dqn/"
model_dir: "./models/noisy_dqn/"
agent: "NoisyDQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "DQN_Learner"
policy: "Noisy_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
start_noise: 0.05
end_noise: 0.0
decay_step_noise: 2000000
sync_frequency: 500
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 1
log_dir: "./logs/noisy_dqn/"
model_dir: "./models/noisy_dqn/"
agent: "NoisyDQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "DQN_Learner"
policy: "Noisy_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_noise: 0.05
end_noise: 0.0
decay_step_noise: 500000
sync_frequency: 100
training_frequency: 2
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/noisy_dqn/"
model_dir: "./models/noisy_dqn/"
agent: "NoisyDQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
learner: "DQN_Learner"
policy: "Noisy_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate: 0.0001
gamma: 0.99
start_noise: 0.05
end_noise: 0.0
decay_step_noise: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 1
log_dir: "./logs/noisy_dqn/"
model_dir: "./models/noisy_dqn/"
agent: "PerDQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "PerDQN_Learner"
policy: "Basic_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.1
decay_step_greedy: 200000
sync_frequency: 100
training_frequency: 4
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
PER_alpha: 0.5
PER_beta0: 0.4
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/perdqn/"
model_dir: "./models/perdqn/"
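The two extra arguments `PER_alpha` and `PER_beta0` control prioritized experience replay. As a minimal sketch of the standard formulation (not the library's buffer implementation; the function names are hypothetical), `PER_alpha` sets how strongly priorities skew sampling and `PER_beta0` is the initial exponent of the importance-sampling correction:

```python
def per_probabilities(priorities, alpha=0.5):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Weights w_i = (N * P(i))^(-beta), normalized by the largest weight."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_weight = max(weights)
    return [w / max_weight for w in weights]


probs = per_probabilities([1.0, 4.0, 9.0, 16.0])
weights = importance_weights(probs)
print([round(p, 3) for p in probs])
print([round(w, 3) for w in weights])
```

With `alpha = 0` sampling would be uniform, and with `beta = 1` the weights would fully compensate for the non-uniform sampling.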
agent: "PerDQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "PerDQN_Learner"
policy: "Basic_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.1
decay_step_greedy: 200000
sync_frequency: 100
training_frequency: 4
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
PER_alpha: 0.5
PER_beta0: 0.4
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/perdqn/"
model_dir: "./models/perdqn/"
agent: "PerDQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "PerDQN_Learner"
policy: "Basic_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.1
decay_step_greedy: 200000
sync_frequency: 100
training_frequency: 4
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
PER_alpha: 0.5
PER_beta0: 0.4
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/perdqn/"
model_dir: "./models/perdqn/"
agent: "PerDQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "PerDQN_Learner"
policy: "Basic_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 500000
sync_frequency: 500
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
PER_alpha: 0.5
PER_beta0: 0.4
test_steps: 10000
eval_interval: 100000
test_episode: 1
log_dir: "./logs/perdqn/"
model_dir: "./models/perdqn/"
agent: "PerDQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "PerDQN_Learner"
policy: "Basic_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.1
decay_step_greedy: 200000
sync_frequency: 100
training_frequency: 4
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
PER_alpha: 0.5
PER_beta0: 0.4
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/perdqn/"
model_dir: "./models/perdqn/"
agent: "PerDQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
learner: "PerDQN_Learner"
policy: "Basic_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate: 0.0001
gamma: 0.99
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
PER_alpha: 0.5
PER_beta0: 0.4
test_steps: 10000
eval_interval: 500000
test_episode: 1
log_dir: "./logs/perdqn/"
model_dir: "./models/perdqn/"
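The `PER_alpha` and `PER_beta0` entries in the PerDQN presets above control prioritized experience replay. The following is a minimal sketch of the standard formulation (sampling probability proportional to priority raised to `alpha`, importance weights raised to `-beta`), not XuanCe's actual buffer code. The function names and the toy priorities are hypothetical and chosen only for illustration.

```python
def per_probabilities(priorities, alpha):
    """Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta):
    """Importance weight w_i = (N * P(i))^(-beta), divided by the largest weight."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


# Values taken from the presets above.
PER_alpha = 0.5
PER_beta0 = 0.4

# Hypothetical priorities (for example, absolute TD errors) of four transitions.
priorities = [1.0, 4.0, 9.0, 16.0]

probs = per_probabilities(priorities, PER_alpha)
weights = importance_weights(probs, PER_beta0)
print(probs)
print(weights)
```

With `PER_alpha: 0.5` the priorities are square-rooted before normalization, so a transition with 16 times the priority is sampled only 4 times as often. Setting `alpha` to 0 would recover uniform sampling.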
agent: "QRDQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "QRDQN_Learner"
policy: "QR_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
quantile_num: 20
start_greedy: 0.25
end_greedy: 0.01
decay_step_greedy: 300000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/qrdqn/"
model_dir: "./models/qrdqn/"
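The `start_greedy`, `end_greedy`, and `decay_step_greedy` entries describe an epsilon-greedy exploration schedule. The sketch below assumes a linear decay from `start_greedy` to `end_greedy` over `decay_step_greedy` steps; the exact schedule used by the library may differ, and `epsilon_at` is a hypothetical helper name.

```python
def epsilon_at(step, start_greedy, end_greedy, decay_step_greedy):
    """Linearly anneal epsilon, then hold it at end_greedy (assumed schedule)."""
    clamped = min(max(step, 0), decay_step_greedy)
    fraction = clamped / decay_step_greedy
    return start_greedy + fraction * (end_greedy - start_greedy)


# Values taken from the QRDQN CartPole-v1 preset above.
for step in (0, 150000, 300000, 400000):
    print(step, epsilon_at(step, 0.25, 0.01, 300000))
```

Under this assumption, exploration reaches its floor exactly when `running_steps: 300000` is exhausted, so the agent explores throughout training for this preset.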
agent: "QRDQN"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "QRDQN_Learner"
policy: "QR_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
quantile_num: 20
start_greedy: 0.25
end_greedy: 0.01
decay_step_greedy: 300000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/qrdqn/"
model_dir: "./models/qrdqn/"
agent: "QRDQN"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "QRDQN_Learner"
policy: "QR_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
quantile_num: 20
start_greedy: 0.25
end_greedy: 0.01
decay_step_greedy: 300000
sync_frequency: 100
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/qrdqn/"
model_dir: "./models/qrdqn/"
agent: "QRDQN"
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "QRDQN_Learner"
policy: "QR_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512,]
activation: 'relu'
seed: 1
parallels: 2
buffer_size: 20000
batch_size: 32
learning_rate: 0.0001
gamma: 0.99
quantile_num: 20
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 500000
sync_frequency: 500
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 1
log_dir: "./logs/qrdqn/"
model_dir: "./models/qrdqn/"
agent: "QRDQN"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "QRDQN_Learner"
policy: "QR_Q_network"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'relu'
seed: 1
parallels: 10
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99
quantile_num: 20
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 50
training_frequency: 1
running_steps: 200000
start_training: 1000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/qrdqn/"
model_dir: "./models/qrdqn/"
agent: "QRDQN"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
learner: "QRDQN_Learner"
policy: "QR_Q_network"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 64, 64] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate: 0.0001
gamma: 0.99
quantile_num: 20
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 10000000 # 10M
sync_frequency: 500
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 1
log_dir: "./logs/qrdqn/"
model_dir: "./models/qrdqn/"
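All QRDQN presets above set `quantile_num: 20`. As a rough illustration of what that number means, the sketch below follows the standard quantile-regression formulation: the return distribution is represented by `quantile_num` values at fixed midpoint fractions, and each one is trained with a quantile Huber loss. This is a generic sketch, not the library's learner; the helper names and `kappa` are assumptions.

```python
def quantile_midpoints(quantile_num):
    """Midpoint fractions tau_i = (2i + 1) / (2N) for i = 0..N-1."""
    return [(2 * i + 1) / (2 * quantile_num) for i in range(quantile_num)]


def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """Quantile Huber loss for a single (prediction, target) pair."""
    u = target - pred
    if abs(u) <= kappa:
        huber = 0.5 * u * u
    else:
        huber = kappa * (abs(u) - 0.5 * kappa)
    indicator = 1.0 if u < 0 else 0.0
    return abs(tau - indicator) * huber


taus = quantile_midpoints(20)
print(taus[0], taus[-1])

# Under-estimating a high quantile costs more than over-estimating it.
print(quantile_huber_loss(0.0, 1.0, 0.9))
print(quantile_huber_loss(0.0, -1.0, 0.9))
```

The Q-value used for action selection is then typically the mean of the predicted quantile values for each action.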
Policy-based Algorithms
agent: "PG"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
representation: "Basic_MLP"
vectorize: "DummyVecEnv"
policy: "Categorical_Actor"
learner: "PG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
activation: 'relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
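The comment on `horizon_size` states that `buffer_size = horizon_size * parallels`. The short calculation below applies that relation to the PG CartPole-v1 preset above. It additionally assumes that `running_steps` counts environment steps summed over all parallel environments; `rollout_budget` is a hypothetical helper name.

```python
def rollout_budget(running_steps, horizon_size, parallels):
    """Return the on-policy buffer size and the number of full rollouts."""
    buffer_size = horizon_size * parallels
    n_updates = running_steps // buffer_size
    return buffer_size, n_updates


# running_steps: 300000, horizon_size: 128, parallels: 10
buffer_size, n_updates = rollout_budget(300000, 128, 10)
print(buffer_size, n_updates)
```

Under these assumptions the preset collects 1280 transitions per rollout and performs 234 full rollouts during training.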
agent: "PG"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
representation: "Basic_MLP"
vectorize: "DummyVecEnv"
policy: "Categorical_Actor"
learner: "PG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
activation: 'relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 500
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
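The `use_grad_clip` and `grad_clip_norm` entries enable clipping of the gradient by its norm. The sketch below shows the general idea on a flat list of numbers; it is not the framework call itself, and `clip_by_global_norm` is a hypothetical helper.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so that their L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# grad_clip_norm: 0.5, applied to a gradient with L2 norm 5.0
clipped = clip_by_global_norm([3.0, 4.0], 0.5)
print(clipped)

# A gradient that is already small is left unchanged.
print(clip_by_global_norm([0.1, 0.2], 0.5))
```

Clipping by norm preserves the direction of the gradient and only shrinks its length, which is why it is often preferred over clipping each component by value.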
agent: "PG"
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
representation: "Basic_MLP"
vectorize: "DummyVecEnv"
policy: "Gaussian_Actor"
learner: "PG_Learner"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
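The `use_obsnorm`, `use_rewnorm`, `obsnorm_range`, and `rewnorm_range` entries appear in every preset. A plausible reading, assumed here rather than taken from the library source, is that values are standardized with running statistics and then clipped to `[-range, range]`. The helper below is hypothetical and works on a single scalar.

```python
def normalize_and_clip(value, mean, std, clip_range, eps=1e-8):
    """Standardize a scalar and clip it to [-clip_range, clip_range]."""
    z = (value - mean) / (std + eps)
    return max(-clip_range, min(clip_range, z))


# obsnorm_range: 5 with hypothetical running statistics mean=0, std=1.
print(normalize_and_clip(1.0, 0.0, 1.0, 5))
print(normalize_and_clip(100.0, 0.0, 1.0, 5))
print(normalize_and_clip(-100.0, 0.0, 1.0, 5))
```

With a range of 5, only values more than five standard deviations from the running mean are affected by the clipping.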
agent: "PG"
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
representation: "Basic_MLP"
vectorize: "DummyVecEnv"
policy: "Gaussian_Actor"
learner: "PG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
activation: 'relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 100000
horizon_size: 1024 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 3
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 10000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
agent: "PG"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
representation: "Basic_MLP"
vectorize: "DummyVecEnv"
policy: "Categorical_Actor"
learner: "PG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
activation: 'relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 3
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
agent: "PG"
env_name: "MuJoCo"
env_id: "Ant-v4"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "Gaussian_Actor"
representation: "Basic_MLP"
learner: "PG_Learner"
runner: "DRL"
representation_hidden_size: [256, 256]
actor_hidden_size: []
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 16
running_steps: 1000000 # 1M
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
ent_coef: 0.0
gamma: 0.99
use_gae: False
gae_lambda: 0.95
use_advnorm: False
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
agent: "PPG"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
policy_nepoch: 4
value_nepoch: 8
aux_nepoch: 8
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
clip_range: 0.2
kl_beta: 1.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPG"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "leaky_relu"
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
policy_nepoch: 4
value_nepoch: 8
aux_nepoch: 8
n_minibatch: 1
learning_rate: 0.001
ent_coef: 0.01
clip_range: 0.2
kl_beta: 1.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPG"
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Gaussian_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
policy_nepoch: 4
value_nepoch: 8
aux_nepoch: 8
n_minibatch: 1
learning_rate: 0.001
ent_coef: 0.01
clip_range: 0.2
kl_beta: 1.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPG"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "leaky_relu"
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
policy_nepoch: 4
value_nepoch: 8
aux_nepoch: 8
n_minibatch: 1
learning_rate: 0.001
ent_coef: 0.01
clip_range: 0.2
kl_beta: 1.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPG"
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Gaussian_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
policy_nepoch: 4
value_nepoch: 8
aux_nepoch: 8
n_minibatch: 1
learning_rate: 0.001
ent_coef: 0.01
clip_range: 0.2
kl_beta: 1.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPG"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
policy_nepoch: 4
value_nepoch: 8
aux_nepoch: 8
n_minibatch: 1
learning_rate: 0.0004
ent_coef: 0.01
clip_range: 0.2
kl_beta: 1.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPG"
env_name: "MuJoCo"
env_id: "InvertedPendulum-v2"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Gaussian_PPG"
learner: "PPG_Learner"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 16
running_steps: 1000000 # 1M
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_minibatch: 4
n_epochs: 1
policy_nepoch: 2
value_nepoch: 4
aux_nepoch: 8
learning_rate: 0.0007
ent_coef: 0.0
clip_range: 0.25
kl_beta: 2.0
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 10000
test_episode: 5
log_dir: "./logs/ppg/"
model_dir: "./models/ppg/"
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_AC"
learner: "PPOCLIP_Learner" # Choice: PPOCLIP_Learner, PPOKL_Learner
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 120000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
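For the PPO presets, `n_epochs` and `n_minibatch` determine how much optimization is done on each rollout. The sketch below assumes that `n_minibatch` is the number of minibatches the buffer is split into (so the minibatch size is `buffer_size // n_minibatch`); this interpretation and the helper name are assumptions.

```python
def ppo_update_schedule(horizon_size, parallels, n_epochs, n_minibatch):
    """Return (buffer_size, minibatch_size, gradient_steps_per_rollout)."""
    buffer_size = horizon_size * parallels
    minibatch_size = buffer_size // n_minibatch
    gradient_steps = n_epochs * n_minibatch
    return buffer_size, minibatch_size, gradient_steps


# horizon_size: 256, parallels: 10, n_epochs: 8, n_minibatch: 8
schedule = ppo_update_schedule(256, 10, 8, 8)
print(schedule)
```

Under this reading, the CartPole-v1 preset performs 64 gradient steps on minibatches of 320 transitions after every rollout of 2560 transitions.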
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
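The `clip_range` entry is marked "for PPO_Clip agent". As a reminder of what it bounds, here is a minimal scalar sketch of the standard PPO clipped surrogate objective. It is not the library's learner, and `clipped_surrogate` is a hypothetical name.

```python
def clipped_surrogate(ratio, advantage, clip_range):
    """PPO-Clip objective for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    return min(ratio * advantage, clipped_ratio * advantage)


# clip_range: 0.2
print(clipped_surrogate(1.5, 1.0, 0.2))   # ratio too high, positive advantage
print(clipped_surrogate(0.5, -1.0, 0.2))  # ratio too low, negative advantage
print(clipped_surrogate(0.5, 1.0, 0.2))   # pessimistic bound keeps the unclipped term
```

The outer `min` makes the objective a pessimistic bound: the policy gains nothing from pushing the probability ratio outside `[1 - clip_range, 1 + clip_range]` in the direction the advantage favors.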
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Gaussian_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
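The `use_gae`, `gamma`, and `gae_lambda` entries select generalized advantage estimation. The following is a small pure-Python sketch of the usual backward recursion, written under the assumption that `dones[t]` marks a terminal transition; it is illustrative only and `compute_gae` is a hypothetical helper.

```python
def compute_gae(rewards, values, last_value, dones, gamma, gae_lambda):
    """Generalized advantage estimation over one trajectory segment."""
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at terminal steps.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# gamma: 0.98, gae_lambda: 0.95, two steps with zero value estimates.
adv = compute_gae([1.0, 1.0], [0.0, 0.0], 0.0, [False, True], 0.98, 0.95)
print(adv)
```

Each advantage is a sum of TD errors discounted by `gamma * gae_lambda`, so lowering `gae_lambda` reduces variance at the cost of more bias from the value estimates.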
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Gaussian_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Box2D"
env_id: "CarRacing-v2"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_CNN"
policy: "Categorical_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [16, 16, 32] # [16, 16, 32, 32]
kernels: [8, 4, 3] # [8, 6, 4, 4]
strides: [4, 2, 1] # [2, 2, 2, 2]
fc_hidden_sizes: [512, ] # fully connected layer hidden sizes.
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 2
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2
gamma: 0.99
use_gae: True
gae_lambda: 0.95 # gae_lambda: Lambda parameter for calculating N-step advantage
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_MLP"
policy: "Categorical_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "PPO_Clip" # Choice: PPO_Clip, PPO_KL
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
representation: "AC_CNN_Atari" # CNN and FC layers
policy: "Categorical_AC"
learner: "PPOCLIP_Learner"
runner: "DRL"
# Good hyperparameters for Atari games; do not change them.
filters: [32, 64, 64]
kernels: [8, 4, 3]
strides: [4, 2, 1]
fc_hidden_sizes: [512, ] # fully connected layer hidden sizes.
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 8
running_steps: 10000000 # 10M
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 4
n_minibatch: 4
learning_rate: 0.00025
vf_coef: 0.25
ent_coef: 0.01
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.99
use_gae: True
gae_lambda: 0.95 # gae_lambda: Lambda parameter for calculating N-step advantage
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 3
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
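The `filters`, `kernels`, and `strides` lists in the Atari preset above describe three convolution layers applied to an `img_size` of 84 x 84. The sketch below computes the resulting feature-map size, assuming square kernels and no padding; both are assumptions about the representation, and `conv_output_shape` is a hypothetical helper.

```python
def conv_output_shape(img_size, filters, kernels, strides):
    """Spatial size after a stack of unpadded convolutions, plus channel count."""
    height, width = img_size
    for kernel, stride in zip(kernels, strides):
        height = (height - kernel) // stride + 1
        width = (width - kernel) // stride + 1
    return height, width, filters[-1]


# img_size: [84, 84], filters: [32, 64, 64], kernels: [8, 4, 3], strides: [4, 2, 1]
shape = conv_output_shape([84, 84], [32, 64, 64], [8, 4, 3], [4, 2, 1])
flat_features = shape[0] * shape[1] * shape[2]
print(shape, flat_features)
```

Under these assumptions the spatial size shrinks from 84 to 20, 9, and finally 7, giving 7 x 7 x 64 = 3136 features that feed the fully connected layer sized by `fc_hidden_sizes`.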
agent: "PPO_Clip" # choice: PPO_Clip, PPO_KL
env_name: "MuJoCo"
env_id: "Ant-v4"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "PPOCLIP_Learner"
policy: "Gaussian_AC" # choice: Gaussian_AC for continuous actions, Categorical_AC for discrete actions.
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 79811
parallels: 16
running_steps: 1000000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 16
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.0
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.99
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "PPO_Clip" # choice: PPO_Clip, PPO_KL
env_name: "MetaDrive"
env_id: "metadrive"
env_seed: 1
env_config: # the configs for MetaDrive environment
  map: "C" # see https://metadrive-simulator.readthedocs.io/en/latest/rl_environments.html#generalization-environment for choices
render: False
vectorize: "SubprocVecEnv"
learner: "PPOCLIP_Learner"
policy: "Gaussian_AC" # choice: Gaussian_AC for continuous actions, Categorical_AC for discrete actions.
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [512,]
actor_hidden_size: [512, 512]
critic_hidden_size: [512, 512]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 500000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 4
n_minibatch: 4
learning_rate: 0.00025
vf_coef: 0.25
ent_coef: 0.0
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.99
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "PPO_Clip" # choice: PPO_Clip, PPO_KL
env_name: "MiniGrid"
env_id: "MiniGrid-Empty-5x5-v0"
env_seed: 1
RGBImgPartialObsWrapper: False
ImgObsWrapper: False
vectorize: "DummyVecEnv"
learner: "PPOCLIP_Learner"
policy: "Categorical_AC" # choice: Gaussian_AC for continuous actions, Categorical_AC for discrete actions.
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: "leaky_relu"
seed: 79811
parallels: 16
running_steps: 100000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 16
n_minibatch: 8
learning_rate: 0.0001
vf_coef: 0.25
ent_coef: 0.0
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.99
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 1000
test_episode: 5
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
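The `clip_range: 0.2` entry above is the clipping parameter of the PPO-Clip surrogate objective. As a minimal sketch of what that parameter controls (a scalar, per-sample version in plain Python, not the library's learner code), the function below evaluates the clipped objective for one probability ratio and one advantage. The function name is hypothetical.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO-Clip objective for a single sample (a quantity to maximise).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the advantage estimate.
    """
    # Keep the ratio inside [1 - clip_range, 1 + clip_range].
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + clip_range) * advantage.
print(clipped_surrogate(ratio=1.5, advantage=1.0))
# A large ratio with a negative advantage is not capped, so the penalty remains.
print(clipped_surrogate(ratio=1.5, advantage=-1.0))
```

The `target_kl` and `kl_coef` entries are marked in the config as applying to the `PPO_KL` agent, so they play no part in this sketch.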
agent: "PPO_Clip" # choice: PPO_Clip, PPO_KL
env_name: "Drones"
env_id: "HoverAviary" # choices: ['CtrlAviary', 'HoverAviary', 'VelocityAviary']
env_seed: 1
obs_type: 'kin'
act_type: 'one_d_rpm'
num_drones: 1
record: False
render: False
sleep: 0.01
obstacles: True
max_episode_steps: 2000
vectorize: "DummyVecEnv"
learner: "PPOCLIP_Learner"
policy: "Gaussian_AC" # choice: Gaussian_AC for continuous actions, Categorical_AC for discrete actions.
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [512,]
actor_hidden_size: [512,]
critic_hidden_size: [512,]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 79811
parallels: 10
running_steps: 1000000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 16
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.0
target_kl: 0.25 # for PPO_KL agent
kl_coef: 1.0 # for PPO_KL agent
clip_range: 0.2 # for PPO_Clip agent
gamma: 0.99
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/ppo/"
model_dir: "./models/ppo/"
agent: "A2C"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "SubprocVecEnv"
learner: "A2C_Learner"
policy: "Categorical_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
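The A2C preset above sets `use_gae: True` with `gamma: 0.98` and `gae_lambda: 0.95`. The following is a minimal sketch of generalized advantage estimation with those two values, written in plain Python for a single trajectory segment. It assumes no episode terminates inside the segment, and the function name and argument layout are hypothetical, not the library's buffer API.

```python
def compute_gae(rewards, values, last_value, gamma=0.98, gae_lambda=0.95):
    """Return GAE advantages for one segment without terminations.

    values[t] is the critic's estimate for step t; last_value is the
    estimate for the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later TD errors.
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


print(compute_gae(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0))
```

With `use_advnorm: True`, advantages like these would usually be standardized (zero mean, unit variance) before the policy loss is computed; that step is omitted here.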
agent: "A2C"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "A2C_Learner"
policy: "Categorical_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 8
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "A2C"
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "A2C_Learner"
policy: "Gaussian_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 1000000
horizon_size: 64 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "A2C"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "A2C_Learner"
policy: "Categorical_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "A2C"
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "A2C_Learner"
policy: "Gaussian_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [64,]
actor_hidden_size: [64,]
critic_hidden_size: [64,]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 10
running_steps: 1000000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 5
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "A2C"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "A2C_Learner"
policy: "Categorical_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: 'leaky_relu'
seed: 1
parallels: 10
running_steps: 300000
horizon_size: 128 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 8
learning_rate: 0.0004
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.98
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "A2C"
env_name: "MuJoCo"
env_id: "Ant-v4"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "A2C_Learner"
policy: "Gaussian_AC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 6782
parallels: 16
running_steps: 1000000 # 1M
horizon_size: 16 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007
vf_coef: 0.25
ent_coef: 0.0
gamma: 0.99
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "A2C"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
learner: "A2C_Learner"
policy: "Categorical_AC"
representation: "Basic_CNN"
runner: "DRL"
# the following three arguments are for "Basic_CNN" representation.
filters: [32, 32, 64, 64]
kernels: [8, 4, 4, 4]
strides: [4, 2, 2, 2]
actor_hidden_size: [128, 128]
critic_hidden_size: [128, 128]
activation: "leaky_relu"
seed: 1
parallels: 5
running_steps: 10000000 # 10M
horizon_size: 256 # the horizon size for an environment, buffer_size = horizon_size * parallels.
n_epochs: 4
n_minibatch: 8
learning_rate: 0.0007
vf_coef: 0.25
ent_coef: 0.01
gamma: 0.99
use_gae: True
gae_lambda: 0.95
use_advnorm: True
use_grad_clip: True # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 3
log_dir: "./logs/a2c/"
model_dir: "./models/a2c/"
agent: "SAC"
env_name: "Classic Control"
env_id: "CartPole-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SACDIS_Learner"
policy: "Categorical_SAC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: False
tau: 0.005
training_frequency: 2
running_steps: 500000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
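The `tau: 0.005` entry in the SAC preset above is conventionally the coefficient of a soft (Polyak) target-network update. Assuming that reading, the sketch below applies one such update to a flat list of parameters. Real learners operate on network tensors; the function name and the list representation are illustrative only.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau towards its online counterpart."""
    return [
        (1.0 - tau) * target + tau * online
        for target, online in zip(target_params, online_params)
    ]


target = [0.0, 1.0]
online = [1.0, 1.0]
# After one update the first target parameter has moved 0.5% of the way.
print(soft_update(target, online))
```

A small `tau` keeps the target network changing slowly, which is the usual motivation for values such as 0.005.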
agent: "SAC"
env_name: "Classic Control"
env_id: "Acrobot-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SACDIS_Learner"
policy: "Categorical_SAC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 2
running_steps: 500000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SAC_Learner"
policy: "Gaussian_SAC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: "relu" # The activation function of each hidden layer.
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.98
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 1
running_steps: 300000
start_training: 1000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "Classic Control"
env_id: "MountainCar-v0"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SACDIS_Learner"
policy: "Categorical_SAC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
actor_hidden_size: [128,]
critic_hidden_size: [128,]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.01
gamma: 0.98
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 2
running_steps: 500000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SAC_Learner"
policy: "Gaussian_SAC"
representation: "Basic_Identical"
runner: "DRL"
representation_hidden_size:
actor_hidden_size: [256, 256]
critic_hidden_size: [256, 256]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10 # number of environments
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 1
running_steps: 5000000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "Box2D"
env_id: "LunarLander-v2"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SACDIS_Learner"
policy: "Categorical_SAC"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [128,128,]
critic_hidden_size: [128,128,]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.01
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 1
running_steps: 500000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
vectorize: "Dummy_Atari"
env_name: "Atari"
env_id: "ALE/Breakout-v5"
env_seed: 1
obs_type: "grayscale" # choice for Atari env: ram, rgb, grayscale
img_size: [84, 84] # default is 210 x 160 in gym[Atari]
num_stack: 4 # frame stack trick
frame_skip: 4 # frame skip trick
noop_max: 30 # Do no-op action for a number of steps in [1, noop_max].
representation: "Basic_CNN"
policy: "Categorical_SAC"
learner: "SACDIS_Learner"
runner: "DRL"
filters: [32, 32, 64, 64]
kernels: [8, 4, 4, 4]
strides: [4, 2, 2, 2]
actor_hidden_size: [128, 128]
critic_hidden_size: [128, 128]
activation: "leaky_relu"
seed: 1069
parallels: 5
buffer_size: 500000
batch_size: 32 # 64
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
alpha: 0.01
use_automatic_entropy_tuning: False
tau: 0.005
training_frequency: 1
running_steps: 50000000 # 50M
start_training: 10000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 500000
test_episode: 1
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "MuJoCo"
env_id: "Ant-v4"
env_seed: 1
vectorize: "DummyVecEnv"
learner: "SAC_Learner"
policy: "Gaussian_SAC"
representation: "Basic_Identical"
runner: "DRL"
representation_hidden_size:
actor_hidden_size: [256, 256]
critic_hidden_size: [256, 256]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 4 # number of environments
buffer_size: 1000000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 1
running_steps: 1000000
start_training: 10000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 10000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "MetaDrive"
env_id: "metadrive"
env_seed: 1
env_config: # the configs for MetaDrive environment
  map: "C" # see https://metadrive-simulator.readthedocs.io/en/latest/rl_environments.html#generalization-environment for choices
render: False
vectorize: "SubprocVecEnv"
learner: "SAC_Learner"
policy: "Gaussian_SAC"
representation: "Basic_Identical"
runner: "DRL"
representation_hidden_size:
actor_hidden_size: [512, 512]
critic_hidden_size: [512, 512]
activation: "relu" # The activation function of each hidden layer.
activation_action: 'tanh'
seed: 1
parallels: 4
buffer_size: 1000000
batch_size: 256
learning_rate_actor: 0.0003
learning_rate_critic: 0.0003
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 1
running_steps: 1000000
start_training: 10000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 10000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "SAC"
env_name: "Drones"
env_id: "HoverAviary"
env_seed: 1
obs_type: 'kin'
act_type: 'one_d_rpm'
num_drones: 1
record: False
obstacles: True
max_episode_steps: 2000
render: False
sleep: 0.01
vectorize: "DummyVecEnv"
learner: "SAC_Learner"
policy: "Gaussian_SAC"
representation: "Basic_Identical"
runner: "DRL"
representation_hidden_size:
actor_hidden_size: [512, 512]
critic_hidden_size: [512, 512]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 1000000
batch_size: 256
learning_rate_actor: 0.0003
learning_rate_critic: 0.0003
gamma: 0.99
alpha: 0.2
use_automatic_entropy_tuning: True
tau: 0.005
training_frequency: 1
running_steps: 1000000
start_training: 10000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 10000
test_episode: 5
log_dir: "./logs/sac/"
model_dir: "./models/sac/"
agent: "DDPG"
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "DDPG_Policy"
representation: "Basic_MLP"
learner: "DDPG_Learner"
runner: "DRL"
representation_hidden_size: [256,]
actor_hidden_size: [256,]
critic_hidden_size: [256,]
activation: "relu" # The activation function of each hidden layer.
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.98
tau: 0.005
start_noise: 0.1
end_noise: 0.1
training_frequency: 1
running_steps: 500000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 3
log_dir: "./logs/ddpg/"
model_dir: "./models/ddpg/"
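In the DDPG preset above, `start_noise` and `end_noise` are both 0.1, and the actor output uses `activation_action: 'tanh'`. The sketch below shows one common way such a noise scale is used: as the standard deviation of Gaussian noise added to the deterministic action, followed by clipping to the action bounds. Both the Gaussian form and the `[-1, 1]` bounds are assumptions made for illustration, and `noisy_action` is a hypothetical helper.

```python
import random


def noisy_action(action, noise_scale, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian noise to each action dimension, then clip."""
    return [
        min(high, max(low, a + rng.gauss(0.0, noise_scale)))
        for a in action
    ]


rng = random.Random(1)  # seeded so the example is reproducible
print(noisy_action([0.0, 0.95], noise_scale=0.1, rng=rng))
```

If the two entries bound a noise schedule, equal values mean the scale stays constant for the whole run in this preset.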
agent: "DDPG"
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_Identical"
policy: "DDPG_Policy"
learner: "DDPG_Learner"
runner: "DRL"
representation_hidden_size:
actor_hidden_size: [256, 256]
critic_hidden_size: [256, 256]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10 # number of environments
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
tau: 0.005
start_noise: 0.5
end_noise: 0.1
training_frequency: 1
running_steps: 2000000
start_training: 1000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/ddpg/"
model_dir: "./models/ddpg/"
agent: "DDPG"
env_name: "MuJoCo"
env_id: "Ant-v4"
env_seed: 1
vectorize: "DummyVecEnv"
policy: "DDPG_Policy"
representation: "Basic_Identical"
learner: "DDPG_Learner"
runner: "DRL"
representation_hidden_size: # If you choose Basic_Identical representation, then ignore this value
actor_hidden_size: [400, 300]
critic_hidden_size: [400, 300]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 19089
parallels: 4 # number of environments
buffer_size: 200000 # replay buffer size
batch_size: 100
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
tau: 0.005
start_noise: 0.5
end_noise: 0.1
training_frequency: 1
running_steps: 1000000 # 1M
start_training: 10000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/ddpg/"
model_dir: "./models/ddpg/"
agent: "DDPG"
env_name: "Drones"
env_id: "HoverAviary"
env_seed: 1
obs_type: 'kin'
act_type: 'one_d_rpm'
num_drones: 1
record: False
obstacles: True
max_episode_steps: 2000
render: False
sleep: 0.01
vectorize: "DummyVecEnv"
policy: "DDPG_Policy"
representation: "Basic_Identical"
learner: "DDPG_Learner"
runner: "DRL"
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 1000000 # replay buffer size
batch_size: 1024
learning_rate_actor: 0.001
learning_rate_critic: 0.001
gamma: 0.99
tau: 0.005
start_noise: 0.1
end_noise: 0.1
training_frequency: 1
running_steps: 10000000 # total number of training steps (10M)
start_training: 2000
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 100000
test_episode: 3
log_dir: "./logs/ddpg/"
model_dir: "./models/ddpg/"
agent: "TD3"
env_name: "Classic Control"
env_id: "Pendulum-v1"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_Identical"
policy: "TD3_Policy"
learner: "TD3_Learner"
runner: "DRL"
representation_hidden_size: [64]
actor_hidden_size: [256, ]
critic_hidden_size: [256, ]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.0005
learning_rate_critic: 0.001
gamma: 0.98
tau: 0.005
actor_update_delay: 3
start_noise: 0.25
end_noise: 0.05
training_frequency: 2
running_steps: 500000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/td3/"
model_dir: "./models/td3/"
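The TD3 preset above sets `actor_update_delay: 3`. In TD3 the actor is updated less often than the critics. Assuming the parameter means "update the actor once every `actor_update_delay` critic updates", the sketch below lists which update iterations would also touch the actor. The helper name and the 1-based counting are hypothetical.

```python
def actor_update_steps(n_updates, actor_update_delay=3):
    """Return the 1-based critic-update indices at which the actor is updated."""
    return [
        step for step in range(1, n_updates + 1)
        if step % actor_update_delay == 0
    ]


# With a delay of 3, seven critic updates include two actor updates.
print(actor_update_steps(7))
# -> [3, 6]
```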
agent: "TD3"
env_name: "Box2D"
env_id: "BipedalWalker-v3"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_Identical"
policy: "TD3_Policy"
learner: "TD3_Learner"
runner: "DRL"
representation_hidden_size:
actor_hidden_size: [256, 256]
critic_hidden_size: [256, 256]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.0005
learning_rate_critic: 0.001
gamma: 0.99
tau: 0.005
actor_update_delay: 3
start_noise: 0.25
end_noise: 0.05
training_frequency: 2
running_steps: 2000000
start_training: 2000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 50000
test_episode: 5
log_dir: "./logs/td3/"
model_dir: "./models/td3/"
agent: "TD3"
env_name: "MuJoCo"
env_id: "Ant-v4"
env_seed: 1
vectorize: "DummyVecEnv"
representation: "Basic_Identical"
policy: "TD3_Policy"
learner: "TD3_Learner"
runner: "DRL"
representation_hidden_size: # If you choose Basic_Identical representation, then ignore this value
actor_hidden_size: [400, 300]
critic_hidden_size: [400, 300]
activation: "leaky_relu"
activation_action: 'tanh'
seed: 6782
parallels: 4 # number of environments
buffer_size: 200000
batch_size: 256
learning_rate_actor: 0.001
actor_update_delay: 2
learning_rate_critic: 0.001
gamma: 0.99
tau: 0.005
start_noise: 0.1
end_noise: 0.1
training_frequency: 1
running_steps: 1000000
start_training: 25000
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
test_steps: 10000
eval_interval: 5000
test_episode: 5
log_dir: "./logs/td3/"
model_dir: "./models/td3/"
agent: "PDQN"
env_name: "Platform"
env_id: "Platform-v0"
env_seed: 1
max_episode_steps: 200
vectorize: "NOREQUIRED"
render: False
learner: "PDQN_Learner"
policy: "PDQN_Policy"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
conactor_hidden_size: [128,]
qnetwork_hidden_size: [128, ]
activation: "relu" # The activation function of each hidden layer.
activation_action: 'tanh'
buffer_size: 20000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
tau: 0.005
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 1000000 # 1M
start_noise: 0.1
end_noise: 0.1
training_frequency: 1
running_steps: 30000
start_training: 1000
test_steps: 10000
eval_interval: 1000
test_episode: 5
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
log_dir: "./logs/pdqn/"
model_dir: "./models/pdqn/"
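The PDQN preset above combines `start_greedy: 0.5`, `end_greedy: 0.05`, and `decay_step_greedy: 1000000` with only `running_steps: 30000`. To see what that implies, the sketch below assumes the exploration rate decays linearly with the step count and is held at `end_greedy` afterwards. The linear form and the step counting are assumptions, and `epsilon_at` is a hypothetical helper.

```python
def epsilon_at(step, start_greedy=0.5, end_greedy=0.05,
               decay_step_greedy=1_000_000):
    """Linearly interpolate from start_greedy to end_greedy, then hold."""
    fraction = min(step / decay_step_greedy, 1.0)
    return start_greedy + fraction * (end_greedy - start_greedy)


for step in (0, 30_000, 1_000_000):
    print(step, round(epsilon_at(step), 4))
```

Under this assumption the exploration rate at the end of a 30,000-step run is still about 0.4865, so `end_greedy` is never reached unless `decay_step_greedy` is lowered or `running_steps` is raised.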
agent: "MPDQN"
env_name: "Platform"
env_id: "Platform-v0"
env_seed: 1
max_episode_steps: 200
vectorize: "NOREQUIRED"
render: False
learner: "MPDQN_Learner"
policy: "MPDQN_Policy"
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
conactor_hidden_size: [128,]
qnetwork_hidden_size: [128, ]
activation: "relu" # The activation function of each hidden layer.
activation_action: 'tanh'
buffer_size: 20000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
tau: 0.005
start_greedy: 0.5
end_greedy: 0.05
decay_step_greedy: 1000000 # 1M
start_noise: 0.1
end_noise: 0.1
training_frequency: 1
running_steps: 30000
start_training: 1000
test_steps: 10000
eval_interval: 1000
test_episode: 5
use_grad_clip: False # gradient normalization
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
log_dir: "./logs/mpdqn/"
model_dir: "./models/mpdqn/"
agent: "SPDQN"
env_name: "Platform"
env_id: "Platform-v0"
env_seed: 1
max_episode_steps: 200
vectorize: "NOREQUIRED"
learner: "SPDQN_Learner"
policy: "SPDQN_Policy"
render: False
representation: "Basic_MLP"
runner: "DRL"
representation_hidden_size: [128,]
conactor_hidden_size: [128,]
qnetwork_hidden_size: [128, ]
activation: "relu" # The activation function of each hidden layer.
activation_action: 'tanh'
buffer_size: 20000
batch_size: 128
learning_rate: 0.001
gamma: 0.99
tau: 0.005
start_noise: 0.1
end_noise: 0.1
training_frequency: 1
running_steps: 30000
start_training: 1000
test_steps: 10000
eval_interval: 1000
test_episode: 5
use_grad_clip: False # gradient normalization
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
grad_clip_norm: 0.5
use_actions_mask: False
use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5
log_dir: "./logs/spdqn/"
model_dir: "./models/spdqn/"
MARL Algorithms
agent: "IQL" # the MARL learning algorithm
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 25
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
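The IQL preset above enables `double_q: True` with `gamma: 0.99`. As a minimal sketch of the double Q-learning target for a single agent and a single transition, the function below selects the next action with the online network's values and evaluates it with the target network's values. The list-based inputs and the function name are illustrative assumptions, not the learner's actual interface.

```python
def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Return the double Q-learning target for one transition.

    q_online_next and q_target_next hold the next-state action values
    from the online and target networks respectively.
    """
    # Action selection uses the online network.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation uses the target network.
    bootstrap = q_target_next[best_action]
    # No bootstrapping past a terminal state (done is 0.0 or 1.0).
    return reward + gamma * (1.0 - done) * bootstrap


print(double_q_target(1.0, 0.0, [1.0, 3.0, 2.0], [10.0, 20.0, 30.0]))
```

In this example a plain Q-learning target would bootstrap from 30.0, the maximum of the target values, while the double-Q version bootstraps from 20.0, the target value of the action preferred by the online network.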
agent: "IQL" # the MARL learning algorithm
env_name: "RoboticWarehouse"
env_id: "rware-tiny-2ag-v1"
env_seed: 1
max_episode_steps: 100
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
training_frequency: 1
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: True
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the MARL learning algorithm
env_name: "MAgent2"
env_id: "adversarial_pursuit_v4"
env_seed: 1
minimap_mode: False
max_cycles: 500
extra_features: False
map_size: 45
render_mode: "rgb_array"
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_MLP"
vectorize: "Dummy_MAgent"
runner: "RunnerMAgent"
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
buffer_size: 20000
batch_size: 256
learning_rate: 0.001
gamma: 0.95 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 1
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: True
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
global_state: False
# environment settings
env_name: "Football"
scenario: "academy_3_vs_1_with_keeper"
env_seed: 1
use_stacked_frames: False # Whether to use stacked_frames
num_agent: 3
num_adversary: 0
obs_type: "simple115v2" # representation used to build the observation, choices: ["simple115v2", "extracted", "pixels_gray", "pixels"]
rewards_type: "scoring,checkpoints" # comma separated list of rewards to be added
smm_width: 96 # width of super minimap
smm_height: 72 # height of super minimap
fps: 15
max_episode_steps: 1000
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_Football"
runner: "RunnerFootball"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [128, ]
recurrent_hidden_size: 128
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [128, ]
q_hidden_size: [128, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 50
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 25000000 # 25M
training_frequency: 60
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 250000
test_episode: 50
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
videos_dir: "./videos/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
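The StarCraft2 configurations set `use_actions_mask: True` because not every action is available to every unit at every step. A common way to honour such a mask during greedy action selection is to exclude unavailable actions before taking the argmax. The sketch below assumes a 0/1 availability list per agent; `masked_greedy_action` is a hypothetical name, not a function of the library.

```python
import math


def masked_greedy_action(q_values, avail_actions):
    """Pick the highest-valued action among those marked as available."""
    if not any(avail_actions):
        raise ValueError("at least one action must be available")
    # Unavailable actions get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, avail_actions)]
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    # Action 0 has the largest Q-value but is masked out.
    print(masked_greedy_action([0.9, 0.5, 0.1], [0, 1, 1]))
```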
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 50000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
agent: "IQL" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "IQL_Learner"
policy: "Basic_Q_network_marl"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
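The `start_greedy`, `end_greedy`, and `decay_step_greedy` entries in the configurations above describe an epsilon-greedy exploration schedule. A minimal sketch, assuming epsilon is annealed linearly per step and then held constant (the library's exact schedule, and whether steps are counted per environment or across all parallel environments, is not specified here); `epsilon_at` is a hypothetical helper.

```python
def epsilon_at(step, start_greedy=1.0, end_greedy=0.05, decay_step_greedy=50000):
    """Linearly anneal epsilon from start_greedy to end_greedy."""
    # Fraction of the decay period that has elapsed, clipped to [0, 1].
    fraction = min(max(step, 0) / decay_step_greedy, 1.0)
    return start_greedy + fraction * (end_greedy - start_greedy)


if __name__ == "__main__":
    for step in (0, 25000, 50000, 1000000):
        print(step, round(epsilon_at(step), 3))
```

Under this assumption, the StarCraft2 settings (`decay_step_greedy: 50000`) reach the final epsilon after a small fraction of the run, whereas the MAgent2 setting (`decay_step_greedy: 2500000`) explores for a quarter of its 10M steps.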
agent: "VDN" # the learning algorithms_marl
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 25
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 100000
test_episode: 5
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "RoboticWarehouse"
env_id: "rware-tiny-2ag-v1"
env_seed: 1
max_episode_steps: 100
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "Dummy_RoboticWarehouse"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 1
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: True
eval_interval: 100000
test_episode: 5
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
global_state: False
# environment settings
env_name: "Football"
scenario: "academy_3_vs_1_with_keeper"
env_seed: 1
use_stacked_frames: False # Whether to use stacked_frames
num_agent: 3
num_adversary: 0
obs_type: "simple115v2" # representation used to build the observation, choices: ["simple115v2", "extracted", "pixels_gray", "pixels"]
rewards_type: "scoring,checkpoints" # comma separated list of rewards to be added
smm_width: 96 # width of super minimap
smm_height: 72 # height of super minimap
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Dummy_Football"
runner: "RunnerFootball"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [128, ]
recurrent_hidden_size: 128
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [128, ]
q_hidden_size: [128, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 50
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 25000000 # 25M
training_frequency: 1
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: True
eval_interval: 250000
test_episode: 50
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
videos_dir: "./videos/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 50000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "VDN" # the learning algorithms_marl
env_name: "NewEnv_MAS"
env_id: "scenarios_1"
env_seed: 1
max_episode_steps: 200
render: False
sleep: 0.01
continuous_action: False # Continuous action space or not.
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "Dummy_NewEnv_MAS"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 1
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 100000
test_episode: 5
log_dir: "./logs/vdn/"
model_dir: "./models/vdn/"
agent: "QMIX" # the learning algorithms_marl
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: False
use_actions_mask: False
hidden_dim_mixing_net: 128 # hidden units of mixing network
hidden_dim_hyper_net: 128 # hidden units of hyper network
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 25
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
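As a small worked example of how `running_steps`, `eval_interval`, and `test_episode` relate, the sketch below counts evaluation rounds. It assumes one evaluation round is triggered every `eval_interval` steps and that each round runs `test_episode` episodes. This is an assumption about how the runner uses these keys, and the helper `evaluation_schedule` is hypothetical.

```python
def evaluation_schedule(running_steps, eval_interval, test_episode):
    """Return (number of evaluation rounds, total test episodes).

    Assumes one round every `eval_interval` steps, `test_episode` episodes per round.
    """
    if eval_interval <= 0:
        raise ValueError("eval_interval must be positive")
    rounds = running_steps // eval_interval
    return rounds, rounds * test_episode


if __name__ == "__main__":
    # Values copied from the QMIX simple_spread_v3 config above.
    rounds, episodes = evaluation_schedule(10_000_000, 100_000, 5)
    print(f"evaluation rounds: {rounds}, test episodes: {episodes}")
```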
agent: "QMIX" # the learning algorithms_marl
env_name: "RoboticWarehouse"
env_id: "rware-tiny-2ag-v1"
env_seed: 1
max_episode_steps: 100
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "Dummy_RoboticWarehouse"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ]
q_hidden_size: [128, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 1
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 100000
test_episode: 5
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "Football"
scenario: "academy_3_vs_1_with_keeper"
env_seed: 1
use_stacked_frames: False # Whether to use stacked_frames
num_agent: 3
num_adversary: 0
obs_type: "simple115v2" # representation used to build the observation, choices: ["simple115v2", "extracted", "pixels_gray", "pixels"]
rewards_type: "scoring,checkpoints" # comma separated list of rewards to be added
smm_width: 96 # width of super minimap
smm_height: 72 # height of super minimap
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_Football"
runner: "RunnerFootball"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [128, ]
recurrent_hidden_size: 128
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [128, ]
q_hidden_size: [128, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
hidden_dim_mixing_net: 128 # hidden units of mixing network
hidden_dim_hyper_net: 128 # hidden units of hyper network
seed: 1
parallels: 50
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 25000000 # 25M
training_frequency: 1
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 250000
test_episode: 50
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
videos_dir: "./videos/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 500000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 50000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 5000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 5000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
agent: "QMIX" # the learning algorithms_marl
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "QMIX_Learner"
policy: "Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 32 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qmix/"
model_dir: "./models/qmix/"
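The ten QMIX StarCraft2 configs above appear to differ only in `env_id`, `decay_step_greedy`, `running_steps`, and `eval_interval`. The sketch below collects those per-map values in one table and computes what share of training the epsilon decay phase would cover. The names `QMIX_SC2_OVERRIDES` and `exploration_fraction` are hypothetical and used for illustration only.

```python
# Per-map values copied from the QMIX StarCraft2 configs above:
# env_id -> (decay_step_greedy, running_steps, eval_interval)
QMIX_SC2_OVERRIDES = {
    "2m_vs_1z": (50_000, 1_000_000, 10_000),
    "3m": (50_000, 1_000_000, 10_000),
    "8m": (500_000, 1_000_000, 10_000),
    "1c3s5z": (50_000, 2_000_000, 20_000),
    "2s3z": (50_000, 2_000_000, 20_000),
    "25m": (1_000_000, 5_000_000, 50_000),
    "5m_vs_6m": (5_000_000, 10_000_000, 100_000),
    "8m_vs_9m": (5_000_000, 10_000_000, 100_000),
    "MMM2": (1_000_000, 10_000_000, 100_000),
    "corridor": (1_000_000, 10_000_000, 100_000),
}


def exploration_fraction(decay_step_greedy, running_steps):
    """Share of the training run spent inside the epsilon decay phase."""
    return min(decay_step_greedy / running_steps, 1.0)


if __name__ == "__main__":
    for env_id, (decay, total, interval) in QMIX_SC2_OVERRIDES.items():
        share = exploration_fraction(decay, total)
        print(f"{env_id:10s} decay share={share:.2f} eval rounds={total // interval}")
```

In every row `running_steps` is exactly 100 times `eval_interval`, while the decay share ranges from 5% to 50% of the run.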
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [128, ] # for Basic_MLP representation
q_hidden_size: [128, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 5000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 25
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 100000
test_episode: 5
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
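The `alpha` key in the config above carries no comment. The sketch below assumes it is the down-weighting coefficient from the Weighted QMIX paper, where the optimistic variant (OW-QMIX) gives full weight to samples whose joint value underestimates the target and weight `alpha` to the rest. The functions `ow_weights` and `weighted_td_loss` are hypothetical illustrations in plain Python. They are not XuanCe's `WQMIX_Learner`, whose implementation is not shown here.

```python
def ow_weights(q_tot, targets, alpha=0.1):
    """Optimistic weighting: 1.0 where Q_tot < target, alpha elsewhere."""
    return [1.0 if q < y else alpha for q, y in zip(q_tot, targets)]


def weighted_td_loss(q_tot, targets, alpha=0.1):
    """Mean of weighted squared TD errors over a batch of scalar values."""
    if len(q_tot) != len(targets) or not q_tot:
        raise ValueError("q_tot and targets must be non-empty and equal in length")
    weights = ow_weights(q_tot, targets, alpha)
    errors = [(q - y) ** 2 for q, y in zip(q_tot, targets)]
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)


if __name__ == "__main__":
    q_tot = [1.0, 3.0]    # current joint action-value estimates
    targets = [2.0, 2.0]  # bootstrapped TD targets
    print(ow_weights(q_tot, targets))
    print(weighted_td_loss(q_tot, targets))
```

With `alpha: 0.1`, an overestimated sample contributes one tenth as much to the loss as an underestimated one with the same squared error.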
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 50000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
agent: "OWQMIX" # choice: CWQMIX, OWQMIX
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "WQMIX_Learner"
policy: "Weighted_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
alpha: 0.1
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
hidden_dim_ff_mix_net: 256 # hidden units of the feed-forward mixing network
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/wqmix/"
model_dir: "./models/wqmix/"
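The WQMIX presets above set `alpha: 0.1` and select the `OWQMIX` agent. As a rough illustration, the sketch below applies the optimistic weighting rule from the Weighted QMIX paper, where a sample gets weight 1 when the mixed estimate falls below its target and weight `alpha` otherwise. It assumes that the `alpha` key plays the role of that weight. The helper names are hypothetical, and this is not the `WQMIX_Learner` implementation.

```python
from typing import List, Sequence


def ow_weights(q_tot: Sequence[float],
               targets: Sequence[float],
               alpha: float = 0.1) -> List[float]:
    """Weight 1.0 where q_tot underestimates its target, alpha elsewhere."""
    return [1.0 if q < y else alpha for q, y in zip(q_tot, targets)]


def weighted_td_loss(q_tot: Sequence[float],
                     targets: Sequence[float],
                     alpha: float = 0.1) -> float:
    """Mean of the weighted squared TD errors."""
    weights = ow_weights(q_tot, targets, alpha)
    errors = [w * (q - y) ** 2 for w, q, y in zip(weights, q_tot, targets)]
    return sum(errors) / len(errors)


# Toy values, not taken from any experiment.
q_tot = [1.0, 2.0, 3.0]
targets = [1.5, 2.0, 2.0]
print(ow_weights(q_tot, targets))        # [1.0, 0.1, 0.1]
print(weighted_td_loss(q_tot, targets))
```

With a small `alpha`, overestimated samples contribute little to the loss, so the mixing network is pushed mainly to fit the samples it currently underestimates.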
agent: "DCG" # Options: DCG, DCG_S
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 32
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [32, ]
q_hidden_size: [128, ] # the units for each hidden layer
hidden_utility_dim: 256 # hidden units of the utility function
hidden_payoff_dim: 256 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [256, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 1 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.95 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 25
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
use_parameter_sharing: True
use_actions_mask: False
eval_interval: 100000
test_episode: 5
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
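The DCG preset above uses `graph_type: "FULL"` and leaves `low_rank_payoff` disabled. The sketch below shows what these two choices imply for problem size, assuming that "FULL" means one edge per unordered pair of agents and that a rank-K payoff factorization needs 2·K·|A| outputs per edge instead of |A|². The function names, the agent counts, and the action count of 5 are illustrative assumptions, not values read from the library.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def full_graph_edges(n_agents: int) -> List[Tuple[int, int]]:
    """All unordered agent pairs of a fully connected coordination graph."""
    return list(combinations(range(n_agents), 2))


def payoff_outputs(n_actions: int, rank: Optional[int] = None) -> int:
    """Payoff outputs per edge: |A|^2 for a full table, 2*K*|A| for rank K."""
    if rank is None:
        return n_actions ** 2
    return 2 * rank * n_actions


print(full_graph_edges(3))            # [(0, 1), (0, 2), (1, 2)]
print(len(full_graph_edges(25)))      # 300
# Assumed 5 discrete actions: full table versus payoff_rank = 5.
print(payoff_outputs(5), payoff_outputs(5, rank=5))
```

The edge count grows quadratically with the number of agents, and for small action spaces the full payoff table can be smaller than the rank-5 factorization, which is one plausible reason to keep `low_rank_payoff: False`.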
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 50000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
agent: "DCG" # Options: DCG, DCG_S
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "DCG_Learner"
policy: "DCG_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
hidden_utility_dim: 64 # hidden units of the utility function
hidden_payoff_dim: 64 # hidden units of the payoff function
bias_net: "Basic_MLP"
hidden_bias_dim: [64, ] # hidden units of the bias network with global states as input
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
low_rank_payoff: False # low-rank approximation of payoff function
payoff_rank: 5 # the rank K in the paper
graph_type: "FULL" # specific type of the coordination graph
n_msg_iterations: 8 # number of iterations for message passing during belief propagation
msg_normalized: True # Message normalization during greedy action selection (Kok and Vlassis, 2006)
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/dcg/"
model_dir: "./models/dcg/"
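Across the StarCraft2 presets above, `running_steps` and `eval_interval` scale together. The sketch below parses a few keys from one preset and derives the evaluation budget, assuming that an evaluation of `test_episode` episodes runs once every `eval_interval` steps. `parse_flat_config` is a hypothetical stand-in for a real YAML loader that only handles flat `key: value # comment` lines.

```python
from typing import Dict


def parse_flat_config(text: str) -> Dict[str, str]:
    """Parse flat `key: value  # comment` lines into a dict of strings."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip().strip('"')
    return config


# Keys copied from the "corridor" preset above.
preset = """
env_id: "corridor"
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
"""

cfg = parse_flat_config(preset)
n_evals = int(cfg["running_steps"]) // int(cfg["eval_interval"])
print(cfg["env_id"], n_evals, n_evals * int(cfg["test_episode"]))
```

For this preset the ratio gives 100 evaluation points, and the same ratio holds for the 1M, 2M, and 5M presets, so learning curves from different maps end up with the same number of points.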
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: False
use_actions_mask: False
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 16
buffer_size: 1000000
batch_size: 32
learning_rate: 0.0005
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
training_frequency: 25
sync_frequency: 10000
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
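The QTRAN preset above exposes `lambda_opt` and `lambda_nopt`. In the QTRAN paper these coefficients weight two auxiliary losses that are added to the TD loss. The sketch below shows only that weighted sum, assuming the config keys map directly to those coefficients. The function name and the toy loss values are hypothetical, and the individual loss terms are taken as given rather than computed.

```python
def qtran_total_loss(loss_td: float,
                     loss_opt: float,
                     loss_nopt: float,
                     lambda_opt: float = 1.0,
                     lambda_nopt: float = 1.0) -> float:
    """Combine the TD loss with the two weighted QTRAN auxiliary losses."""
    return loss_td + lambda_opt * loss_opt + lambda_nopt * loss_nopt


# Toy loss values, not taken from any experiment.
print(qtran_total_loss(0.5, 0.2, 0.1))                   # equal weighting
print(qtran_total_loss(0.5, 0.2, 0.1, lambda_nopt=0.0))  # drop the nopt term
```

With both coefficients at 1.0, as in the preset, the three terms contribute equally. Lowering one coefficient reduces how strongly the corresponding constraint shapes the factorization.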
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 500000
start_training: 1000 # start training after n steps
running_steps: 1000000 # 1M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 50000
start_training: 1000 # start training after n steps
running_steps: 2000000 # 2M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 20000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 5000000 # 5M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 50000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 5000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 5000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
agent: "QTRAN_base" # Options: QTRAN_base, QTRAN_alt
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "QTRAN_Learner"
policy: "Qtran_Mixing_Q_network"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True
use_actions_mask: True
hidden_dim_mixing_net: 64 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
qtran_net_hidden_dim: 64
lambda_opt: 1.0
lambda_nopt: 1.0
seed: 1
parallels: 8
buffer_size: 5000
batch_size: 32
learning_rate: 0.0007
gamma: 0.99 # discount factor
double_q: True # use double q learning
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 1000000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
n_epochs: 8 # The number of training epochs after interaction.
sync_frequency: 200
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 16
log_dir: "./logs/qtran/"
model_dir: "./models/qtran/"
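The QTRAN presets above differ mainly in how long exploration is annealed: decay_step_greedy ranges from 50000 (1c3s5z, 2s3z) to 5000000 (5m_vs_6m, 8m_vs_9m), while start_greedy and end_greedy stay at 1.0 and 0.05. The sketch below shows one way these three values could combine, assuming a linear schedule. The function name linear_epsilon is hypothetical, and the schedule XuanCe actually applies may differ.

```python
def linear_epsilon(step, start_greedy=1.0, end_greedy=0.05, decay_step_greedy=50000):
    """Linearly anneal epsilon from start_greedy to end_greedy (assumed schedule)."""
    # Fraction of the decay window already consumed, clamped to [0, 1].
    fraction = min(max(step, 0) / decay_step_greedy, 1.0)
    return start_greedy + fraction * (end_greedy - start_greedy)


if __name__ == "__main__":
    # Compare the shortest and longest decay windows used in the presets above.
    for decay in (50_000, 5_000_000):
        eps = [round(linear_epsilon(s, decay_step_greedy=decay), 4)
               for s in (0, 25_000, 50_000, 1_000_000)]
        print(decay, eps)
```

Under this assumption, a preset with decay_step_greedy: 50000 reaches end_greedy after 5% of a 1M-step run, whereas decay_step_greedy: 5000000 keeps exploring for half of a 10M-step run.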
agent: "IAC"
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_MLP"
vectorize: "SubprocVecMultiAgentEnv"
runner: "MARL" # Runner
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # The type of recurrent layer.
fc_hidden_sizes: [64, 64, 64] # The hidden size of feed forward layer in RNN representation.
recurrent_hidden_size: 64 # The hidden size of the recurrent layer.
N_recurrent_layers: 1 # The number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01 # Gain value for network initialization.
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [64, ] # A list of hidden units for each layer of actor network.
critic_hidden_size: [64, ] # A list of hidden units for each layer of critic network.
activation: "relu" # The activation function of each hidden layer.
activation_action: "sigmoid" # The activation function for the last layer of the actor.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1 # Random seed.
parallels: 16 # The number of environments to run in parallel.
buffer_size: 32 # Number of the transitions (use_rnn is False), or the episodes (use_rnn is True) in replay buffer.
n_epochs: 1 # Number of epochs to train.
n_minibatch: 1 # Number of minibatches to sample and train on. batch_size = buffer_size // n_minibatch.
learning_rate: 0.0005 # Learning rate.
weight_decay: 0 # Weight decay (L2 penalty) coefficient for the optimizer.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.99 # Discount factor.
# tricks
use_linear_lr_decay: False # Whether to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_global_state: True # Whether to use the global state in place of merged observations.
use_value_clip: False # Limit the value range.
value_clip_range: 0.2 # The value clip range.
use_value_norm: False # Use running mean and std to normalize rewards.
use_huber_loss: False # True: use huber loss; False: use MSE loss.
huber_delta: 10.0 # The threshold at which to change between delta-scaled L1 and L2 loss. (For huber loss).
use_advnorm: False # Whether to use advantage normalization.
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000 # The total running steps.
eval_interval: 100000 # The number of training steps between two consecutive evaluations.
test_episode: 5 # The episodes to test in each test period.
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
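The comment on n_minibatch in the preset above gives the relation batch_size = buffer_size // n_minibatch. A minimal sketch of that arithmetic follows, assuming the buffer is shuffled once and cut into equal chunks. The helper split_minibatches is hypothetical and is not part of XuanCe.

```python
import random


def split_minibatches(buffer_size, n_minibatch, seed=1):
    """Shuffle buffer indices and cut them into n_minibatch equal chunks."""
    batch_size = buffer_size // n_minibatch  # relation stated in the config comment
    indices = list(range(buffer_size))
    random.Random(seed).shuffle(indices)
    # Indices left over when buffer_size is not divisible are dropped here.
    return [indices[i * batch_size:(i + 1) * batch_size] for i in range(n_minibatch)]


if __name__ == "__main__":
    # simple_spread_v3 preset: buffer_size 32, n_minibatch 1.
    print([len(b) for b in split_minibatches(32, 1)])
    # StarCraft2 MMM2 preset: buffer_size 8, n_minibatch 2.
    print([len(b) for b in split_minibatches(8, 2)])
```

With n_minibatch: 1 the whole buffer is used as a single batch, and with the MMM2 values of buffer_size: 8 and n_minibatch: 2 each update sees 4 entries.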
agent: "IAC"
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
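The IAC presets enable use_gae with gamma: 0.99 and a gae_lambda of 0.95 for StarCraft2 (0.8 for simple_spread_v3). The sketch below illustrates how generalized advantage estimation combines these two values. It assumes a single trajectory segment with no episode termination inside it and a bootstrap value for the state after the last step. The function name gae_advantages is hypothetical, and XuanCe's buffer code may handle masks and terminations differently.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Compute GAE advantages for one non-terminating trajectory segment."""
    advantages = [0.0] * len(rewards)
    next_value = last_value  # bootstrap value of the state after the last step
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    adv = gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0)
    returns = [a + v for a, v in zip(adv, [0.0, 0.0])]
    print(adv, returns)
```

Setting gae_lambda to 0 reduces each advantage to the one-step TD error, while values closer to 1 weight longer reward sequences more heavily.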
agent: "IAC"
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64,]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.0
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: False # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: False # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 1000000 # 1M
eval_interval: 5000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 5000000 # 5M
eval_interval: 25000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
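The StarCraft2 IAC presets set use_value_clip: True with a value_clip_range of 0.2, tightened to 0.05 for 5m_vs_6m and 8m_vs_9m. The config comment only says the option limits the value range. The sketch below shows one common reading, the PPO-style clipped value loss, in which the new prediction may move at most value_clip_range away from the previous one. This interpretation and the function name clipped_value_loss are assumptions, not confirmed by the presets.

```python
def clipped_value_loss(value, old_value, target, value_clip_range=0.2):
    """PPO-style clipped squared value loss for a single sample (assumed form)."""
    # Keep the new prediction within +/- value_clip_range of the old prediction.
    delta = max(-value_clip_range, min(value - old_value, value_clip_range))
    value_clipped = old_value + delta
    loss_unclipped = (value - target) ** 2
    loss_clipped = (value_clipped - target) ** 2
    # Taking the larger loss keeps the update conservative.
    return max(loss_unclipped, loss_clipped)


if __name__ == "__main__":
    for clip_range in (0.2, 0.05):
        print(clip_range, clipped_value_loss(1.0, 0.0, 1.0, clip_range))
```

Under this reading, a smaller value_clip_range such as 0.05 restricts how far the critic can move per update, which is consistent with its use on the longer 10M-step runs.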
agent: "IAC"
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 1.0
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 2
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
agent: "IAC"
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "IAC_Learner"
policy: "Categorical_MAAC_Policy_Share"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/iac/"
model_dir: "./models/iac/"
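The `use_gae`, `gamma`, and `gae_lambda` entries above control how advantages are estimated. The following is a minimal sketch of generalized advantage estimation in plain Python, not XuanCe's own implementation; the function name `gae_advantages` and the list-based inputs are assumptions made for illustration, and episode terminations inside the segment are ignored.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized advantage estimation over one trajectory segment.

    rewards[t] and values[t] are per-step lists; last_value bootstraps
    the value of the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    print(gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0))
```

With `gae_lambda` close to 1 the estimate approaches the full discounted return minus the value baseline; with `gae_lambda` at 0 it reduces to the one-step TD error.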
agent: "COMA" # The learning algorithm.
env_name: "mpe" # Environment name.
env_id: "simple_spread_v3" # Environment map.
env_seed: 1 # The random seed of the environment.
continuous_action: False # Continuous action space or not.
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy" # Name of policy.
representation: "Basic_MLP" # Name of representation.
representation_critic: "Basic_MLP" # Name of representation for critic.
vectorize: "SubprocVecMultiAgentEnv" # Method to vectorize the environment.
runner: "MARL" # Runner.
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
fc_hidden_sizes: [64, ] # The fully connected layer for Basic_RNN representation.
recurrent_hidden_size: 64 # The size of hidden layers of recurrent networks.
N_recurrent_layers: 1 # Number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01 # Gain value for network initialization.
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [128, ] # A list of hidden units for each layer of actor network.
critic_hidden_size: [128, ] # A list of hidden units for each layer of critic network.
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1 # Random seeds.
parallels: 16 # Number of environments to run in parallel.
buffer_size: 32 # Total buffer size.
n_epochs: 1 # Number of epochs to update the model.
n_minibatch: 1 # Number of minibatch.
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5 # The start value of greedy epsilon.
end_greedy: 0.01 # The end value of greedy epsilon.
decay_step_greedy: 2500000 # The steps to decay the greedy epsilon.
sync_frequency: 200 # The frequency to synchronize target networks.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.99 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: True # If to use advantage normalization.
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000 # The total running steps.
eval_interval: 100000 # The interval (in steps) between two consecutive evaluations.
test_episode: 5 # The episodes to test in each test period.
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
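The `start_greedy`, `end_greedy`, and `decay_step_greedy` entries above describe an exploration schedule for the greedy epsilon. The sketch below assumes a linear schedule, which the configuration itself does not state; the helper name `epsilon_at` is hypothetical and is not part of XuanCe.

```python
def epsilon_at(step, start_greedy=0.5, end_greedy=0.01, decay_step_greedy=2_500_000):
    """Epsilon after `step` environment steps under an assumed linear decay."""
    # Fraction of the decay period that has elapsed, clamped to [0, 1].
    fraction = min(max(step, 0) / decay_step_greedy, 1.0)
    return start_greedy + fraction * (end_greedy - start_greedy)


if __name__ == "__main__":
    for step in (0, 1_250_000, 2_500_000, 10_000_000):
        print(step, round(epsilon_at(step), 4))
```

Under this assumption, epsilon stays at `end_greedy` for every step beyond `decay_step_greedy`, so a shorter decay period leaves more of `running_steps` for near-greedy behavior.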
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5 # The start value of greedy epsilon.
end_greedy: 0.01 # The end value of greedy epsilon.
decay_step_greedy: 2500000 # The steps to decay the greedy epsilon.
sync_frequency: 200 # The frequency to synchronize target networks.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
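The `use_grad_clip` and `grad_clip_norm` entries above bound the size of each gradient update. The sketch below shows clipping by global norm on a flat list of gradient values; it is a plain-Python illustration, the function name `clip_by_global_norm` is hypothetical, and the framework-specific operators named in the `clip_type` comment are not used here.

```python
import math


def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale `grads` so that their L2 norm does not exceed `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already within the bound, leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    print(clip_by_global_norm([3.0, 4.0]))    # norm 5, below the bound
    print(clip_by_global_norm([30.0, 40.0]))  # norm 50, rescaled to norm 10
```

Clipping by norm keeps the direction of the gradient and only shrinks its length, which is the main difference from clipping each value independently.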
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 100000
sync_frequency: 200
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5 # The start value of greedy epsilon.
end_greedy: 0.01 # The end value of greedy epsilon.
decay_step_greedy: 2500000 # The steps to decay the greedy epsilon.
sync_frequency: 200 # The frequency to synchronize target networks.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5 # The start value of greedy epsilon.
end_greedy: 0.01 # The end value of greedy epsilon.
decay_step_greedy: 2500000 # The steps to decay the greedy epsilon.
sync_frequency: 200 # The frequency to synchronize target networks.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5 # The start value of greedy epsilon.
end_greedy: 0.01 # The end value of greedy epsilon.
decay_step_greedy: 2500000 # The steps to decay the greedy epsilon.
sync_frequency: 200 # The frequency to synchronize target networks.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 1000000
sync_frequency: 200
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 5000000
eval_interval: 50000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 2000000
sync_frequency: 200
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000
eval_interval: 100000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5 # The start value of greedy epsilon.
end_greedy: 0.01 # The end value of greedy epsilon.
decay_step_greedy: 2500000 # The steps to decay the greedy epsilon.
sync_frequency: 200 # The frequency to synchronize target networks.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000
eval_interval: 100000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 1000000
sync_frequency: 200
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000
eval_interval: 100000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "COMA" # The learning algorithm.
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "COMA_Learner"
policy: "Categorical_COMA_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [128, 128]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate_actor: 0.0007 # Learning rate of actor.
learning_rate_critic: 0.0007 # Learning rate of critic.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 1000000
sync_frequency: 200
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.99 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_advnorm: False
use_gae: True
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000
eval_interval: 100000
test_episode: 16
log_dir: "./logs/coma/"
model_dir: "./models/coma/"
agent: "VDAC"
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_MLP"
vectorize: "SubprocVecMultiAgentEnv"
runner: "MARL" # Runner
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # The type of recurrent layer.
fc_hidden_sizes: [64, 64, 64] # The hidden size of feed forward layer in RNN representation.
recurrent_hidden_size: 64 # The hidden size of the recurrent layer.
N_recurrent_layers: 1 # The number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01 # Gain value for network initialization.
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [64, ] # A list of hidden units for each layer of actor network.
critic_hidden_size: [64, ] # A list of hidden units for each layer of critic network.
activation: "relu" # The activation function of each hidden layer.
activation_action: "sigmoid" # The activation function for the last layer of the actor.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: False # Whether to mask out unavailable actions.
mixer: "VDN" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network (when mixer is QMIX)
hidden_dim_hyper_net: 32 # hidden units of hyper network (when mixer is QMIX)
seed: 1 # Random seed.
parallels: 16 # The number of environments to run in parallel.
buffer_size: 32 # Number of the transitions (use_rnn is False), or the episodes (use_rnn is True) in replay buffer.
n_epochs: 1 # Number of epochs to train.
n_minibatch: 1 # Number of minibatch to sample and train. batch_size = buffer_size // n_minibatch.
learning_rate: 0.0005 # Learning rate.
weight_decay: 0 # Weight decay (L2 regularization) coefficient.
vf_coef: 0.1 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
gamma: 0.99 # Discount factor.
# tricks
use_linear_lr_decay: False # If to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_global_state: True # If to use global state to replace merged observations.
use_value_clip: False # Limit the value range.
value_clip_range: 0.2 # The value clip range.
use_value_norm: False # Use running mean and std to normalize rewards.
use_huber_loss: False # True: use huber loss; False: use MSE loss.
huber_delta: 10.0 # The threshold at which to change between delta-scaled L1 and L2 loss. (For huber loss).
use_advnorm: False # If to use advantage normalization.
use_gae: True # Use GAE trick.
gae_lambda: 0.8 # The GAE lambda.
use_grad_clip: True # Gradient normalization.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000 # The total running steps.
eval_interval: 100000 # The interval (in steps) between two consecutive evaluations.
test_episode: 5 # The episodes to test in each test period.
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
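The comment on `n_minibatch` above gives the relation `batch_size = buffer_size // n_minibatch`. The sketch below spells that arithmetic out; the helper name `minibatch_slices` and the use of contiguous index ranges are assumptions for illustration only, since the configuration does not say how samples are drawn from the buffer.

```python
def minibatch_slices(buffer_size=32, n_minibatch=1):
    """Split a buffer of `buffer_size` entries into `n_minibatch` index ranges."""
    batch_size = buffer_size // n_minibatch  # relation stated in the config comment
    return [(i * batch_size, (i + 1) * batch_size) for i in range(n_minibatch)]


if __name__ == "__main__":
    print(minibatch_slices(32, 1))  # one batch covering the whole buffer
    print(minibatch_slices(32, 4))  # four batches of eight entries each
```

Because the division is an integer division, any remainder of `buffer_size` that does not fill a whole minibatch is left out in this sketch, so choosing `n_minibatch` as a divisor of `buffer_size` avoids unused entries.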
agent: "VDAC"
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: True # Whether to mask out unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # if use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # if use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
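The `use_huber_loss` and `huber_delta` entries above select the critic loss. As a minimal sketch of the standard Huber loss on a single prediction error (the function name `huber_loss` is chosen for illustration and is not taken from XuanCe):

```python
def huber_loss(error, delta=10.0):
    """Huber loss: quadratic for small errors, linear beyond `delta`."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error  # L2 region
    return delta * (abs_error - 0.5 * delta)  # delta-scaled L1 region


if __name__ == "__main__":
    for err in (2.0, 10.0, 20.0):
        print(err, huber_loss(err))
```

With `huber_delta: 10.0`, errors of magnitude up to 10 are penalized exactly like half the squared error, and only larger errors fall into the linear region, which limits the influence of outlier value targets.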
agent: "VDAC"
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64,]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: True # Whether to mask out unavailable actions.
mixer: "QMIX" # choices: Independent, VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.0
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # if use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # if use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: False # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: False # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
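The `mixer` entry above chooses between "VDN (sum)" and "QMIX (monotonic)". The sketch below contrasts the two ideas on scalar per-agent values. It is deliberately simplified: the monotonic mixer here is a single linear layer with fixed weights, whereas QMIX generates its weights from the global state with hypernetworks, and both function names are hypothetical.

```python
def vdn_mix(agent_values):
    """VDN-style mixing: the team value is the plain sum of agent values."""
    return sum(agent_values)


def monotonic_mix(agent_values, weights, bias=0.0):
    """Single-layer monotonic mixing.

    Taking the absolute value of every weight keeps the team value
    non-decreasing in each agent's value, which is the monotonicity
    constraint the "QMIX (monotonic)" option refers to.
    """
    return sum(abs(w) * v for w, v in zip(weights, agent_values)) + bias


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0]
    print(vdn_mix(values))
    print(monotonic_mix(values, weights=[-1.0, 0.5, 2.0], bias=0.25))
```

The sum needs no extra parameters, while the monotonic variant can weight agents differently; that extra capacity is what the `hidden_dim_mixing_net` and `hidden_dim_hyper_net` entries size.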
agent: "VDAC"
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share parameters across all agents' policies.
use_actions_mask: True # Whether to mask out unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 1000000 # 1M
eval_interval: 5000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
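The `use_huber_loss` and `huber_delta` entries select a Huber loss for the critic instead of a mean squared error. As a small sketch of what `huber_delta: 10.0` means, the standard Huber loss on a single error term can be written as below. The function name `huber_loss` is an assumption for illustration; the library's own implementation may differ in reduction and scaling.

```python
def huber_loss(error, delta=10.0):
    """Standard Huber loss for one scalar error (illustrative helper)."""
    abs_error = abs(error)
    if abs_error <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * error ** 2
    # Linear region: large errors grow linearly, limiting their gradient.
    return delta * (abs_error - 0.5 * delta)
```

Errors smaller than `delta` are penalized quadratically, while larger errors are penalized linearly, which keeps occasional large value errors from dominating the update.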
agent: "VDAC"
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
agent: "VDAC"
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
agent: "VDAC"
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 5000000 # 5M
eval_interval: 25000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
agent: "VDAC"
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
agent: "VDAC"
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
agent: "VDAC"
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 1.0
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 2
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
agent: "VDAC"
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "VDAC_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
on_policy: True
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
mixer: "QMIX" # choices: VDN (sum), QMIX (monotonic)
hidden_dim_mixing_net: 32 # hidden units of mixing network
hidden_dim_hyper_net: 64 # hidden units of hyper network
seed: 1
parallels: 8
buffer_size: 8
n_epochs: 1
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
running_steps: 10000000 # 10M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/vdac/"
model_dir: "./models/vdac/"
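The `use_value_clip` and `value_clip_range` entries in the VDAC presets above limit how far a new value prediction may move away from the one recorded at collection time. A common PPO-style form of this trick is sketched below. It is an assumption that the learner follows this form; the name `clipped_value_loss` is hypothetical, and a squared error is used here for brevity even though the presets enable the Huber loss.

```python
def clipped_value_loss(value, old_value, target, clip_range=0.2):
    """PPO-style clipped value loss for one sample (illustrative sketch)."""
    # Restrict the change of the prediction to [-clip_range, clip_range].
    delta = max(-clip_range, min(clip_range, value - old_value))
    value_clipped = old_value + delta
    loss_unclipped = (value - target) ** 2
    loss_clipped = (value_clipped - target) ** 2
    # Taking the larger loss makes the update pessimistic.
    return max(loss_unclipped, loss_clipped)
```

A smaller range, such as the `0.05` used for "5m_vs_6m" and "8m_vs_9m", constrains the critic more tightly than the `0.2` used for the other maps.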
agent: "IPPO" # The agent name.
env_name: "mpe" # The environment name.
env_id: "simple_spread_v3" # The environment id.
env_seed: 1
continuous_action: True # Whether to use continuous control.
learner: "IPPO_Learner" # The learner name.
policy: "Gaussian_MAAC_Policy" # The policy name.
representation: "Basic_MLP" # The representation name.
vectorize: "SubprocVecMultiAgentEnv" # The method used to vectorize the environment so that multiple copies can run in parallel.
runner: "MARL" # The runner.
# recurrent settings for Basic_RNN representation.
use_rnn: False # Whether to use a recurrent neural network as the representation (the representation should then be "Basic_RNN").
rnn: "GRU" # The type of recurrent layer.
fc_hidden_sizes: [64, 64, 64] # The hidden sizes of the feed-forward layers in the RNN representation.
recurrent_hidden_size: 64 # The hidden size of the recurrent layer.
N_recurrent_layers: 1 # The number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01 # Gain value for network initialization.
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [64, ] # A list of hidden units for each layer of actor network.
critic_hidden_size: [64, ] # A list of hidden units for each layer of critic network.
activation: "relu" # The activation function of each hidden layer.
activation_action: "sigmoid" # The activation function for the last layer of the actor.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1 # Random seed.
parallels: 16 # The number of environments to run in parallel.
buffer_size: 3200 # Number of transitions (if use_rnn is False) or episodes (if use_rnn is True) in the replay buffer.
n_epochs: 10 # Number of epochs to train.
n_minibatch: 1 # Number of minibatches to sample and train on. batch_size = buffer_size // n_minibatch.
learning_rate: 0.0007 # Learning rate.
weight_decay: 0 # The weight decay (L2 penalty) coefficient.
vf_coef: 0.5 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
target_kl: 0.25 # For MAPPO_KL learner.
clip_range: 0.2 # The clip range for ratio in MAPPO_Clip learner.
gamma: 0.99 # Discount factor.
# tricks
use_linear_lr_decay: False # Whether to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_global_state: False # Whether to use global state to replace merged observations.
use_value_clip: True # Limit the value range.
value_clip_range: 0.2 # The value clip range.
use_value_norm: True # Use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0 # The threshold at which to change between delta-scaled L1 and L2 loss. (For huber loss).
use_advnorm: True # Whether to use advantage normalization.
use_gae: True # Use GAE trick.
gae_lambda: 0.95 # The GAE lambda.
use_grad_clip: True # Gradient clipping.
grad_clip_norm: 10.0 # The max norm of the gradient.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
running_steps: 10000000 # The total running steps.
eval_interval: 100000 # The number of steps between two consecutive evaluations.
test_episode: 5 # The number of episodes to run in each test period.
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
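The comment on `n_minibatch` above states that `batch_size = buffer_size // n_minibatch`. The sketch below shows one way this relation could translate into minibatch index sets for a single epoch. The helper `minibatch_indices` is hypothetical and only illustrates the arithmetic; it is not the library's sampling code.

```python
import random


def minibatch_indices(buffer_size, n_minibatch, seed=1):
    """Split shuffled buffer indices into n_minibatch equal parts (sketch)."""
    batch_size = buffer_size // n_minibatch
    indices = list(range(buffer_size))
    # Shuffle once per epoch so each minibatch is a random subset.
    random.Random(seed).shuffle(indices)
    return [
        indices[i * batch_size:(i + 1) * batch_size]
        for i in range(n_minibatch)
    ]


# With the preset above: one minibatch containing all 3200 samples.
batches = minibatch_indices(buffer_size=3200, n_minibatch=1)
print(len(batches), len(batches[0]))
```

With `n_epochs: 10` and `n_minibatch: 1`, each collected buffer would therefore be used for ten full-batch updates under this reading.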
agent: "IPPO" # the MARL learning algorithm
# environment settings
env_name: "Football"
scenario: "academy_3_vs_1_with_keeper"
env_seed: 1
use_stacked_frames: False # Whether to use stacked_frames
num_agent: 3
num_adversary: 0
obs_type: "simple115v2" # representation used to build the observation, choices: ["simple115v2", "extracted", "pixels_gray", "pixels"]
rewards_type: "scoring,checkpoints" # comma separated list of rewards to be added
smm_width: 96 # width of super minimap
smm_height: 72 # height of super minimap
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_Football"
runner: "RunnerFootball"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 50
buffer_size: 400
n_epochs: 15
n_minibatch: 2
learning_rate: 5.0e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 25000000
eval_interval: 200000
test_episode: 50
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
videos_dir: "./videos/ippo/"
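The `clip_range` entry in the IPPO presets bounds the probability ratio in the PPO clipped surrogate objective. A minimal per-sample sketch of that objective follows. The function name `ppo_clip_objective` is an assumption for illustration, and the real learner operates on batched tensors rather than scalars.

```python
import math


def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_range=0.2):
    """PPO clipped surrogate objective for one sample (to be maximized)."""
    # Probability ratio between the updated and the behavior policy.
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    # The minimum removes the incentive to push the ratio outside the range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a positive advantage the objective stops increasing once the ratio exceeds `1 + clip_range`, so a single update cannot move the policy arbitrarily far from the one that collected the data.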
agent: "IPPO"
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace merged observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
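The `use_grad_clip` and `grad_clip_norm` entries cap the size of each gradient step. The sketch below illustrates clipping by global norm on a flat list of gradient values. It is framework-agnostic and uses the hypothetical name `clip_by_global_norm`; the actual backends (for example the MindSpore operators named in the `clip_type` comment) provide their own implementations.

```python
import math


def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale grads so their L2 norm does not exceed max_norm (sketch).

    Returns the (possibly rescaled) gradients and the original norm.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: leave the gradients untouched.
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm
```

Clipping by norm preserves the direction of the gradient and only shortens it, whereas clipping by value (`clip_type: 0`) bounds each component separately.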
agent: "IPPO"
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
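The presets above leave `use_linear_lr_decay` disabled, but `end_factor_lr_decay: 0.5` still documents where the learning rate would end up if it were enabled. As a sketch, assuming the schedule starts at the configured `learning_rate` and interpolates linearly to `end_factor_lr_decay` times that value over training, the rate at a given step could be computed as follows. The function `linear_lr` and the choice of `total_steps` are illustrative assumptions.

```python
def linear_lr(step, total_steps, base_lr=0.0007, end_factor=0.5):
    """Linearly interpolate the learning rate from base_lr to
    base_lr * end_factor over total_steps (illustrative sketch)."""
    # Clamp progress to [0, 1] so steps past the end keep the final rate.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * (1.0 + (end_factor - 1.0) * progress)
```

Under these assumptions, a run with `learning_rate: 0.0007` would finish at 0.00035.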
agent: "IPPO"
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 1000000 # 1M
eval_interval: 10000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
agent: "IPPO"
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
agent: "IPPO"
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
agent: "IPPO"
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use global state to replace joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize rewards.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient normalization
grad_clip_norm: 10.0
running_steps: 5000000 # 5M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
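The `gamma`, `use_gae`, and `gae_lambda` entries in the preset above control how advantages are estimated. The following is a minimal sketch of generalized advantage estimation for a single trajectory, written in plain Python and not taken from XuanCe. The function name `compute_gae` and its arguments are hypothetical, and episode termination inside the trajectory is ignored for brevity.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Return (advantages, returns) for one trajectory with no early termination.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the accumulated advantage of the next step.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    # The value targets (returns) are the advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0)
print(advantages, returns)
```

With `gae_lambda: 0` the estimate collapses to the one-step TD error, and values closer to 1 weight longer multi-step returns more heavily.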
agent: "IPPO"
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.05
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
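The 5m_vs_6m preset above sets `clip_range: 0.05`, whereas the 25m preset uses 0.2. As a rough illustration of what this parameter does, the sketch below evaluates the standard PPO clipped surrogate for a single sample. It is plain Python, not XuanCe's learner code, and the name `clipped_surrogate` is hypothetical.

```python
def clipped_surrogate(ratio, advantage, clip_range):
    """Per-sample PPO clipped objective (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage.
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum removes any incentive to push the ratio outside the clip interval.
    return min(ratio * advantage, clipped_ratio * advantage)


for clip_range in (0.2, 0.05):
    print(clip_range, clipped_surrogate(ratio=1.3, advantage=1.0, clip_range=clip_range))
```

A smaller `clip_range` caps the objective sooner, so each update can move the policy less far from the one that collected the data.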
agent: "IPPO"
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.05
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
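The `use_value_clip` and `value_clip_range` entries above suggest PPO-style value clipping. The sketch below shows the form commonly used in PPO implementations, where the new value prediction may move at most `value_clip_range` away from the prediction made at rollout time. Whether XuanCe's learner matches this line for line is an assumption, and `clipped_value_loss` is a hypothetical name. A squared error is used here for brevity, although the presets enable the Huber loss.

```python
def clipped_value_loss(value, old_value, target, value_clip_range):
    """Per-sample clipped value loss using a squared error."""
    # Limit how far the new prediction may move from the rollout-time prediction.
    shift = max(-value_clip_range, min(value - old_value, value_clip_range))
    value_clipped = old_value + shift
    loss_unclipped = (value - target) ** 2
    loss_clipped = (value_clipped - target) ** 2
    # The pessimistic (larger) loss is used, mirroring the clipped policy objective.
    return max(loss_unclipped, loss_clipped)


for clip in (0.2, 0.05):
    print(clip, clipped_value_loss(value=1.0, old_value=0.5, target=1.0, value_clip_range=clip))
```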
agent: "IPPO"
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 1.0
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 5
n_minibatch: 2
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
agent: "IPPO"
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "IPPO_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 5
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of joint observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/ippo/"
model_dir: "./models/ippo/"
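The IPPO presets above all set `use_huber_loss: True` with `huber_delta: 10.0`. As a small illustration of the textbook Huber loss (not XuanCe's own code; `huber_loss` is a hypothetical helper), the sketch below shows how errors up to `delta` are penalized quadratically and larger errors only linearly.

```python
def huber_loss(error, delta=10.0):
    """Huber loss for a single prediction error."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error           # L2 region
    return delta * (abs_error - 0.5 * delta)  # delta-scaled L1 region


for error in (2.0, 10.0, 20.0):
    print(error, huber_loss(error), 0.5 * error * error)  # Huber vs. halved squared error
```

With `delta` as large as 10.0, most value errors fall in the quadratic region, and only outliers are damped.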
agent: "MAPPO" # The agent name.
env_name: "mpe" # The environment name.
env_id: "simple_spread_v3" # The environment id.
env_seed: 1
continuous_action: True # Whether to use continuous control.
learner: "MAPPO_Clip_Learner"
policy: "Gaussian_MAAC_Policy" # The policy name.
representation: "Basic_MLP" # The representation name.
vectorize: "SubprocVecMultiAgentEnv" # The method used to vectorize the environment so that copies can run in parallel.
runner: "MARL" # The runner.
# recurrent settings for Basic_RNN representation.
use_rnn: False # Whether to use a recurrent neural network as the representation (the representation should then be "Basic_RNN").
rnn: "GRU" # The type of recurrent layer.
fc_hidden_sizes: [64, 64, 64] # The hidden sizes of the feed-forward layers in the RNN representation.
recurrent_hidden_size: 64 # The hidden size of the recurrent layer.
N_recurrent_layers: 1 # The number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [64, ] # A list of hidden units for each layer of actor network.
critic_hidden_size: [64, ] # A list of hidden units for each layer of critic network.
activation: "relu" # The activation function of each hidden layer.
activation_action: "sigmoid" # The activation function for the last layer of the actor.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1 # Random seed.
parallels: 16 # The number of environments to run in parallel.
buffer_size: 400 # Number of transitions (use_rnn is False) or episodes (use_rnn is True) stored in the buffer.
n_epochs: 1 # Number of epochs to train.
n_minibatch: 1 # Number of minibatches to sample and train. batch_size = buffer_size // n_minibatch.
learning_rate: 0.0007 # Learning rate.
weight_decay: 0 # The weight decay (L2 penalty) coefficient of the optimizer.
vf_coef: 0.5 # Coefficient factor for critic loss.
ent_coef: 0.01 # Coefficient factor for entropy loss.
target_kl: 0.25 # For MAPPO_KL learner.
clip_range: 0.2 # Ratio clip range, for MAPPO_Clip learner.
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm().
gamma: 0.95 # Discount factor.
# tricks
use_linear_lr_decay: False # Whether to use linear learning rate decay.
end_factor_lr_decay: 0.5 # The end factor for learning rate scheduler.
use_global_state: False # Whether to use the global state instead of merged observations.
use_value_clip: True # Limit the value range.
value_clip_range: 0.2 # The value clip range.
use_value_norm: True # Use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0 # The threshold at which to change between delta-scaled L1 and L2 loss. (For huber loss).
use_advnorm: True # Whether to use advantage normalization.
use_gae: True # Use GAE trick.
gae_lambda: 0.95 # The GAE lambda.
use_grad_clip: True # Whether to clip gradients by norm.
grad_clip_norm: 10.0 # The max norm of the gradient.
running_steps: 10000000 # The total running steps.
eval_interval: 100000 # The number of training steps between two consecutive evaluations.
test_episode: 5 # The episodes to test in each test period.
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
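The comment on `n_minibatch` above gives the relation `batch_size = buffer_size // n_minibatch`. The short sketch below applies it to three presets from this page. The helper `minibatch_schedule` is hypothetical, and the update count assumes that every minibatch is used once per epoch, which is an assumption about the training loop and not something the presets state.

```python
def minibatch_schedule(buffer_size, n_minibatch, n_epochs):
    """Return (batch_size, gradient updates per filled buffer)."""
    batch_size = buffer_size // n_minibatch  # relation stated in the preset comment
    updates = n_epochs * n_minibatch         # assumption: each minibatch is used once per epoch
    return batch_size, updates


# (buffer_size, n_minibatch, n_epochs) copied from presets on this page.
presets = {
    "simple_spread_v3": (400, 1, 1),
    "MMM2": (128, 2, 5),
    "academy_3_vs_1_with_keeper": (400, 2, 15),
}
for env_id, (buffer_size, n_minibatch, n_epochs) in presets.items():
    print(env_id, minibatch_schedule(buffer_size, n_minibatch, n_epochs))
```

Note that `buffer_size` counts transitions when `use_rnn` is False and episodes when it is True, so the resulting batch sizes are not directly comparable between the MLP and RNN presets.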
agent: "MAPPO"
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MAPPO_Clip_Learner"
policy: "Gaussian_MAAC_Policy"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use a recurrent neural network as the representation (the representation should then be "Basic_RNN").
rnn: "GRU" # The type of recurrent layer.
fc_hidden_sizes: [64, 64, 64] # The hidden sizes of the feed-forward layers in the RNN representation.
recurrent_hidden_size: 64 # The hidden size of the recurrent layer.
N_recurrent_layers: 1 # The number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [64, ]
critic_hidden_size: [256, ]
activation: "relu" # The activation function of each hidden layer.
activation_action: "sigmoid" # The activation function for the last layer of the actor.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 128
buffer_size: 3200
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007
weight_decay: 0
vf_coef: 0.5
ent_coef: 0.01
target_kl: 0.25 # for MAPPO_KL learner
clip_range: 0.2 # ratio clip range, for MAPPO_Clip learner
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.95 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of merged observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True
use_gae: True
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000
eval_interval: 100000
test_episode: 5
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
agent: "MAPPO"
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MAPPO_Clip_Learner"
policy: "Gaussian_MAAC_Policy"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use a recurrent neural network as the representation (the representation should then be "Basic_RNN").
rnn: "GRU" # The type of recurrent layer.
fc_hidden_sizes: [64, 64, 64] # The hidden sizes of the feed-forward layers in the RNN representation.
recurrent_hidden_size: 64 # The hidden size of the recurrent layer.
N_recurrent_layers: 1 # The number of recurrent layers.
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm" # Layer normalization.
initialize: "orthogonal" # Network initializer.
gain: 0.01
representation_hidden_size: [64, ] # A list of hidden units for each layer of Basic_MLP representation networks.
actor_hidden_size: [64, ]
critic_hidden_size: [256, ]
activation: "relu" # The activation function of each hidden layer.
activation_action: "sigmoid" # The activation function for the last layer of the actor.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 128
buffer_size: 3200
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007
weight_decay: 0
vf_coef: 0.5
ent_coef: 0.01
target_kl: 0.25 # for MAPPO_KL learner
clip_range: 0.2 # ratio clip range, for MAPPO_Clip learner
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.95 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of merged observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True
use_gae: True
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000
eval_interval: 100000
test_episode: 5
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
agent: "MAPPO"
env_name: "RoboticWarehouse"
env_id: "rware-tiny-2ag-v1"
env_seed: 1
max_episode_steps: 100
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_MLP"
vectorize: "Dummy_RoboticWarehouse"
runner: "MARL" # Runner
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ] # the units for each hidden layer
gain: 0.01
actor_hidden_size: [64, ]
critic_hidden_size: [256, ]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 128
buffer_size: 3200
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007
weight_decay: 0
vf_coef: 0.5
ent_coef: 0.01
target_kl: 0.25 # for MAPPO_KL learner
clip_range: 0.2 # ratio clip range, for MAPPO_Clip learner
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.95 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of merged observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: False # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True
use_gae: True
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 10000000
eval_interval: 100000
test_episode: 5
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
agent: "MAPPO" # the MARL learning algorithm
# environment settings
env_name: "Football"
scenario: "1_vs_1_easy"
env_seed: 1
use_stacked_frames: False # Whether to use stacked_frames
num_agent: 1
num_adversary: 0
obs_type: "simple115v2" # representation used to build the observation, choices: ["simple115v2", "extracted", "pixels_gray", "pixels"]
rewards_type: "scoring,checkpoints" # comma separated list of rewards to be added
smm_width: 96 # width of super minimap
smm_height: 72 # height of super minimap
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_Football"
runner: "RunnerFootball"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 50
buffer_size: 100
n_epochs: 10
n_minibatch: 1
learning_rate: 5.0e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: True # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 25000000
eval_interval: 200000
test_episode: 50
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
videos_dir: "./videos/mappo/"
agent: "MAPPO" # the MARL learning algorithm
# environment settings
env_name: "Football"
scenario: "academy_3_vs_1_with_keeper"
env_seed: 1
use_stacked_frames: False # Whether to use stacked_frames
num_agent: 3
num_adversary: 0
obs_type: "simple115v2" # representation used to build the observation, choices: ["simple115v2", "extracted", "pixels_gray", "pixels"]
rewards_type: "scoring,checkpoints" # comma separated list of rewards to be added
smm_width: 96 # width of super minimap
smm_height: 72 # height of super minimap
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_Football"
runner: "RunnerFootball"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: False # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 50
buffer_size: 400
n_epochs: 15
n_minibatch: 2
learning_rate: 5.0e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: True # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 25000000
eval_interval: 200000
test_episode: 50
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
videos_dir: "./videos/mappo/"
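The Football presets above keep `use_linear_lr_decay: False`, so `end_factor_lr_decay` has no effect there. To show what enabling it would plausibly mean, the sketch below interpolates linearly from the base learning rate to `base_lr * end_factor` over training. This interpretation of `end_factor_lr_decay`, and the helper name `linear_lr`, are assumptions and are not taken from XuanCe's scheduler code.

```python
def linear_lr(step, total_steps, base_lr, end_factor):
    """Learning rate after `step` of `total_steps` under linear decay."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    # Scale factor moves linearly from 1.0 at the start to end_factor at the end.
    return base_lr * (1.0 + (end_factor - 1.0) * progress)


total = 25_000_000  # running_steps from the Football presets
for step in (0, total // 2, total):
    print(step, linear_lr(step, total, base_lr=5.0e-4, end_factor=0.5))
```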
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "2m_vs_1z"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state instead of merged observations
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
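The presets above enable `use_advnorm`. A minimal sketch of batch-wise advantage normalization follows, assuming the usual zero-mean, unit-variance rescaling. The helper `normalize_advantages` and the `eps` constant are hypothetical, and the exact statistics used by XuanCe may differ.

```python
def normalize_advantages(advantages, eps=1e-8):
    """Rescale a batch of advantages to zero mean and (approximately) unit variance."""
    n = len(advantages)
    mean = sum(advantages) / n
    variance = sum((a - mean) ** 2 for a in advantages) / n
    std = variance ** 0.5
    # eps guards against division by zero when all advantages are equal.
    return [(a - mean) / (std + eps) for a in advantages]


print(normalize_advantages([1.0, 2.0, 3.0]))
```

Normalizing keeps the scale of the policy gradient roughly constant across maps whose reward magnitudes differ.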
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "3m"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 1000000
eval_interval: 10000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "8m"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 1000000 # 1M
eval_interval: 10000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "1c3s5z"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "2s3z"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to use parameter sharing for all agents' policies.
use_actions_mask: True # Whether to use an action mask for unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # whether to clip gradients by norm
grad_clip_norm: 10.0
running_steps: 2000000
eval_interval: 20000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
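The `clip_range: 0.2` entry in the preset above bounds how far the updated policy may move from the policy that collected the data. A minimal sketch of the standard PPO clipped surrogate for a single sample is given below; `clipped_surrogate` is a hypothetical helper for illustration and is not taken from the XuanCe code base.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO clipped objective for one sample (a quantity to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s) for the sampled action.
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Take the pessimistic (smaller) of the clipped and unclipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, any ratio above 1.2 earns no extra objective, so the gradient stops pushing the policy further in that direction.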
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "25m"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share policy parameters across all agents.
use_actions_mask: True # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use a running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 5000000 # 5M
eval_interval: 50000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
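The preset above combines `use_value_clip`, `value_clip_range`, `use_huber_loss` and `huber_delta`. One common way these pieces fit together in PPO-style critics is sketched below. It assumes the loss is the larger of the clipped and unclipped Huber errors, which is a widespread convention but is not confirmed here as XuanCe's exact formula; both function names are hypothetical.

```python
def huber(error, delta=10.0):
    """Huber loss: quadratic near zero, linear beyond +/- delta."""
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error * error
    return delta * (abs_err - 0.5 * delta)


def clipped_value_loss(value, old_value, target, clip_range=0.2, delta=10.0):
    """Critic loss for one sample with PPO-style value clipping."""
    # Limit how far the new prediction may move from the old prediction.
    step = max(-clip_range, min(value - old_value, clip_range))
    clipped_value = old_value + step
    # Use the more pessimistic of the two errors.
    return max(huber(target - value, delta), huber(target - clipped_value, delta))
```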
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "5m_vs_6m"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share policy parameters across all agents.
use_actions_mask: True # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 10
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.05
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use a running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
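The presets above leave `use_linear_lr_decay` disabled, so `end_factor_lr_decay: 0.5` has no effect unless the flag is switched on. As a sketch of what a linear schedule of this kind usually means, the example below assumes the learning rate is scaled by a factor that moves linearly from 1.0 to `end_factor_lr_decay` over the run. The helper `linear_lr` is hypothetical, and the exact schedule in a given backend may differ.

```python
def linear_lr(step, total_steps, base_lr=0.0007, end_factor=0.5):
    """Learning rate after `step` updates under a linear decay schedule."""
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step, 0), total_steps) / total_steps
    return base_lr * (1.0 + (end_factor - 1.0) * frac)
```

Under these assumptions the rate would start at 7e-4 and finish at 3.5e-4.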
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "8m_vs_9m"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64,]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share policy parameters across all agents.
use_actions_mask: True # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 15
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.05
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.05
use_value_norm: True # use a running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
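`use_advnorm: True` in the preset above enables advantage normalization. A minimal sketch of the usual batch-wise form (subtract the mean, divide by the standard deviation) follows; `normalize_advantages` and the small `eps` constant are assumptions for illustration.

```python
import statistics


def normalize_advantages(advantages, eps=1e-8):
    """Rescale a batch of advantages to zero mean and (roughly) unit std."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)
    # eps guards against division by zero when all advantages are equal.
    return [(a - mean) / (std + eps) for a in advantages]
```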
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "MMM2"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 1.0
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share policy parameters across all agents.
use_actions_mask: True # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 5
n_minibatch: 2
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use a running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
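The MMM2 preset above is the only MAPPO preset in this group with `n_minibatch: 2` (together with `n_epochs: 5`). The sketch below illustrates how such settings typically translate into gradient updates, assuming the `buffer_size: 128` stored samples are reshuffled each epoch and split evenly into minibatches. The generator `minibatch_indices` is hypothetical, and how XuanCe actually partitions its buffer is not shown here.

```python
import random


def minibatch_indices(buffer_size=128, n_epochs=5, n_minibatch=2, seed=1):
    """Yield one list of sample indices per gradient update."""
    rng = random.Random(seed)
    batch_size = buffer_size // n_minibatch
    for _ in range(n_epochs):
        indices = list(range(buffer_size))
        rng.shuffle(indices)  # new random split every epoch
        for start in range(0, batch_size * n_minibatch, batch_size):
            yield indices[start:start + batch_size]
```

Under these assumptions, each collected buffer is reused for `n_epochs * n_minibatch = 10` updates of 64 samples each.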
agent: "MAPPO"
env_name: "StarCraft2"
env_id: "corridor"
env_seed: 1
fps: 15
learner: "MAPPO_Clip_Learner"
policy: "Categorical_MAAC_Policy"
representation: "Basic_RNN"
vectorize: "Subproc_StarCraft2"
runner: "RunnerStarCraft2"
# recurrent settings for Basic_RNN representation
use_rnn: True # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, 64, 64]
recurrent_hidden_size: 64
dropout: 0 # dropout should be a number in range [0, 1], the probability of an element being zeroed.
normalize: "LayerNorm"
initialize: "orthogonal"
gain: 0.01
actor_hidden_size: []
critic_hidden_size: []
activation: "relu" # The activation function of each hidden layer.
use_parameter_sharing: True # Whether to share policy parameters across all agents.
use_actions_mask: True # Whether to mask out unavailable actions.
seed: 1
parallels: 8
buffer_size: 128
n_epochs: 5
n_minibatch: 1
learning_rate: 0.0007 # 7e-4
weight_decay: 0
vf_coef: 1.0
ent_coef: 0.01
target_kl: 0.25
clip_range: 0.2
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.99 # discount factor
# tricks
use_linear_lr_decay: False # whether to use linear learning rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state to calculate values
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use a running mean and std to normalize value targets.
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True # use advantage normalization.
use_gae: True # use GAE trick to calculate returns.
gae_lambda: 0.95
use_grad_clip: True # gradient clipping
grad_clip_norm: 10.0
running_steps: 10000000 # 10M
eval_interval: 100000
test_episode: 16
log_dir: "./logs/mappo/"
model_dir: "./models/mappo/"
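`use_value_norm: True` in the MAPPO presets above relies on a running mean and standard deviation. The class below is a small, self-contained sketch of such running statistics using Welford's online update; the class name `RunningMeanStd` and its methods are assumptions for this example rather than XuanCe's normalizer.

```python
class RunningMeanStd:
    """Track a running mean and std, and (de)normalize scalars with them."""

    def __init__(self, eps=1e-5):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        # Welford's algorithm: numerically stable single-pass update.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        var = self.m2 / self.count if self.count > 0 else 1.0
        return (var + self.eps) ** 0.5

    def normalize(self, x):
        return (x - self.mean) / self.std

    def denormalize(self, y):
        return y * self.std + self.mean
```

A critic trained on normalized targets would have its outputs passed through `denormalize` before they are used to compute advantages.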
agent: "ISAC" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "ISAC_Learner"
policy: "Gaussian_ISAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
alpha: 0.01
use_automatic_entropy_tuning: True
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: False
grad_clip_norm: 10
eval_interval: 100000
test_episode: 5
log_dir: "./logs/isac/"
model_dir: "./models/isac/"
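The `tau: 0.001` entry above sets the soft (Polyak) update rate for target networks. A minimal sketch over plain lists of floats is shown below; `soft_update` is a hypothetical helper, and real implementations apply the same rule to network tensors.

```python
def soft_update(target_params, source_params, tau=0.001):
    """Move each target parameter a fraction tau toward its source parameter."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target_params, source_params)]
```

With `tau = 0.001`, the target network keeps 99.9% of its previous weights on every update, so it changes slowly and stabilizes the bootstrapped targets.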
agent: "ISAC" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "ISAC_Learner"
policy: "Gaussian_ISAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerPettingzoo"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
alpha: 0.01
use_automatic_entropy_tuning: True
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
train_per_step: False # True: train model per step; False: train model per episode.
training_frequency: 1
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/isac/"
model_dir: "./models/isac/"
agent: "ISAC" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "ISAC_Learner"
policy: "Gaussian_ISAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerPettingzoo"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
alpha: 0.01
use_automatic_entropy_tuning: True
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
train_per_step: False # True: train model per step; False: train model per episode.
training_frequency: 1
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/isac/"
model_dir: "./models/isac/"
agent: "MASAC" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MASAC_Learner"
policy: "Gaussian_MASAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: False
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
alpha: 0.01
use_automatic_entropy_tuning: True
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: False
grad_clip_norm: 10
eval_interval: 100000
test_episode: 5
log_dir: "./logs/masac/"
model_dir: "./models/masac/"
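In soft actor-critic methods, `alpha` weights the entropy bonus in the bootstrapped target. The sketch below shows the generic single-sample form with `alpha` held fixed at 0.01 for simplicity; since the preset sets `use_automatic_entropy_tuning: True`, the effective value would normally be adjusted during training. The function `soft_q_target` is hypothetical and does not reproduce XuanCe's learner.

```python
def soft_q_target(reward, next_q, next_log_prob, done, gamma=0.95, alpha=0.01):
    """Entropy-regularized TD target for one transition."""
    # Soft value: expected Q minus alpha times the log-probability of the action.
    soft_value = next_q - alpha * next_log_prob
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_value
```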
agent: "MASAC" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MASAC_Learner"
policy: "Gaussian_MASAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: False
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
alpha: 0.01
use_automatic_entropy_tuning: True
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/masac/"
model_dir: "./models/masac/"
agent: "MASAC" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MASAC_Learner"
policy: "Gaussian_MASAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: False
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
alpha: 0.01
use_automatic_entropy_tuning: True
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/masac/"
model_dir: "./models/masac/"
agent: "IDDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "IDDPG_Learner"
policy: "Independent_DDPG_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iddpg/"
model_dir: "./models/iddpg/"
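DDPG-style agents act deterministically, so exploration comes from noise added to the actions (`sigma: 0.1` above). The sketch below adds Gaussian noise and clips the result to [0, 1], the range implied by `activation_action: 'sigmoid'`. The helper `add_exploration_noise` is hypothetical, and how `sigma` interacts with `start_noise` and `end_noise` in XuanCe is not assumed here.

```python
import random


def add_exploration_noise(action, sigma=0.1, low=0.0, high=1.0, rng=None):
    """Perturb each action dimension with Gaussian noise, then clip to bounds."""
    rng = rng or random.Random(1)
    return [min(max(a + rng.gauss(0.0, sigma), low), high) for a in action]
```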
agent: "IDDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "IDDPG_Learner"
policy: "Independent_DDPG_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iddpg/"
model_dir: "./models/iddpg/"
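With `use_grad_clip: True`, gradients are rescaled whenever their norm exceeds `grad_clip_norm` (0.5 in the preset above). A minimal sketch of clipping by global L2 norm over a flat list of gradient values follows; `clip_by_global_norm` is a hypothetical stand-in for the framework utility that would be used in practice.

```python
import math


def clip_by_global_norm(grads, max_norm=0.5):
    """Scale gradients down so that their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads)  # already within the limit
    scale = max_norm / total_norm
    return [g * scale for g in grads]
```

The direction of the gradient is preserved; only its length is capped.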
agent: "IDDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "IDDPG_Learner"
policy: "Independent_DDPG_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iddpg/"
model_dir: "./models/iddpg/"
agent: "IDDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_reference_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "IDDPG_Learner"
policy: "Independent_DDPG_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iddpg/"
model_dir: "./models/iddpg/"
agent: "IDDPG" # the MARL algorithm to use
env_name: "Drones"
env_id: "MultiHoverAviary"
env_seed: 1
obs_type: 'kin'
act_type: 'vel'
num_drones: 3
record: False
obstacles: True
max_episode_steps: 2000
render: False
sleep: 0.01
learner: "IDDPG_Learner"
policy: "Independent_DDPG_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'tanh'
use_parameter_sharing: True
seed: 1
parallels: 10
buffer_size: 1000000
batch_size: 1024
learning_rate_actor: 0.001 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.99 # discount factor
tau: 0.005 # soft update for target networks
start_noise: 0.1
end_noise: 0.1
sigma: 0.1
start_training: 2000 # start training after n steps
running_steps: 10000000
train_per_step: True # True: train model per step; False: train model per episode.
training_frequency: 1
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/iddpg/"
model_dir: "./models/iddpg/"
agent: "MADDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_Identical"
vectorize: "SubprocVecMultiAgentEnv"
runner: "MARL" # Runner
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/maddpg/"
model_dir: "./models/maddpg/"
agent: "MADDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_Identical"
vectorize: "SubprocVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/maddpg/"
model_dir: "./models/maddpg/"
agent: "MADDPG" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, ]
critic_hidden_size: [64, ]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
start_noise: 1.0
end_noise: 0.01
sigma: 0.1
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/maddpg/"
model_dir: "./models/maddpg/"
agent: "MADDPG" # the MARL algorithm to use
env_name: "Drones"
env_id: "MultiHoverAviary"
env_seed: 1
obs_type: 'kin'
act_type: 'vel'
num_drones: 3
record: False
obstacles: True
max_episode_steps: 1000
render: False
sleep: 0.01
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_Identical"
vectorize: "Dummy_Drone_MAS"
runner: "MARL" # Runner
on_policy: False
actor_hidden_size: [256, 256]
critic_hidden_size: [256, 256]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 10
buffer_size: 1000000
batch_size: 1024
learning_rate_actor: 0.001 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.99 # discount factor
tau: 0.005 # soft update for target networks
start_noise: 0.1
end_noise: 0.1
sigma: 0.1
start_training: 2000 # start training after n steps
running_steps: 10000000
train_per_step: True # True: train model per step; False: train model per episode.
training_frequency: 1
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/maddpg/"
model_dir: "./models/maddpg/"
agent: "MADDPG" # the MARL algorithm to use
env_name: "NewEnv_MAS"
env_id: "scenarios_0"
env_seed: 1
max_episode_steps: 200
render: False
sleep: 0.01
continuous_action: True # Continuous action space or not.
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_Identical"
vectorize: "Dummy_NewEnv_MAS"
runner: "MARL" # Runner
on_policy: False
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'tanh'
seed: 1
parallels: 16
buffer_size: 1000000
batch_size: 1024
learning_rate_actor: 0.001 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.99 # discount factor
tau: 0.005 # soft update for target networks
start_noise: 0.1
end_noise: 0.1
sigma: 0.1
start_training: 2000 # start training after n steps
running_steps: 1000000
train_per_step: True # True: train model per step; False: train model per episode.
training_frequency: 1
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 10000
test_episode: 5
log_dir: "./logs/maddpg/"
model_dir: "./models/maddpg/"
agent: "MATD3" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MATD3_Learner"
policy: "MATD3_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL" # Runner
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
actor_update_delay: 2
start_noise: 1.0
end_noise: 0.01
sigma: 0.1 # random noise for continuous actions
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: False
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/matd3/"
model_dir: "./models/matd3/"
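`actor_update_delay: 2` reflects the delayed policy update used by TD3-style methods: the critic is trained on every update, while the actor is refreshed only every few updates. The sketch below lists which networks would be updated at each step under that rule; `update_schedule` and its string labels are purely illustrative.

```python
def update_schedule(n_updates, actor_update_delay=2):
    """Return, for each update step, which networks are trained."""
    schedule = []
    for i in range(1, n_updates + 1):
        # The actor is updated once per `actor_update_delay` critic updates.
        update_actor = (i % actor_update_delay == 0)
        schedule.append("critic+actor" if update_actor else "critic")
    return schedule
```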
agent: "MATD3" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MATD3_Learner"
policy: "MATD3_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
actor_update_delay: 2
start_noise: 1.0
end_noise: 0.01
sigma: 0.1
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/matd3/"
model_dir: "./models/matd3/"
agent: "MATD3" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
continuous_action: True # Continuous action space or not.
learner: "MATD3_Learner"
policy: "MATD3_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerCompetition"
representation_hidden_size: [] # the units for each hidden layer
actor_hidden_size: [64, 64]
critic_hidden_size: [64, 64]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False
seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate_actor: 0.01 # learning rate for actor
learning_rate_critic: 0.001 # learning rate for critic
gamma: 0.95 # discount factor
tau: 0.001 # soft update for target networks
actor_update_delay: 2
start_noise: 1.0
end_noise: 0.01
sigma: 0.1
start_training: 1000 # start training after n steps
running_steps: 10000000
training_frequency: 25
use_grad_clip: True
grad_clip_norm: 0.5
eval_interval: 100000
test_episode: 5
log_dir: "./logs/matd3/"
model_dir: "./models/matd3/"
agent: "MFQ" # the MARL algorithm to use
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "MFQ_Learner"
policy: "MF_Q_network"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerPettingzoo"
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ]
q_hidden_size: [64, ] # the units for each hidden layer
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 16
buffer_size: 200000
batch_size: 256
learning_rate: 0.001
gamma: 0.95 # discount factor
double_q: True # use double q learning
temperature: 0.1 # softmax for policy
start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000 # start training after n steps
running_steps: 10000000 # 10M
train_per_step: False # True: train model per step; False: train model per episode.
training_frequency: 1
sync_frequency: 100
use_grad_clip: False
grad_clip_norm: 0.5
n_tests: 5
test_period: 100
eval_interval: 100000
test_episode: 5
log_dir: "./logs/mfq/"
model_dir: "./models/mfq/"
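The `start_greedy`, `end_greedy`, and `decay_step_greedy` entries above describe an exploration schedule. The sketch below assumes a linear interpolation between the two values over `decay_step_greedy` steps, after which the value stays at `end_greedy`; the actual schedule used by the library may differ. The helper name `linear_schedule` is hypothetical.

```python
def linear_schedule(step, start, end, decay_steps):
    """Linearly interpolate from start to end over decay_steps steps.

    Steps beyond decay_steps are clamped, so the value stays at `end`.
    """
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return start + fraction * (end - start)


# Values taken from the MFQ config for simple_spread_v3 above.
START_GREEDY = 1.0
END_GREEDY = 0.05
DECAY_STEP_GREEDY = 2_500_000

for step in (0, 1_250_000, 2_500_000, 10_000_000):
    value = linear_schedule(step, START_GREEDY, END_GREEDY, DECAY_STEP_GREEDY)
    print(step, round(value, 4))
```

Under this assumption, the decay finishes after a quarter of the 10M `running_steps`, and the remaining steps run at the final value.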
agent: "MFQ" # The learning algorithm.
env_name: "MAgent2"
env_id: "adversarial_pursuit_v4"
env_seed: 1
minimap_mode: False
max_cycles: 500
extra_features: False
map_size: 45
render_mode: "rgb_array"
learner: "MFQ_Learner"
policy: "MF_Q_network"
representation: "Basic_MLP"
vectorize: "Dummy_MAgent"
runner: "RunnerMAgent"
on_policy: False
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn: "GRU" # Choice of recurrent networks: GRU or LSTM.
N_recurrent_layers: 1 # Number of recurrent layers.
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
representation_hidden_size: [512, ] # the units for each hidden layer
q_hidden_size: [512, ]
activation: "relu" # The activation function of each hidden layer.
seed: 1
parallels: 10
buffer_size: 2000
batch_size: 256
learning_rate: 0.001
gamma: 0.95 # discount factor
temperature: 0.1 # softmax temperature for the policy
start_greedy: 0.0
end_greedy: 0.95
decay_step_greedy: 5000
start_training: 1000 # start training after n steps
running_steps: 1000000
train_per_step: False # True: train model per step; False: train model per episode.
training_frequency: 1
sync_frequency: 200
n_tests: 5
test_episodes: 10
eval_interval: 10000
test_episode: 5
log_dir: "./logs/mfq/"
model_dir: "./models/mfq/"
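The `temperature` entry in the MFQ configurations is commented as the softmax for the policy. A minimal sketch of a temperature-scaled softmax over Q-values follows, assuming the usual Boltzmann form `softmax(Q / temperature)`; how the library applies it internally is not shown in this page. The function name `boltzmann_probs` and the Q-values are hypothetical.

```python
import math


def boltzmann_probs(q_values, temperature):
    """Return softmax(q / temperature) as a list of action probabilities."""
    scaled = [q / temperature for q in q_values]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# temperature is taken from the config above; the Q-values are made up.
TEMPERATURE = 0.1
q_values = [1.0, 1.1, 0.9]

probs = boltzmann_probs(q_values, TEMPERATURE)
print(probs)
```

A lower temperature concentrates probability on the action with the highest Q-value, while a higher temperature moves the distribution towards uniform.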
agent: "MFAC" # The learning algorithm.
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
continuous_action: False # Continuous action space or not.
learner: "MFAC_Learner"
policy: "Categorical_MFAC_Policy"
representation: "Basic_Identical"
vectorize: "DummyVecMultiAgentEnv"
runner: "RunnerPettingzoo"
# recurrent settings for Basic_RNN representation
use_rnn: False # Whether to use recurrent neural networks.
rnn:
representation_hidden_size: [64, ] # the units for each hidden layer
gain: 0.01
actor_hidden_size: [128, ]
critic_hidden_size: [128, ]
activation: 'leaky_relu'
activation_action: 'sigmoid'
seed: 1
parallels: 128
buffer_size: 3200
n_epochs: 10
n_minibatch: 1
learning_rate: 0.01 # learning rate
weight_decay: 0
vf_coef: 0.5
ent_coef: 0.01
target_kl: 0.25 # for MAPPO_KL learner
clip_range: 0.2 # ratio clip range, for MAPPO_Clip learner
clip_type: 1 # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.95 # discount factor
tau: 0.005
# tricks
use_linear_lr_decay: False # whether to use linear learning-rate decay
end_factor_lr_decay: 0.5
use_global_state: False # whether to use the global state in place of merged observations
use_grad_clip: True # whether to clip gradients
grad_clip_norm: 10.0
use_value_clip: True # limit the value range
value_clip_range: 0.2
use_value_norm: True # use a running mean and std to normalize value targets
use_huber_loss: True # True: use huber loss; False: use MSE loss.
huber_delta: 10.0
use_advnorm: True
use_gae: True
gae_lambda: 0.95
start_training: 1000 # start training after n steps
running_steps: 10000000
train_per_step: True
training_frequency: 1
test_steps: 10000
eval_interval: 100000
test_episode: 5
log_dir: "./logs/mfac/"
model_dir: "./models/mfac/"
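The MFAC configuration above enables `use_gae` with `gamma: 0.95` and `gae_lambda: 0.95`. The sketch below shows the standard generalized advantage estimation recursion for a single trajectory segment. It is a simplified illustration that ignores episode terminations and is not the library's implementation; the function name `compute_gae` and the reward and value lists are hypothetical.

```python
def compute_gae(rewards, values, last_value, gamma, gae_lambda):
    """Compute GAE advantages for one segment without terminal states.

    rewards[t] and values[t] refer to step t; last_value is the value
    estimate of the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# gamma and gae_lambda are taken from the config above; the data is made up.
GAMMA = 0.95
GAE_LAMBDA = 0.95

advantages = compute_gae([1.0, 1.0], [0.0, 0.0], 0.0, GAMMA, GAE_LAMBDA)
print(advantages)
```

Setting `gae_lambda` to 0 reduces each advantage to the one-step TD error, while a value of 1 yields the discounted sum of all future TD errors.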
agent: "RANDOM"
learner: "RANDOM"
env_name: "mpe" # Name of the environment.
env_id: "simple_spread_v3"
env_seed: 1
runner: "MARL" # Runner
model_dir: ""
log_dir: ""
agent: "RANDOM"
env_name: "mpe" # Name of the environment.
env_id: "simple_push_v3"
env_seed: 1
learner: "RANDOM"
runner: "RunnerCompetition"
model_dir: ""
log_dir: ""
agent: "RANDOM"
env_name: "mpe" # Name of the environment.
env_id: "simple_adversary_v3"
env_seed: 1
learner: "RANDOM"
runner: "RunnerCompetition"
model_dir: ""
log_dir: ""