configs¶
Basic Configurations.
Configuration Examples.
Custom Configurations.
XuanCe provides a structured way to manage configurations for various DRL/MARL scenarios, making it easy to experiment with different setups.
Arguments setting tutorial¶
Argument |
Description |
Choices/Type |
agent |
The name of the agent. |
DQN, PPO, etc |
policy |
The name of the policy. |
Basic_Q_network, <br/>Categorical_AC, etc. |
learner |
The name of the learner. |
DQN_Learner, <br/>PPO_Learner, etc. |
representation |
The name of the representation. |
Basic_Identical, <br/>Basic_MLP, <br/>Basic_CNN, <br/>Basic_RNN, etc. |
env_name |
The name of the environment. |
Classic Control, <br/>Box2D, etc. |
env_id |
The environment id. |
‘CartPole-v1’, <br/>’Ant-v4’, etc. |
env_seed |
The environment seed. |
int |
vectorize |
The vectorization method for environments. |
DummyVecEnv, <br/>DummyVecMultiAgentEnv, <br/>SubprocVecEnv, <br/>SubprocVecMultiAgentEnv, etc. |
parallels |
The number of environments that run in parallel. |
int |
representation_hidden_size |
The hidden units for representation module. |
List of int, <br/>e.g., [64, 64] |
activation |
The activation method for each hidden layer. |
‘relu’, <br/>’sigmoid’, <br/>’leaky_relu’, etc. |
seed |
Random seed for initializing the networks. |
int |
buffer_size |
Size of the replay buffer. |
int |
batch_size |
Batch size for one-step training. |
int |
learning_rate |
The learning rate to update the networks. |
float32 |
gamma |
The discount factor. |
float32 |
start_greedy |
The initialized greedy for selecting actions. |
float32 |
end_greedy |
The final greedy for selecting actions. |
float32 |
decay_step_greedy |
The steps for the process of greedy decay. |
int |
sync_frequency |
The synchronization frequency for target networks. |
int |
training_frequency |
The training period. |
int |
running_steps |
The total running steps for the experiment. |
int |
start_training |
When to start training the networks. |
int |
use_grad_clip |
Whether to use the gradient clip when do gradient descent. |
bool |
grad_clip_norm |
The gradient normalization when use_grad_clip is True. |
float32 |
use_action_mask |
Whether to use the action masks when the environment contains some actions that are unavailable. |
bool (default is False) |
use_obsnorm |
Whether to use observation normalization trick. |
bool |
obsnorm_range |
The range of normalized observatinos. |
float |
use_rewnorm |
Whether to use the reward normalization trick. |
bool |
rewnorm_range |
The range of normalized rewards. |
float |
test_steps |
The steps for testing the model. |
int |
eval_interval |
The interval steps for evaluating the model during training. |
int |
test_episode |
The episodes for evaluating the model during training. |
int |
log_dir |
The directory for saving the logger file. |
str |
model_dir |
The directory for saving the model. |
str |