configs

Basic Configurations.
Configuration Examples.
Custom Configurations.

XuanCe provides a structured way to manage configurations for deep reinforcement learning (DRL) and multi-agent reinforcement learning (MARL) scenarios, so that different setups can be tried by changing argument values rather than code.
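
As a rough illustration of experimenting with different setups, the sketch below layers a few overrides on top of a set of base arguments. It uses only the Python standard library; the `merge_configs` helper and the default values are hypothetical and are not part of XuanCe's API.

```python
from argparse import Namespace
from copy import deepcopy


def merge_configs(base: dict, overrides: dict) -> Namespace:
    """Return a Namespace holding `base` updated with `overrides`.

    Hypothetical helper for illustration; XuanCe's own loader may differ.
    """
    merged = deepcopy(base)   # leave the base configuration untouched
    merged.update(overrides)  # override existing keys and add new ones
    return Namespace(**merged)


# Illustrative base values, not copied from any shipped configuration file.
base_config = {
    "agent": "DQN",
    "env_name": "Classic Control",
    "env_id": "CartPole-v1",
    "learning_rate": 1e-3,
    "gamma": 0.99,
}

# Try a different setup by overriding only what changes.
args = merge_configs(base_config, {"learning_rate": 5e-4, "parallels": 4})
print(args.agent, args.learning_rate, args.parallels)
```

Keeping the base dictionary unchanged means several variants can be derived from the same starting point within one script.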

Arguments setting tutorial

Argument	Description	Choices/Type
agent	The name of the agent.	DQN, PPO, etc.
policy	The name of the policy.	Basic_Q_network, Categorical_AC, etc.
learner	The name of the learner.	DQN_Learner, PPO_Learner, etc.
representation	The name of the representation.	Basic_Identical, Basic_MLP, Basic_CNN, Basic_RNN, etc.
env_name	The name of the environment.	Classic Control, Box2D, etc.
env_id	The environment id.	'CartPole-v1', 'Ant-v4', etc.
env_seed	The environment seed.	int
vectorize	The vectorization method for environments.	DummyVecEnv, DummyVecMultiAgentEnv, SubprocVecEnv, SubprocVecMultiAgentEnv, etc.
parallels	The number of environments that run in parallel.	int
representation_hidden_size	The hidden units for the representation module.	List of int, e.g., [64, 64]
activation	The activation method for each hidden layer.	'relu', 'sigmoid', 'leaky_relu', etc.
seed	Random seed for initializing the networks.	int
buffer_size	Size of the replay buffer.	int
batch_size	Batch size for one-step training.	int
learning_rate	The learning rate to update the networks.	float32
gamma	The discount factor.	float32
start_greedy	The initial greedy value for selecting actions.	float32
end_greedy	The final greedy value for selecting actions.	float32
decay_step_greedy	The number of steps over which the greedy value decays.	int
sync_frequency	The synchronization frequency for target networks.	int
training_frequency	The training period, i.e., the interval between training updates.	int
running_steps	The total running steps for the experiment.	int
start_training	The step at which the networks start training.	int
use_grad_clip	Whether to clip gradients during gradient descent.	bool
grad_clip_norm	The gradient norm used for clipping when use_grad_clip is True.	float32
use_action_mask	Whether to use the action masks when the environment contains some actions that are unavailable.	bool (default is False)
use_obsnorm	Whether to use the observation normalization trick.	bool
obsnorm_range	The range of normalized observations.	float
use_rewnorm	Whether to use the reward normalization trick.	bool
rewnorm_range	The range of normalized rewards.	float
test_steps	The number of steps for testing the model.	int
eval_interval	The interval steps for evaluating the model during training.	int
test_episode	The number of episodes for evaluating the model during training.	int
log_dir	The directory for saving the log files.	str
model_dir	The directory for saving the model.	str
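
To make the table more concrete, here is a minimal sketch that collects the arguments above into a single `Namespace` for a DQN agent on 'CartPole-v1'. The argument names come from the table; every value, both directory paths, and the `greedy_at` helper are illustrative assumptions. In particular, the linear decay from `start_greedy` to `end_greedy` over `decay_step_greedy` steps is an assumed schedule, not a statement of how XuanCe implements it.

```python
from argparse import Namespace

# Argument names follow the table above; all values are illustrative only.
dqn_args = Namespace(
    agent="DQN",
    policy="Basic_Q_network",
    learner="DQN_Learner",
    representation="Basic_MLP",
    env_name="Classic Control",
    env_id="CartPole-v1",
    env_seed=1,
    vectorize="DummyVecEnv",
    parallels=10,
    representation_hidden_size=[64, 64],
    activation="relu",
    seed=1,
    buffer_size=10000,
    batch_size=64,
    learning_rate=1e-3,
    gamma=0.99,
    start_greedy=0.5,
    end_greedy=0.01,
    decay_step_greedy=10000,
    sync_frequency=50,
    training_frequency=1,
    running_steps=200000,
    start_training=1000,
    use_grad_clip=False,
    grad_clip_norm=0.5,
    use_action_mask=False,
    use_obsnorm=False,
    obsnorm_range=5.0,
    use_rewnorm=False,
    rewnorm_range=5.0,
    test_steps=10000,
    eval_interval=20000,
    test_episode=1,
    log_dir="./logs/dqn/",      # hypothetical path
    model_dir="./models/dqn/",  # hypothetical path
)


def greedy_at(step: int, args: Namespace) -> float:
    """Greedy value at `step`, assuming a linear decay schedule."""
    # Fraction of the decay completed, clamped to [0, 1].
    fraction = min(max(step, 0) / args.decay_step_greedy, 1.0)
    return args.start_greedy + fraction * (args.end_greedy - args.start_greedy)


# Inspect the assumed schedule at the start, midpoint, and end of the decay.
for step in (0, 5000, 10000):
    print(step, round(greedy_at(step, dqn_args), 4))
```

Under this assumed schedule the greedy value stays at `end_greedy` once `decay_step_greedy` steps have passed.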