DQN with Noisy Layers (Noisy DQN)
Paper Link: https://arxiv.org/pdf/1706.01905.
Noisy DQN is a variant of the traditional Deep Q-Network (DQN) that injects noise into the weights of the Q-network to improve exploration during learning. It targets one of the key challenges in reinforcement learning: balancing exploration and exploitation.
This table lists some general features of the Noisy DQN algorithm:
| Features of Noisy DQN | Values | Description |
|---|---|---|
| On-policy | ❌ | The behavior (data-collecting) policy is the same as the target policy. |
| Off-policy | ✅ | The behavior (data-collecting) policy is different from the target policy. |
| Model-free | ✅ | No need to prepare an environment dynamics model. |
| Model-based | ❌ | Needs an environment model to train the policy. |
| Discrete Action | ✅ | Deals with discrete action spaces. |
| Continuous Action | ❌ | Deals with continuous action spaces. |
Key Ideas of Noisy DQN
Exploration vs. Exploitation: In standard DQN, exploration is usually controlled by an \(\epsilon\)-greedy policy: the agent selects a random action with probability \(\epsilon\) and exploits the best-known action otherwise. Noisy DQN tackles exploration by introducing noise directly into the network's parameters, rather than relying solely on random action selection.
Noisy Networks: Instead of relying on a fixed \(\epsilon\) for exploration, Noisy DQN perturbs the Q-network's weights with parameter noise. The perturbed weights change the output Q-values, so the greedy action itself varies, which encourages the agent to explore different actions and states.
Noisy Linear Layers: In the Noisy DQN architecture, the traditional fully connected layers of the neural network are replaced with “noisy” layers. These noisy layers add noise to the weights of the layers during training, making the agent’s decision-making process inherently more exploratory.
The Noisy Network Formula: For each layer in the network, the weights are parameterized as:
\[ w = \mu + \sigma \odot \epsilon, \]
where \(\odot\) denotes element-wise multiplication, and:
\(\mu\) is the mean or the base weight;
\(\sigma\) is the standard deviation that controls the level of noise;
\(\epsilon\) is a sample from a noise distribution (usually a zero-mean Gaussian). A fresh \(\epsilon\) is drawn at the beginning of each episode or iteration, so the perturbation of the weights keeps changing throughout training.
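To make the formula concrete, here is a minimal pure-Python sketch of a noisy linear layer that computes \(w = \mu + \sigma \odot \epsilon\) for every weight and bias, plus greedy action selection on its output. It assumes independent Gaussian noise per parameter and uses illustrative initial values for \(\mu\) and \(\sigma\); the names `NoisyLinear` and `greedy_action` are hypothetical and are not XuanCe's implementation (which is built on a deep learning framework and also trains \(\mu\)).

```python
import random


class NoisyLinear:
    """Pure-Python sketch of a noisy linear layer: w = mu + sigma * eps, element-wise."""

    def __init__(self, in_features, out_features, sigma_init=0.1, seed=None):
        self.rng = random.Random(seed)
        bound = 1.0 / in_features ** 0.5  # illustrative init range (an assumption)
        # Mean (base) parameters mu; a real agent would learn these by gradient descent.
        self.mu_w = [[self.rng.uniform(-bound, bound) for _ in range(in_features)]
                     for _ in range(out_features)]
        self.mu_b = [self.rng.uniform(-bound, bound) for _ in range(out_features)]
        # Noise scales sigma, one per weight and per bias entry.
        self.sigma_w = [[sigma_init] * in_features for _ in range(out_features)]
        self.sigma_b = [sigma_init] * out_features
        self.resample()

    def resample(self):
        """Draw fresh eps ~ N(0, 1) for every weight and bias (e.g. once per episode)."""
        self.eps_w = [[self.rng.gauss(0.0, 1.0) for _ in row] for row in self.mu_w]
        self.eps_b = [self.rng.gauss(0.0, 1.0) for _ in self.mu_b]

    def weights(self):
        """Return the current perturbed weight matrix mu + sigma * eps."""
        return [[m + s * e for m, s, e in zip(m_row, s_row, e_row)]
                for m_row, s_row, e_row in zip(self.mu_w, self.sigma_w, self.eps_w)]

    def forward(self, x):
        """Compute y = W x + b using the perturbed weights and bias."""
        w = self.weights()
        b = [m + s * e for m, s, e in zip(self.mu_b, self.sigma_b, self.eps_b)]
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(w, b)]


def greedy_action(q_values):
    """Index of the largest Q-value; exploration comes from the noisy weights, not epsilon."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Hypothetical 4-dimensional observation (CartPole-like) and 2 actions.
    q_layer = NoisyLinear(in_features=4, out_features=2, sigma_init=0.5, seed=0)
    state = [0.02, -0.01, 0.03, 0.04]
    for episode in range(3):
        q_layer.resample()  # new noise sample at the start of each episode
        print(episode, greedy_action(q_layer.forward(state)))
```

Setting `sigma_init=0.0` turns the layer back into an ordinary deterministic linear layer, which shows that \(\sigma\) alone controls how strongly the Q-values (and hence the greedy action) are perturbed.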
Noisy DQN has three main benefits:
Improved Exploration: Because the noisy weights perturb the Q-values, the agent is encouraged to explore a broader range of actions instead of always exploiting the current best-known action.
Adaptive Exploration: The level of exploration can be adjusted automatically as part of the training, eliminating the need to manually tune exploration parameters like epsilon.
Efficient Training: Noisy DQN can improve sample efficiency because its exploration steers the agent toward less frequently encountered states, potentially leading to better performance in complex environments.
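One way the exploration level can be adjusted automatically is the adaptive noise-scaling rule described in the cited parameter-space-noise paper: measure how far the perturbed policy drifts from the unperturbed one, then multiply \(\sigma\) by \(\alpha\) if that distance is at most a threshold \(\delta\), and divide by \(\alpha\) otherwise. The sketch below follows our reading of that rule for DQN, assuming a KL distance between softmax distributions over the clean and perturbed Q-values and \(\alpha = 1.01\); the helper names are hypothetical and not part of XuanCe.

```python
import math


def softmax(q_values):
    """Numerically stable softmax over a list of Q-values."""
    m = max(q_values)
    exps = [math.exp(q - m) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))


def adapt_sigma(sigma, q_clean, q_perturbed, delta, alpha=1.01):
    """Grow sigma if the perturbed policy stays within delta of the clean one, else shrink it."""
    distance = kl_divergence(softmax(q_clean), softmax(q_perturbed))
    return sigma * alpha if distance <= delta else sigma / alpha


if __name__ == "__main__":
    sigma = 0.1
    # Perturbation barely changes the policy, so the noise scale is increased.
    sigma = adapt_sigma(sigma, [1.0, 2.0], [1.05, 1.95], delta=0.01)
    print(sigma)
```

The multiplicative update keeps \(\sigma\) positive and lets it drift slowly toward the level at which the perturbed policy differs from the greedy one by roughly \(\delta\).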
Framework
Noisy DQN retains the same overall structure as DQN (e.g., experience replay and target networks), but replaces \(\epsilon\)-greedy exploration with noisy layers in the Q-network.
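The DQN machinery that Noisy DQN keeps can be sketched in plain Python: a fixed-size replay buffer and the standard TD target \(y = r + \gamma (1 - d) \max_{a'} Q_{\text{target}}(s', a')\). This is a minimal illustration, not XuanCe's code; `ReplayBuffer`, `td_targets`, and `target_q_fn` are hypothetical names, and `target_q_fn` stands in for the target network's forward pass (which, in Noisy DQN, would itself be built from noisy layers).

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        """Uniformly sample a mini-batch without replacement."""
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def td_targets(batch, target_q_fn, gamma=0.99):
    """DQN targets y = r + gamma * (1 - done) * max_a' Q_target(s', a') for each transition."""
    targets = []
    for _obs, _action, reward, next_obs, done in batch:
        next_q = max(target_q_fn(next_obs))  # greedy value under the target network
        targets.append(reward + gamma * (1.0 - float(done)) * next_q)
    return targets


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=100)
    for t in range(10):
        buffer.add(obs=t, action=t % 2, reward=1.0, next_obs=t + 1, done=(t == 9))
    batch = buffer.sample(4)
    # Dummy target network returning fixed Q-values for two actions.
    print(td_targets(batch, lambda s: [0.5, 1.0], gamma=0.9))
```

Because exploration now lives inside the network, the only change relative to standard DQN is in how actions are chosen; the buffer and target computation stay the same.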
Run Noisy DQN in XuanCe
Before running Noisy DQN in XuanCe, you need to prepare a conda environment and install XuanCe by following the installation steps.
Run Built-in Demos
After completing the installation, you can open a Python console and run Noisy DQN directly using the following commands:
import xuance
runner = xuance.get_runner(method='noisydqn',
                           env='classic_control',  # Choices: classic_control, box2d, atari.
                           env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                           is_test=False)
runner.run()  # Or runner.benchmark()
Run With Self-defined Configs
If you want to run Noisy DQN with different configurations, you can create a new .yaml file, e.g., my_config.yaml.
Then, run Noisy DQN with the following code block:
import xuance as xp
runner = xp.get_runner(method='noisydqn',
                       env='classic_control',  # Choices: classic_control, box2d, atari.
                       env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                       config_path="my_config.yaml",  # Make sure the path to my_config.yaml is correct.
                       is_test=False)
runner.run()  # Or runner.benchmark()
To learn more about the configurations, please see the configs tutorial.
Run With Custom Environment
If you would like to run XuanCe's Noisy DQN in your own environment that is not included in XuanCe, you need to define the new environment by following the steps in the New Environment Tutorial.
Then, prepare the configuration file noisydqn_myenv.yaml.
After that, you can run Noisy DQN in your own environment with the following code:
import argparse
from xuance.common import get_configs
from xuance.environment import REGISTRY_ENV
from xuance.environment import make_envs
from xuance.torch.agents import NoisyDQN_Agent
configs_dict = get_configs(file_dir="noisydqn_myenv.yaml")
configs = argparse.Namespace(**configs_dict)
# MyNewEnv is the environment class you defined by following the New Environment Tutorial.
REGISTRY_ENV[configs.env_name] = MyNewEnv  # Register it under the env_name set in the .yaml file.
envs = make_envs(configs)  # Make parallel environments.
Agent = NoisyDQN_Agent(config=configs, envs=envs)  # Create a Noisy DQN agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels)  # Each step advances all parallel environments.
Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
Agent.finish()  # Finish the training.
Citations
@inproceedings{plappert2018parameter,
  title={Parameter Space Noise for Exploration},
  author={Matthias Plappert and Rein Houthooft and Prafulla Dhariwal and Szymon Sidor and Richard Y. Chen and Xi Chen and Tamim Asfour and Pieter Abbeel and Marcin Andrychowicz},
  booktitle={International Conference on Learning Representations},
  year={2018},
  url={https://openreview.net/forum?id=ByBAl2eAZ},
}