Deep Q-Network (DQN)¶
Paper link: https://www.nature.com/articles/nature14236
Deep Q-Network (DQN) is a foundational model-free algorithm in deep reinforcement learning (DRL) that combines Q-learning, a popular reinforcement learning method, with deep neural networks. It was introduced by a team at DeepMind, and the 2015 Nature paper linked above showed that it can learn to play Atari games directly from pixel inputs, reaching or exceeding human-level performance on many of them.
The table below lists some general features of the DQN algorithm:

| Features of DQN | Values | Description |
|---|---|---|
| On-policy | ❌ | The behavior policy is the same as the target policy. |
| Off-policy | ✅ | The behavior policy is different from the target policy. |
| Model-free | ✅ | No need to prepare an environment dynamics model. |
| Model-based | ❌ | Needs an environment model to train the policy. |
| Discrete Action | ✅ | Deals with discrete action spaces. |
| Continuous Action | ❌ | Deals with continuous action spaces. |
Q-Learning¶
Q-Learning is a model-free RL algorithm where the agent learns a Q-value function \(Q(s, a)\), which estimates the expected cumulative reward of taking an action \(a\) in a state \(s\) and following the optimal policy thereafter.
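The Q-value is updated with a rule derived from the Bellman optimality equation:

\[
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\]

Here, \(\alpha\) is the learning rate, \(r\) is the reward, \(\gamma\) is the discount factor, and \(s'\) is the next state.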
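To make the update rule concrete, here is a minimal tabular sketch in plain Python. The function name `q_learning_update`, the dictionary-based Q-table, and the `done` flag (which drops the bootstrap term at terminal states) are illustrative assumptions and not part of XuanCe.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Bootstrap from the best next action; terminal states add no future value.
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

Q = defaultdict(float)   # hypothetical Q-table, every entry starts at 0
actions = [0, 1]
td = q_learning_update(Q, s="s0", a=1, r=1.0, s_next="s1",
                       actions=actions, alpha=0.5, gamma=0.9)
```

With an all-zero table, the first update moves `Q[("s0", 1)]` halfway (because `alpha=0.5`) toward the observed reward.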
Deep Q-Network¶
Instead of storing the Q-values in a table (tabular Q-learning), DQN uses a deep neural network to approximate the Q-function, enabling it to handle high-dimensional state spaces like images.
A replay buffer stores the agent’s experiences \(\langle s, a, r, s' \rangle\), and mini-batches of experiences are sampled uniformly at random from it to train the network. This reduces the correlation between consecutive samples and stabilizes training.
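A buffer of this kind can be sketched with the standard library alone. The class below is a hypothetical illustration, not XuanCe's buffer: the name `ReplayBuffer`, its methods, and the extra `done` field in each transition are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity, seed=None):
        self.storage = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next, done=False):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)

buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.store(s=t, a=0, r=1.0, s_next=t + 1)
batch = buffer.sample(batch_size=2)
```

Because the capacity is 3, only the three most recent transitions remain after five insertions, and the sampled batch is drawn from those.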
A separate target network is used to compute the Q-value targets during training. The target network is updated periodically to reduce oscillations and instability.
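The periodic update can be pictured as a hard copy of the online parameters every fixed number of training steps. The sketch below uses plain dictionaries as stand-ins for network parameters; `sync_target` and `sync_every` are hypothetical names, and the increment on `online["w"]` only imitates a gradient step.

```python
import copy

def sync_target(online_params, target_params):
    """Hard update: overwrite the target parameters with a copy of the online ones."""
    target_params.clear()
    target_params.update(copy.deepcopy(online_params))

online = {"w": [0.0, 0.0]}
target = copy.deepcopy(online)
sync_every = 3  # hypothetical sync period, counted in training steps

history = []
for step in range(1, 7):
    online["w"][0] += 1.0          # stand-in for a gradient step on the online network
    if step % sync_every == 0:
        sync_target(online, target)
    history.append(target["w"][0])
```

Between synchronizations the target parameters stay frozen while the online parameters keep changing, which is what keeps the regression targets stable.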
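The network is trained using the mean-squared error (MSE) loss between the predicted Q-value and the target:

\[
L(\theta) = \mathbb{E}_{(s, a, r, s')} \left[ \left( y - Q(s, a; \theta) \right)^{2} \right]
\]

where \(y = r + \gamma \max_{a'}{Q(s', a'; \theta^{-})}\), \(\theta\) denotes the parameters of the online network, and \(\theta^{-}\) denotes the parameters of the target network.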
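As a small worked example of the target \(y\) and the MSE loss, the sketch below evaluates both on a two-transition mini-batch. The two lambdas are hypothetical stand-ins for the online and target networks (each maps a state to a list of Q-values), and the `done` flag, which sets \(y = r\) at terminal transitions, is an assumption added for illustration.

```python
def td_targets(batch, q_target, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'; theta^-), with y = r at terminal transitions."""
    return [r if done else r + gamma * max(q_target(s_next))
            for (_, _, r, s_next, done) in batch]

def mse_loss(batch, q_online, targets):
    """Mean of (y - Q(s, a; theta))^2 over the mini-batch."""
    errors = [(y - q_online(s)[a]) ** 2
              for (s, a, _, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)

# Hypothetical stand-ins for the two networks.
q_online = lambda s: [0.0, 1.0]
q_target = lambda s: [2.0, 4.0]

batch = [
    (0, 1, 1.0, 1, False),  # (s, a, r, s_next, done)
    (1, 0, 0.5, 2, True),
]
targets = td_targets(batch, q_target, gamma=0.5)
loss = mse_loss(batch, q_online, targets)
```

Here the targets are `[3.0, 0.5]` and the loss is `2.125`: the first transition bootstraps from the target network, while the terminal one uses the reward alone.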
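DQN uses an \(\epsilon\)-greedy policy that explores by taking a random action with probability \(\epsilon\) and exploits the learned Q-values otherwise:

\[
a =
\begin{cases}
\text{a random action}, & \text{with probability } \epsilon, \\
\arg\max_{a'} Q(s, a'; \theta), & \text{with probability } 1 - \epsilon.
\end{cases}
\]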
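A minimal sketch of this action-selection rule, assuming the Q-values for the current state are available as a plain list; the function name `epsilon_greedy` and its `rng` parameter are illustrative and not a XuanCe API.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

rng = random.Random(0)
q_values = [0.1, 0.7, 0.2]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
explored = {epsilon_greedy(q_values, epsilon=1.0, rng=rng) for _ in range(200)}
```

Note that the exploration branch samples uniformly over all actions, so it can also return the greedy action by chance.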
Strengths of DQN:

- Handles high-dimensional input spaces like images.
- Stabilizes Q-learning through techniques like experience replay and target networks.
- Demonstrated capability to outperform humans in several Atari games.
Algorithm¶
The full algorithm for training DQN is presented in Algorithm 1:
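For orientation, the components described in the previous section can be combined into a single loop. The following is a minimal sketch under stated assumptions, not a transcription of Algorithm 1 and not XuanCe's implementation: the four-state chain environment, all hyperparameters, and the use of a table in place of a neural network are hypothetical choices made so that the example runs with the standard library only.

```python
import copy
import random
from collections import deque

N_STATES, N_ACTIONS, GOAL = 4, 2, 3  # hypothetical chain: action 1 moves right, 0 moves left

def env_step(s, a):
    """Toy deterministic environment; reward 1 on reaching the goal state."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    done = s_next == GOAL
    return s_next, (1.0 if done else 0.0), done

def train(steps=2000, gamma=0.9, lr=0.1, epsilon=0.5,
          batch_size=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # online "network" (a table here)
    q_target = copy.deepcopy(q)                       # target "network"
    buffer = deque(maxlen=500)                        # replay buffer
    s = 0
    for t in range(1, steps + 1):
        # 1. Act epsilon-greedily with the online Q-values.
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)
        else:
            a = q[s].index(max(q[s]))
        # 2. Step the environment and store the transition.
        s_next, r, done = env_step(s, a)
        buffer.append((s, a, r, s_next, done))
        s = 0 if done else s_next  # restart the episode at state 0
        # 3. Sample a mini-batch and move Q(s, a) toward the target y.
        if len(buffer) >= batch_size:
            for bs, ba, br, bs_next, bdone in rng.sample(list(buffer), batch_size):
                y = br if bdone else br + gamma * max(q_target[bs_next])
                q[bs][ba] += lr * (y - q[bs][ba])  # gradient step on the squared error
        # 4. Periodically copy the online parameters into the target.
        if t % sync_every == 0:
            q_target = copy.deepcopy(q)
    return q

q = train()
greedy_policy = [row.index(max(row)) for row in q[:GOAL]]
```

In this toy problem the greedy policy read off the learned table should move right in every non-terminal state. With a real neural network, step 3 would be replaced by an optimizer step on the MSE loss.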
Framework¶
The overall agent-environment interaction of DQN, as implemented in XuanCe, is illustrated in the figure below.
Run DQN in XuanCe¶
Before running DQN in XuanCe, you need to prepare a conda environment and install xuance following
the installation steps.
Run Built-in Demos¶
After completing the installation, you can open a Python console and run DQN directly using the following commands:
import xuance

runner = xuance.get_runner(method='dqn',
                           env='classic_control',  # Choices: classic_control, box2d, atari.
                           env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                           is_test=False)
runner.run()  # Or runner.benchmark()
Run With Self-defined Configs¶
If you want to run DQN with different configurations, you can build a new .yaml file, e.g., my_config.yaml.
Then run DQN with the following code:
import xuance as xp

runner = xp.get_runner(method='dqn',
                       env='classic_control',  # Choices: classic_control, box2d, atari.
                       env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                       config_path="my_config.yaml",  # Make sure this path points to your my_config.yaml file.
                       is_test=False)
runner.run()  # Or runner.benchmark()
To learn more about the configurations, please visit the tutorial of configs.
Run With Custom Environment¶
If you would like to run XuanCe’s DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
New Environment Tutorial.
Then, prepare the configuration file
dqn_myenv.yaml.
After that, you can run DQN in your own environment with the following code:
import argparse

from xuance.common import get_configs
from xuance.environment import REGISTRY_ENV
from xuance.environment import make_envs
from xuance.torch.agents import DQN_Agent

# MyNewEnv is the environment class you defined by following the New Environment Tutorial.
configs_dict = get_configs(file_dir="dqn_myenv.yaml")  # Load the configuration file.
configs = argparse.Namespace(**configs_dict)
REGISTRY_ENV[configs.env_name] = MyNewEnv  # Register the new environment under its name.

envs = make_envs(configs)  # Make parallel environments.
Agent = DQN_Agent(config=configs, envs=envs)  # Create a DQN agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
Agent.finish()  # Finish the training.
Citation¶
@article{mnih2015human,
  title={Human-level control through deep reinforcement learning},
  author={Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A and Veness, Joel and Bellemare, Marc G and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K and Ostrovski, Georg and others},
  journal={Nature},
  volume={518},
  number={7540},
  pages={529--533},
  year={2015},
  publisher={Nature Publishing Group UK London}
}