DQN with Prioritized Experience Replay (PerDQN)
Paper Link: https://arxiv.org/pdf/1511.05952
DQN with Prioritized Experience Replay (PER DQN) is a variant of the traditional DQN that replaces uniform sampling from the replay buffer with Prioritized Experience Replay. More informative experiences are replayed more often, which improves the agent’s learning efficiency.
This table lists some general features of the PER DQN algorithm:
| Features of PER DQN | Values | Description |
|---|---|---|
| On-policy | ❌ | The behavior policy is the same as the target policy. |
| Off-policy | ✅ | The behavior policy is different from the target policy. |
| Model-free | ✅ | No need to prepare an environment dynamics model. |
| Model-based | ❌ | Needs an environment model to train the policy. |
| Discrete Action | ✅ | Deals with discrete action spaces. |
| Continuous Action | ❌ | Deals with continuous action spaces. |
Method
In standard DQN, experiences are stored in a replay buffer, and the agent samples uniformly from this buffer to train its Q-network. However, this uniform sampling can be inefficient, especially when certain experiences are more important for learning than others. PER DQN addresses this by prioritizing experiences that are expected to provide more useful information for improving the agent’s policy.
Prioritized Experience Replay
In Prioritized Experience Replay (PER), instead of sampling uniformly from the buffer, experiences are prioritized based on their temporal-difference (TD) error.
The TD error \(\delta_i\) of a transition \((s, a, r, s')\) is the difference between the TD target given by the Bellman equation, \(r + \gamma \max_{a'} Q_{\text{target}}(s', a')\), and the Q-value \(Q(s, a)\) currently predicted by the agent’s Q-network.
A large absolute TD error means that the experience has high learning potential, because it indicates that the agent’s current Q-function does not accurately predict the return for that experience.
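As a rough illustration of the TD error for a batch, the NumPy sketch below computes \(\delta = r + \gamma (1 - \text{done}) \max_{a'} Q_{\text{target}}(s', a') - Q(s, a)\), assuming the usual DQN setup with a separate target network and no bootstrapping past terminal states. The function `td_errors` and its array layout are hypothetical and are not XuanCe’s implementation.

```python
import numpy as np


def td_errors(q_values, q_next_target, actions, rewards, dones, gamma=0.99):
    """DQN-style TD errors for a batch of B transitions with A discrete actions.

    q_values:      (B, A) array, Q(s, .) from the online Q-network
    q_next_target: (B, A) array, Q_target(s', .) from the target network
    actions:       (B,) integer actions taken in s
    rewards:       (B,) rewards r
    dones:         (B,) 1.0 if s' is terminal, else 0.0
    """
    # Q(s, a) for the action actually taken in each transition.
    q_sa = q_values[np.arange(len(actions)), actions]
    # Bellman target r + gamma * max_a' Q_target(s', a'); no bootstrapping past terminal states.
    targets = rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)
    return targets - q_sa


# Two toy transitions; the second one ends the episode.
q_values = np.array([[1.0, 2.0], [0.5, 0.0]])
q_next_target = np.array([[1.0, 3.0], [2.0, 1.0]])
actions = np.array([1, 0])
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])

delta = td_errors(q_values, q_next_target, actions, rewards, dones, gamma=0.9)
print(delta)
```

For the first transition the target is 1 + 0.9 × 3 = 3.7 against a prediction of 2.0, so δ = 1.7; the terminal transition’s target is just its reward 0, so δ = -0.5.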
How PER DQN Works
In PER DQN, the replay buffer is augmented with priority sampling. The priority of an experience is based on the absolute value of its TD error.
When the agent samples experiences for training, those with higher TD errors are more likely to be selected.
This focuses the agent’s learning on experiences that are more surprising or difficult, accelerating the learning process by revisiting important experiences more frequently.
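Sampling in proportion to priorities is cheap only with a suitable data structure: the PER paper stores priorities in a sum tree, so that both drawing a sample and updating one priority take O(log N) time. The pure-Python sketch below shows the idea; each leaf holds the (exponentiated) priority of one transition and every internal node holds the sum of its children. The class `SumTree` and its methods are hypothetical and are not XuanCe’s buffer API.

```python
import random


class SumTree:
    """Leaves hold per-transition priorities; each internal node holds the sum of its children."""

    def __init__(self, capacity):
        # Node 1 is the root; leaves live at indices [capacity, 2 * capacity).
        # A power-of-two capacity keeps the tree perfectly balanced.
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        # Write the leaf, then recompute the sums on the path up to the root.
        node = index + self.capacity
        self.tree[node] = priority
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def total(self):
        return self.tree[1]

    def find(self, value):
        # Walk down to the leaf whose cumulative-priority interval contains `value`.
        node = 1
        while node < self.capacity:
            left = 2 * node
            if value < self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        return node - self.capacity


tree = SumTree(capacity=4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)

# Draw uniform values in [0, total) and map them to buffer indices.
rng = random.Random(0)
counts = [0] * 4
for _ in range(10_000):
    counts[tree.find(rng.random() * tree.total())] += 1
print(tree.total(), counts)
```

With priorities 1, 2, 3 and 4, the four indices are drawn roughly in the ratio 1:2:3:4, so higher-priority transitions are replayed more often without scanning the whole buffer.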
Importance Sampling
To avoid biasing the training process through the preferential sampling of experiences, importance sampling is used. Each sampled experience is assigned a weight that scales its contribution to the loss and compensates for the non-uniform sampling. This corrects the bias that prioritized sampling would otherwise introduce into the Q-value updates.
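A minimal sketch of these weights, following the form used in the PER paper: if transition \(i\) is sampled with probability \(P(i)\) from a buffer of \(N\) transitions, its weight is \(w_i = (N \cdot P(i))^{-\beta}\), where the correction exponent \(\beta\) is annealed toward 1 during training (\(\beta = 1\) fully compensates for the non-uniform sampling). The weights are divided by the largest possible weight so that they only scale updates down. The function `importance_weights` is hypothetical; normalizing by the buffer-wide maximum (rather than the batch maximum) is one common choice, assumed here.

```python
import numpy as np


def importance_weights(probs, indices, beta):
    """Importance-sampling weights w_i = (N * P(i)) ** -beta for the sampled indices.

    probs:   (N,) sampling probabilities P(i) for every transition in the buffer
    indices: integer indices of the sampled transitions
    beta:    correction exponent; beta = 1 fully compensates for prioritized sampling
    """
    n = len(probs)
    weights = (n * probs[indices]) ** (-beta)
    # Divide by the largest possible weight (that of the least likely transition),
    # so the weights never exceed 1 and only scale updates down.
    max_weight = (n * probs.min()) ** (-beta)
    return weights / max_weight


probs = np.array([0.1, 0.2, 0.3, 0.4])
batch = np.array([3, 0])
w = importance_weights(probs, batch, beta=1.0)

# The weights multiply each sample's squared TD error in the loss.
td = np.array([2.0, -1.0])
loss = np.mean(w * td ** 2)
print(w, loss)
```

Here the frequently sampled transition 3 (P = 0.4) gets weight 0.25 and the rarely sampled transition 0 (P = 0.1) gets weight 1.0, which offsets how often each one is replayed.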
Mathematical Details
The priority \(p_i\) of experience \(i\) is calculated from the TD error \(\delta_i\), typically in the form:
\[p_i = |\delta_i| + \epsilon,\]
where \(|\delta_i|\) is the absolute value of the TD error and \(\epsilon\) is a small positive constant that ensures experiences with zero TD error still have a non-zero probability of being sampled.
The probability of sampling experience \(i\) is given by:
\[P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}},\]
where \(\alpha\) controls how much prioritization is used (i.e., how much the TD error affects the sampling probability); \(\alpha = 0\) recovers uniform sampling.
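To make the role of \(\alpha\) concrete, the sketch below turns a vector of TD errors into sampling probabilities using proportional priorities \(p_i = |\delta_i| + \epsilon\) and \(P(i) = p_i^{\alpha} / \sum_k p_k^{\alpha}\), then draws a batch with NumPy’s `Generator.choice`. The helper name and the defaults \(\alpha = 0.6\), \(\epsilon = 10^{-6}\) are illustrative assumptions, not XuanCe’s configuration, and the O(N) draw here is a simple stand-in for a sum tree.

```python
import numpy as np


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = p_i**alpha / sum_k p_k**alpha with proportional priorities p_i = |delta_i| + eps."""
    priorities = np.abs(td_errors) + eps
    scaled = priorities ** alpha
    return scaled / scaled.sum()


delta = np.array([0.0, 1.0, -2.0, 4.0])
for alpha in (0.0, 0.5, 1.0):
    print(alpha, np.round(sampling_probabilities(delta, alpha=alpha), 3))

# Draw a batch of buffer indices according to P(i) (an O(N) alternative to a sum tree).
rng = np.random.default_rng(0)
batch = rng.choice(len(delta), size=3, p=sampling_probabilities(delta))
print(batch)
```

With \(\alpha = 0\) every transition gets probability 0.25; with \(\alpha = 1\) the probabilities are roughly 0, 1/7, 2/7 and 4/7, so the transition with \(|\delta| = 4\) is drawn most often. Intermediate values of \(\alpha\) interpolate between the two.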
Algorithm
The full algorithm for training PER DQN is presented in Algorithm 1:
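The core loop of PER DQN can be sketched on a small tabular Q-function in NumPy, standing in for the Q-network: sample a batch by priority, compute importance-sampling weights, apply a weighted TD update, and refresh the priorities with the new absolute TD errors. This is an illustrative sketch under simplifying assumptions (a table instead of a network, an O(N) sampler instead of a sum tree, a target table that is never synced); `per_update` and the tuple layout of the buffer are hypothetical, not XuanCe’s code.

```python
import numpy as np


def per_update(q, q_target, buffer, priorities, rng, batch_size=4,
               alpha=0.6, beta=0.4, gamma=0.99, lr=0.1, eps=1e-6):
    """One prioritized-replay update of a tabular Q-function (illustrative sketch).

    q, q_target: (num_states, num_actions) arrays standing in for the Q- and target networks
    buffer:      list of (s, a, r, s_next, done) transitions
    priorities:  (len(buffer),) array of priorities p_i, refreshed in place
    """
    n = len(buffer)
    # 1. Sample a batch with P(i) = p_i**alpha / sum_k p_k**alpha (O(N); a sum tree makes it O(log N)).
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = rng.choice(n, size=batch_size, p=probs)
    # 2. Importance-sampling weights (N * P(i))**-beta, divided by the largest possible weight.
    weights = (n * probs[idx]) ** (-beta) / (n * probs.min()) ** (-beta)
    for j, w in zip(idx, weights):
        s, a, r, s_next, done = buffer[j]
        # 3. TD error against the target table.
        target = r + gamma * (1.0 - done) * q_target[s_next].max()
        delta = target - q[s, a]
        # 4. Gradient-descent step on 0.5 * w * delta**2 with respect to Q(s, a).
        q[s, a] += lr * w * delta
        # 5. Refresh the priority with the new |TD error|.
        priorities[j] = abs(delta) + eps
    return q, priorities


# Three toy transitions over 3 states and 2 actions; the last two are terminal.
buffer = [(0, 1, 1.0, 1, 0.0), (1, 0, 0.0, 2, 1.0), (2, 1, 5.0, 0, 1.0)]
q = np.zeros((3, 2))
q_target = np.zeros((3, 2))
# New transitions enter with the maximum priority seen so far (1.0 here).
priorities = np.ones(len(buffer))

rng = np.random.default_rng(0)
for _ in range(50):
    per_update(q, q_target, buffer, priorities, rng)
print(q)
print(priorities)
```

The transition whose TD error is already zero (state 1, action 0) drops to priority ε and is rarely replayed, while the rewarding transitions keep being revisited. In full DQN the target network would also be copied from the online network periodically.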
Run PER DQN in XuanCe
Before running PER DQN in XuanCe, you need to prepare a conda environment and install xuance following
the installation steps.
Run Built-in Demos
After completing the installation, you can open a Python console and run PER DQN directly using the following commands:
```python
import xuance

runner = xuance.get_runner(method='perdqn',
                           env='classic_control',  # Choices: classic_control, box2d, atari.
                           env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                           is_test=False)
runner.run()  # Or runner.benchmark()
```
Run With Self-defined Configs
If you want to run PER DQN with different configurations, you can build a new .yaml file, e.g., my_config.yaml.
Then, run PER DQN with the following code block:
```python
import xuance as xp

runner = xp.get_runner(method='perdqn',
                       env='classic_control',  # Choices: classic_control, box2d, atari.
                       env_id='CartPole-v1',  # Choices: CartPole-v1, LunarLander-v2, ALE/Breakout-v5, etc.
                       config_path="my_config.yaml",  # The path of my_config.yaml file should be correct.
                       is_test=False)
runner.run()  # Or runner.benchmark()
```
To learn more about the configurations, please visit the tutorial of configs.
Run With Custom Environment
If you would like to run XuanCe’s PER DQN in your own environment that was not included in XuanCe,
you need to define the new environment following the steps in
New Environment Tutorial.
Then, prepare the configuration file perdqn_myenv.yaml.
After that, you can run PER DQN in your own environment with the following code:
```python
import argparse
from xuance.common import get_configs
from xuance.environment import REGISTRY_ENV
from xuance.environment import make_envs
from xuance.torch.agents import PerDQN_Agent
from my_env import MyNewEnv  # Hypothetical module: your environment class, defined per the New Environment Tutorial.

configs_dict = get_configs(file_dir="perdqn_myenv.yaml")
configs = argparse.Namespace(**configs_dict)
REGISTRY_ENV[configs.env_name] = MyNewEnv  # Register the custom environment under its env_name.

envs = make_envs(configs)  # Make parallel environments.
Agent = PerDQN_Agent(config=configs, envs=envs)  # Create a PER DQN agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels)  # Train; running_steps is split across the parallel environments.
Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
Agent.finish()  # Finish the training.
```
Citations
```bibtex
@inproceedings{DBLP:journals/corr/SchaulQAS15,
  author    = {Tom Schaul and
               John Quan and
               Ioannis Antonoglou and
               David Silver},
  editor    = {Yoshua Bengio and
               Yann LeCun},
  title     = {Prioritized Experience Replay},
  booktitle = {4th International Conference on Learning Representations, {ICLR} 2016,
               San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings},
  year      = {2016},
  url       = {http://arxiv.org/abs/1511.05952},
  timestamp = {Thu, 25 Jul 2019 14:25:38 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/SchaulQAS15.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```