Proximal Policy Optimization with KL Divergence (PPO-KL)