
Proximal Policy Optimization Algorithms: Paper Review
Link: https://arxiv.org/abs/1707.06347

From the abstract: "We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standar.."

Background

Unfamiliar terms

surrogate objective: within a given constraint, ..
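
To make the term "surrogate objective" a little more concrete, here is a minimal sketch of the clipped surrogate objective described in the PPO paper, written in plain Python. The function name `clipped_surrogate`, its argument names, and the default `eps=0.2` are illustrative assumptions and not something this post specifies; the advantage estimates are taken as given inputs.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """Sample mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

This quantity is maximized (gradient ascent) with respect to the parameters of the current policy. When the two policies agree, the ratio is 1 and the value reduces to the mean advantage; a large ratio with a positive advantage is capped at `(1 + eps) * A`, which removes the incentive to move the policy far from the one that collected the data.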