Reinforcement Learning

Proximal Policy Optimization Algorithms: Paper Review

2025.07.05

Link: https://arxiv.org/abs/1707.06347

Proximal Policy Optimization Algorithms (arxiv.org): We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standar..

Background

Unfamiliar terms

surrogate objective: within a given constraint, ..

Paper Reviews/Reinforcement Learning

Deep reinforcement learning from human preferences: Paper Review

2025.07.04

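Link: https://arxiv.org/abs/1706.03741

Background

RL

Solving many RL tasks requires a well-specified reward function, but finding one is complex, poorly-defined, or hard.

A simple reward function suited to the system can be designed, but it does not fully capture the user's intent.

Previous work either requires expert feedback or relies on rankings rather than comparisons.

Approaches such as Inverse Reinforcement Learning and Imitation Learning exist, but they cannot be applied directly to behaviors that are hard for humans to demonstrate.

-> Learn the reward function from human feedback.

Methods

H..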
