Proximal Policy Optimization (PPO)
Term
Proximal Policy Optimization (PPO) is a reinforcement-learning algorithm introduced by OpenAI in 2017. Each update maximizes a clipped surrogate objective, which limits how far the new policy can move from the policy that collected the data. This makes training more stable than unconstrained policy-gradient steps. PPO is widely used as a baseline method for training agents, including the large-language-model agents discussed in recent research on agentic RL.
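To make the clipping concrete, here is a minimal sketch of the clipped surrogate objective, L^CLIP = mean over t of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t). In this formula r_t is the ratio between the new and old policy's probability of the action taken, and A_t is its advantage estimate. It is written in plain Python for readability. The function name `ppo_clip_objective` is hypothetical, and the default `clip_eps=0.2` is just a commonly used setting. Real implementations compute the same quantity over batched tensors and let an autograd framework differentiate it.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default; tune per task
) -> float:
    """Mean clipped surrogate objective (to be maximized; the loss is its negative)."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage: the gain is capped at 1 + eps.
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
    # Ratio 0.5 with a negative advantage: the term is held at (1 - eps) * A.
    print(ppo_clip_objective([math.log(0.5)], [0.0], [-1.0]))
```

Taking the minimum makes the objective a lower bound on the unclipped surrogate. The policy therefore gains nothing by pushing the ratio beyond the clip range in the direction the advantage favors, and this is what keeps each update conservative.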
Stories
Completed digest stories linked to this term.
Stabilizing Agentic RL and Closing Multilingual Alignment Gaps (2026-03-06)
New research points to a more stable RL path for long-horizon LLM agents and exposes multilingual alignment ga...