Verlog - A Multi-turn RL framework for LLM agents

Wentse Chen August 15, 2025 November 22, 2025

Verlog is a multi-turn reinforcement learning framework built for long-horizon LLM agent tasks with highly variable episode lengths. It extends VeRL and BALROG and follows the proven design principles of pytorch-a2c-ppo-acktr-gail, adding specialized optimizations for stable and efficient training when episodes range from short interactions to hundreds of turns. Whereas prior frameworks such as VeRL and RAGEN effectively handle tasks with about 10 turns, and verl-agent scales up to 50 turns, Verlog is designed to operate in environments with over 400 turns, which makes it well suited to complex, long-term decision-making. This capability has been validated on challenging domains such as BabyAI, BabaIsAI, and Crafter, where it consistently achieves strong performance out of the box. In Crafter, for instance, episode lengths range from 70 to 400 steps, with an average of about 190.

Please see https://wentsechen.github.io/Verlog_blogpost/ for details.