Reinforcement Learning

LLM Agents

Tool-R0: LLM Agents That Learn Tool Use Without Training Data

New research introduces Tool-R0, a framework enabling LLM agents to autonomously learn tool usage through self-evolution, eliminating the need for curated training datasets while achieving state-of-the-art performance.

AI Agents

MIRA: Combining Memory Systems with RL for Smarter AI Agents

New research introduces MIRA, a framework that integrates memory architectures with reinforcement learning while minimizing expensive LLM calls, advancing efficient autonomous agent design.

Agentic AI

Proxy State Evaluation: Scaling Verifiable Rewards for AI Agents

New research proposes proxy state-based evaluation for multi-turn tool-calling LLM agents, addressing the challenge of scalable reward verification in complex agentic workflows.

LLM

Error-Localized Policy Optimization: A New Approach to LLM Tool R

New research introduces ELPO, a training method that teaches LLMs to learn from irrecoverable errors in tool-integrated reasoning chains, improving agent capabilities.

LLM Agents

Agent-Omit: Teaching LLMs to Think More Efficiently

New research introduces Agent-Omit, a reinforcement learning framework that trains LLM agents to selectively omit unnecessary reasoning steps and observations, dramatically improving computational efficiency.

LLM Infrastructure

PROTEUS: Lagrangian RL Optimizes Multi-LLM Routing for SLAs

New research introduces PROTEUS, a reinforcement learning framework using Lagrangian optimization to intelligently route requests across multiple LLMs while meeting strict service level agreements.

LLM Research

Process-Supervised RL: Precise Error Penalization Boosts LLM Reas

New research introduces a method to preserve correct reasoning steps while penalizing errors, improving LLM performance through more nuanced reinforcement learning credit assignment.

Voice AI

RLAIF Breakthrough: Optimizing Spoken AI Without Human Feedback

New research demonstrates how reinforcement learning from AI feedback can optimize spoken dialogue systems using multiple LLM evaluators, reducing dependency on costly human annotations.

LLM Agents

Cross-Domain RL Training: Reducing the Generalization Tax for LLM

New research explores how reinforcement learning affects LLM agent generalization across domains, introducing the concept of a 'generalization tax' and strategies to minimize performance degradation.

Reinforcement Learning

LinkedIn's Agentic RL Training Guide for Open-Source GPT Models

LinkedIn shares practical insights from training agentic reinforcement learning systems for GPT-OSS, covering infrastructure challenges, reward modeling, and lessons learned from real-world deployment.

Multimodal AI

Omni-R1: Unifying Multimodal AI Reasoning with New Framework

New research introduces Omni-R1, a unified generative paradigm combining vision-language models with reinforcement learning for enhanced multimodal reasoning capabilities.

Synthetic Data

RL-Driven Synthetic Data Generation: A New Training Paradigm

New research explores how reinforcement learning can optimize synthetic data generation, with implications for training more capable AI video and media generation models.