Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

ArXiv ID: 2603.03078

Date: March 3, 2026

Authors: Siwei Zhang et al.

Abstract: Retrieval-Augmented Policy Optimization (RAPO) is an RL framework that introduces retrieval to explicitly expand exploration during training. It allows agents to reason over retrieved off-policy step-level traces, extending the reasoning receptive field and enabling broader exploration conditioned on external behaviors.
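
To make the retrieval idea in the abstract concrete, here is a minimal sketch of fetching step-level traces from earlier rollouts and placing them in the agent's context. It is not the paper's implementation. The StepTrace schema, the token-overlap (Jaccard) retriever, the prompt layout, and the example data are all assumptions made for illustration, and the policy-optimization step itself is left out.

```python
from dataclasses import dataclass


@dataclass
class StepTrace:
    """One stored step from an earlier (off-policy) rollout. Hypothetical schema."""
    state: str   # text describing the situation the agent was in
    action: str  # what the agent did at that step


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever is really used."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(buffer: list[StepTrace], query: str, k: int = 2) -> list[StepTrace]:
    """Return the k stored steps whose state is most similar to the query."""
    ranked = sorted(buffer, key=lambda t: jaccard(t.state, query), reverse=True)
    return ranked[:k]


def augment_context(state: str, retrieved: list[StepTrace]) -> str:
    """Build a prompt that shows the retrieved steps before the current state."""
    lines = ["Retrieved steps from other rollouts:"]
    for i, t in enumerate(retrieved, 1):
        lines.append(f"{i}. state: {t.state} | action: {t.action}")
    lines.append(f"Current state: {state}")
    return "\n".join(lines)


# Illustrative data only; not taken from the paper.
buffer = [
    StepTrace("search box is empty on the shop page", "type 'red shoes'"),
    StepTrace("login form is shown", "enter username"),
    StepTrace("search results list red shoes", "click first result"),
]
current = "shop page with empty search box"
top = retrieve(buffer, current, k=2)
prompt = augment_context(current, top)
print(prompt)
```

In this sketch the agent would generate its next step from `prompt` rather than from the current state alone, so its exploration is conditioned on behavior it did not produce itself. How retrieved traces enter the RL objective is not specified here.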

RSI Bench Relevance:
