Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
Abstract: Retrieval-Augmented Policy Optimization (RAPO) is an RL framework that introduces retrieval to explicitly expand exploration during training. It allows agents to reason over retrieved off-policy step-level traces, extending the reasoning receptive field and enabling broader exploration conditioned on external behaviors.
RSI Bench Relevance:
- Exploration Strategy: Solves the "Closed-Loop Trap" (where agents only learn from their own limited outputs) by injecting external reasoning perspectives.
- Efficiency: Achieves a 1.2x training speedup, a key metric for recursive systems.
- Stability: Uses retrieval-aware policy optimization to stabilize training during multi-step reasoning evolution.
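The core mechanism described above — conditioning the policy on retrieved off-policy, step-level traces — can be sketched in toy form. This is an illustrative assumption, not the paper's implementation: the `Trace` structure, the lexical-overlap retriever, and all names below are hypothetical stand-ins for RAPO's learned components.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """A hypothetical off-policy step-level trace: state, action, reward."""
    state: str
    action: str
    reward: float

def similarity(a: str, b: str) -> float:
    """Toy word-overlap (Jaccard) score standing in for a learned retriever."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def retrieve(store: list[Trace], state: str, k: int = 2) -> list[Trace]:
    """Return the k stored traces most similar to the current state."""
    return sorted(store, key=lambda t: -similarity(t.state, state))[:k]

def augmented_context(state: str, store: list[Trace]) -> str:
    """Build the policy input: retrieved external steps, then the live state.

    Conditioning on external behaviors like this is what expands exploration
    beyond the agent's own rollouts (the "Closed-Loop Trap").
    """
    hints = retrieve(store, state)
    lines = [f"[hint] state={t.state!r} action={t.action!r} r={t.reward}"
             for t in hints]
    return "\n".join(lines + [f"[state] {state}"])

# Tiny demo store of off-policy traces from other agents.
store = [
    Trace("open the locked door", "use the brass key", 1.0),
    Trace("cross the river", "build a raft", 0.5),
    Trace("open the chest", "pry with crowbar", 0.2),
]
print(augmented_context("open the heavy locked door", store))
```

In a real RAPO-style loop, the augmented context would feed the LLM policy during rollout, and the optimizer would be retrieval-aware so that gradients account for which hints were injected; here only the retrieval-and-conditioning step is shown.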