Abstract: Exploration remains the key bottleneck for large language model agents. Uncertainty reflects model confidence, reveals where exploration is needed, and offers valuable learning cues even in failed trajectories. We introduce SELAUR (Self-Evolving LLM Agent via Uncertainty-aware Rewards), a reinforcement learning framework that incorporates uncertainty directly into reward design.
Key Insight: Deriving dense reward signals from token-level uncertainty enables efficient self-evolution in sparse-feedback environments. Uncertainty serves as a "curiosity" signal that steers the agent toward high-information state transitions.
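To make the idea of turning token-level uncertainty into a dense reward concrete, here is a minimal sketch. It is an illustration under stated assumptions, not SELAUR's published formula. It assumes uncertainty is measured as Shannon entropy of each next-token distribution, averaged over the tokens of an agent step, and added as a curiosity-style bonus scaled by a hypothetical weight `beta`. The sparse outcome reward is paid only on the final step. The names `token_entropy`, `step_uncertainty`, and `shaped_rewards` are hypothetical.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def step_uncertainty(token_dists: List[Sequence[float]]) -> float:
    """Mean token entropy over the tokens the agent generated in one step.

    Assumption: mean aggregation; max or sum are equally plausible choices.
    """
    if not token_dists:
        return 0.0
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def shaped_rewards(
    steps: List[List[Sequence[float]]],
    outcome_reward: float,
    beta: float = 0.1,  # hypothetical bonus weight, not taken from the paper
) -> List[float]:
    """Dense per-step rewards: an uncertainty bonus at every step,
    plus the sparse environment reward added to the last step only."""
    rewards = [beta * step_uncertainty(s) for s in steps]
    if rewards:
        rewards[-1] += outcome_reward
    return rewards


# A failed two-step trajectory (outcome 0.0): the uncertain first step
# still receives a nonzero learning signal.
trajectory = [
    [[0.5, 0.5]],  # step 1: one token, maximally uncertain over 2 options
    [[1.0]],       # step 2: one token, fully confident
]
print(shaped_rewards(trajectory, outcome_reward=0.0))
```

The bonus gives failed trajectories a nonzero signal wherever the agent was uncertain, which matches the abstract's point that failures still carry learning cues. In practice the distributions would come from the policy's softmax over the vocabulary rather than hand-written lists.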
Relevance to RSI (recursive self-improvement): Provides a principled method for autonomous models to identify their own "knowledge gaps" and prioritize learning without human-labeled data or external ground truth.
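One way a model could use its own uncertainty to "prioritize learning" is to rank candidate tasks by how uncertain it was on recent attempts and train on the most uncertain ones first. The sketch below assumes per-task lists of step-uncertainty scores (for example, mean token entropies) are already available. The function name `prioritize_by_uncertainty` and the top-`k` selection rule are hypothetical illustrations, not part of SELAUR as described here.

```python
from typing import Dict, List


def prioritize_by_uncertainty(
    task_uncertainty: Dict[str, List[float]],
    k: int = 2,
) -> List[str]:
    """Return the IDs of the k tasks with the highest mean uncertainty.

    Assumption: higher mean uncertainty marks a larger "knowledge gap".
    Tasks with no recorded scores are skipped.
    """
    means = {t: sum(u) / len(u) for t, u in task_uncertainty.items() if u}
    # Sort task IDs by mean uncertainty, highest first, and keep the top k.
    return sorted(means, key=means.get, reverse=True)[:k]


scores = {
    "navigate": [0.1, 0.2],
    "search": [0.9],
    "purchase": [0.5, 0.5],
    "unseen": [],
}
print(prioritize_by_uncertainty(scores, k=2))
```

A greedy top-`k` rule is the simplest choice; a real curriculum would likely mix in some random or low-uncertainty tasks to avoid fixating on inherently unpredictable ones.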