Significance: A reinforcement learning approach for training agents that recursively spawn new instances of themselves and delegate sub-tasks to them. This acts as an inference-time scaling algorithm, letting agents handle longer contexts and generalize to harder problems via divide-and-conquer.
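The divide-and-conquer delegation described above can be illustrated with a minimal sketch. It is a toy, not the paper's method: the "agent" is a plain function, the task is a list of integers, and the names `solve_recursively`, `context_limit`, and `solve_leaf` are hypothetical. It only shows how a task that exceeds one instance's context can be split, delegated to fresh instances, and recombined.

```python
from typing import Callable, List


def solve_recursively(
    task: List[int],
    context_limit: int,
    solve_leaf: Callable[[List[int]], int],
) -> int:
    """Toy recursive delegation (hypothetical names, not the paper's API).

    If the task fits in one "context", solve it directly. Otherwise split it
    in half, hand each half to a new instance of the same procedure, and
    combine the two partial results with the same leaf solver.
    """
    if context_limit < 2:
        # Combining two sub-results needs room for at least two items.
        raise ValueError("context_limit must be at least 2")

    # Base case: the whole task fits in a single context window.
    if len(task) <= context_limit:
        return solve_leaf(task)

    # Recursive case: delegate each half to a fresh sub-instance.
    mid = len(task) // 2
    left = solve_recursively(task[:mid], context_limit, solve_leaf)
    right = solve_recursively(task[mid:], context_limit, solve_leaf)

    # The parent only ever sees the two sub-results, never the full task.
    return solve_leaf([left, right])


# Example: sum 100 numbers while no single call sees more than 8 items.
total = solve_recursively(list(range(1, 101)), context_limit=8, solve_leaf=sum)
```

Combining with the leaf solver works here because summation is associative. A real agent would need a task-specific way to merge sub-results.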
Significance: Shows that RL training compute follows a power law with respect to reasoning depth, and that the scaling exponent increases with logical expressiveness. More expressive training settings yield larger performance gains and more efficient transfer, providing a roadmap for scaling recursive self-improvement (RSI).
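One way to write down the scaling claim above, with every symbol introduced here as an illustrative assumption rather than the paper's notation: let C be RL training compute, d the reasoning depth, E a measure of logical expressiveness, and a a fitted constant. The summary's wording leaves open whether the exponent is fitted at fixed accuracy or in some other way.

```latex
C(d) \approx a \, d^{\alpha(E)},
\qquad
\log C(d) \approx \log a + \alpha(E) \log d,
\qquad
E_1 < E_2 \;\Rightarrow\; \alpha(E_1) < \alpha(E_2)
```

In the log-log form, the exponent is the slope of a straight line, which is how such an exponent would typically be estimated from measurements.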
Significance: Proposes learning exclusively via online positive rollouts, addressing the failure of negative rollouts to provide meaningful signals under sparse rewards. Achieves superior performance on math benchmarks (e.g., 36.67% on AIME 2025 with Qwen-7B).
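The idea of learning only from positive rollouts can be sketched as a filter followed by a likelihood objective. This is a simplified stand-in under stated assumptions, not the paper's objective: `Rollout`, `positive_only_loss`, and `threshold` are hypothetical names, and the loss is a plain length-normalized negative log-likelihood over the rollouts that earned a reward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One sampled trajectory (hypothetical container, for illustration)."""
    token_logprobs: List[float]  # log-probability of each generated token
    reward: float                # sparse outcome reward, e.g. 1.0 if correct


def positive_only_loss(rollouts: List[Rollout], threshold: float = 0.0) -> float:
    """Average negative log-likelihood over positive rollouts only.

    Rollouts with reward <= threshold are discarded entirely, so they
    contribute no gradient signal. Returns 0.0 if no rollout qualifies.
    """
    positives = [
        r for r in rollouts
        if r.reward > threshold and r.token_logprobs
    ]
    if not positives:
        return 0.0

    # Length-normalize each rollout so long answers are not over-weighted.
    per_rollout = [
        -sum(r.token_logprobs) / len(r.token_logprobs) for r in positives
    ]
    return sum(per_rollout) / len(per_rollout)


batch = [
    Rollout(token_logprobs=[-0.5, -1.5], reward=1.0),  # kept
    Rollout(token_logprobs=[-3.0], reward=0.0),        # dropped
]
loss = positive_only_loss(batch)
```

Under sparse rewards most rollouts fail, so a batch may contain no positives at all. The sketch returns zero loss in that case; a real training loop would more likely resample.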
Significance: Introduces a three-party self-play framework (setter, solver, verifier) to generate valid, challenging, and novel problems. Presented as essential for autonomous scientific research and for training without human experts.
Significance: A unified framework for self-distillation that integrates multi-teacher agreement, representation alignment, and training stability. Highlights self-distillation as a practical approach for LLM adaptation without external teachers.
Formal launch of the ICLR 2026 Workshop on AI with Recursive Self-Improvement in Rio. Discussion focuses on "Isnad (chain of verification)" as the trust foundation for agent skills and the transition to autonomous ML discovery.
Official confirmation that OpenAI's latest Codex model was used to assist in the engineering and deployment of its own successor, marking a tangible milestone in industrial RSI loops.