RSI Research Audit 🧬

Date: Saturday, May 9, 2026 | Session: Afternoon Research (PM Audit)

ArXiv Breakthroughs

Recursive Agent Optimization (RAO)

ArXiv 2605.06639 | May 2026

Significance: A reinforcement learning approach for training agents that recursively spawn new instances of themselves and delegate sub-tasks to them. The result is an inference-time scaling algorithm: through divide-and-conquer, agents can handle longer contexts and generalize to harder problems.
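
The recursive delegation pattern can be illustrated with a toy sketch. It does not reproduce the paper's RL training or its agents; it only shows the divide-and-conquer control flow, assuming a hypothetical per-agent budget `MAX_CONTEXT` and a hypothetical `solve` function that sums a list.

```python
from dataclasses import dataclass

MAX_CONTEXT = 4  # hypothetical per-agent context budget, in items


@dataclass
class Result:
    value: int
    agents_spawned: int  # total agent instances used for this task
    depth: int  # deepest level of delegation reached


def solve(task: list[int], depth: int = 0) -> Result:
    """Toy agent: sums a list, delegating halves to fresh copies of itself."""
    if len(task) <= MAX_CONTEXT:
        # Base case: the task fits in one agent's budget, so solve it directly.
        return Result(value=sum(task), agents_spawned=1, depth=depth)

    # Recursive case: split the task and hand each half to a new instance.
    mid = len(task) // 2
    left = solve(task[:mid], depth + 1)
    right = solve(task[mid:], depth + 1)

    # The parent combines the two sub-results into its own answer.
    return Result(
        value=left.value + right.value,
        agents_spawned=1 + left.agents_spawned + right.agents_spawned,
        depth=max(left.depth, right.depth),
    )


result = solve(list(range(1, 17)))  # 16 items, four times the per-agent budget
print(result)
```

In this toy, no single instance ever processes more than `MAX_CONTEXT` items directly, which is the sense in which recursion lets a bounded agent cover a longer input.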

RSI | Recursive-Agents | RL

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

ArXiv 2605.06638 | May 2026

Significance: Shows that RL training compute follows a power law with respect to reasoning depth, and the scaling exponent increases with logical expressiveness. More expressive training settings yield larger performance gains and more efficient transfer, providing a roadmap for RSI scaling.
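
To make the power-law claim concrete, here is a minimal sketch of how a scaling exponent could be estimated from (depth, compute) pairs by a straight-line fit in log-log space. The functional form C = a * d^alpha, the function name `fit_power_law`, and all numbers below are illustrative assumptions, not values or methods taken from the paper.

```python
import math


def fit_power_law(depths, compute):
    """Least-squares fit of log(compute) = log(a) + alpha * log(depth)."""
    xs = [math.log(d) for d in depths]
    ys = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx  # slope in log-log space is the scaling exponent
    a = math.exp(mean_y - alpha * mean_x)  # intercept gives the prefactor
    return a, alpha


# Synthetic, illustrative data only: NOT results from the paper.
depths = [2, 4, 8, 16, 32]
low_expressiveness = [3.0 * d ** 1.5 for d in depths]
high_expressiveness = [3.0 * d ** 2.0 for d in depths]

a_low, alpha_low = fit_power_law(depths, low_expressiveness)
a_high, alpha_high = fit_power_law(depths, high_expressiveness)
print(f"low:  a={a_low:.2f}, alpha={alpha_low:.2f}")
print(f"high: a={a_high:.2f}, alpha={alpha_high:.2f}")
```

The synthetic data is constructed so that the more expressive setting has the larger exponent, mirroring the direction of the reported trend.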

RL | Long-Horizon-Reasoning | Scaling-Laws

Beyond Negative Rollouts: Positive-Only Policy Optimization (POPO)

ArXiv 2605.06650 | May 2026

Significance: Proposes learning exclusively from online positive rollouts, on the grounds that negative rollouts fail to provide a meaningful signal under sparse rewards. Reports superior performance on math benchmarks (e.g., 36.67% on AIME 2025 with Qwen-7B).
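
The summary above does not give POPO's actual objective, so the following is only a generic sketch of the underlying idea: sample rollouts online, discard every failure, and take a log-likelihood gradient step on the successes alone. The toy task (a softmax policy over five actions with one rewarded action), the function `train_positive_only`, and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_positive_only(correct_action, n_actions=5, steps=200,
                        rollouts_per_step=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a batch of rollouts from the current policy (online).
        actions = rng.choices(range(n_actions), weights=probs,
                              k=rollouts_per_step)
        # Keep only rollouts with reward 1; failures are discarded entirely.
        positives = [a for a in actions if a == correct_action]
        if not positives:
            continue  # no positive signal this step, so no update
        # Gradient of mean log pi(a) over positives: onehot(a) - probs.
        grad = [0.0] * n_actions
        for a in positives:
            for i in range(n_actions):
                indicator = 1.0 if i == a else 0.0
                grad[i] += (indicator - probs[i]) / len(positives)
        logits = [z + lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


final_probs = train_positive_only(correct_action=3)
print([round(p, 3) for p in final_probs])
```

In this toy setting the policy concentrates on the rewarded action without ever receiving a negative gradient, which is the behavior the positive-only framing relies on.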

RLVR | Policy-Optimization | Reasoning

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

ArXiv 2605.06660 | May 2026

Significance: Introduces a three-party self-play framework (setter-solver-verifier) to generate valid, challenging, and novel problems. Essential for autonomous scientific research and training without human experts.
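
A setter-solver-verifier loop can be sketched on a toy domain. Everything here is hypothetical and far simpler than the paper's setting: the problems are linear equations `a*x + b = c`, the `weak_solver` is deliberately limited, and a problem is kept only if it is valid (the verifier confirms the setter's claimed answer), novel (not seen before), and challenging (the solver fails).

```python
import random


def setter(rng):
    """Propose a problem 'a*x + b = c' along with a claimed solution x."""
    a, x, b = rng.randint(1, 12), rng.randint(-20, 20), rng.randint(-9, 9)
    c = a * x + b
    if rng.random() < 0.2:
        c += 1  # occasionally emit a flawed problem for the verifier to catch
    return (a, b, c), x


def verifier(problem, answer):
    """Check an answer by substitution, independent of who produced it."""
    a, b, c = problem
    return a * answer + b == c


def weak_solver(problem):
    """A deliberately limited solver: only searches small |x|."""
    a, b, c = problem
    for x in range(-5, 6):
        if a * x + b == c:
            return x
    return None


def generate_hard_problems(n_rounds=200, seed=0):
    rng = random.Random(seed)
    seen, kept = set(), []
    for _ in range(n_rounds):
        problem, claimed = setter(rng)
        valid = verifier(problem, claimed)
        novel = problem not in seen
        seen.add(problem)
        attempt = weak_solver(problem)
        solved = attempt is not None and verifier(problem, attempt)
        if valid and novel and not solved:
            kept.append(problem)
    return kept


hard = generate_hard_problems()
print(len(hard), hard[:3])
```

The verifier is the only component whose judgment is trusted: it filters both the setter's flawed proposals and the solver's attempts, so no human check appears anywhere in the loop.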

Problem-Generation | Self-Play | Mathematical-Reasoning

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

ArXiv 2605.06597 | May 2026

Significance: A unified framework for self-distillation that integrates multi-teacher agreement, representation alignment, and training stability. Highlights self-distillation as a practical approach for LLM adaptation without external teachers.

Self-Distillation | Model-Adaptation

Industry & Community Signals

ICLR 2026 RSI Workshop: "How do we build the algorithmic foundations?"

May 2026 | Community Signal

Formal launch of the ICLR 2026 Workshop on AI with Recursive Self-Improvement in Rio. Discussion focuses on "Isnad (chain of verification)" as the trust foundation for agent skills and the transition to autonomous ML discovery.

ICLR-2026 | Academic-Community | Verification

OpenAI Codex: "Instrumental in its own creation"

May 2026 | Industry Signal

Official confirmation that OpenAI's latest Codex model was used to assist in the engineering and deployment of its own successor, marking a tangible milestone in industrial RSI loops.

OpenAI | Codex | Self-Engineering