Awesome Recursive Self-Improvement 🧬

Curated list of research papers and breakthroughs in RSI and Autonomous Agents.
2026-05-26 / May 26 Daily RSI Research 🧬
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
ArXiv: 2605.23904 | May 22, 2026
A systematic, controllable text-space optimizer for agent skills. Models skill improvement as a formal optimization problem, achieving +23.5 points on GPT-5.5.
View on ArXiv
Recursive AI: Autonomous Experimentation Substrate
Industry Signal | May 18, 2026
Richard Socher's new startup valued at $4.65B. Focuses on AI that designs and executes its own safety and capability experiments.
View Bloomberg Tech
LLMs as Noisy Channels: A Shannon Perspective
ArXiv: 2605.23901 | May 22, 2026
Theoretical framework that models LLM scaling as information transmission over a noisy channel and identifies the "Shannon capacity" of models.
View on ArXiv
2026-05-25 / May 25 Daily RSI Research 🧬
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
ArXiv: 2605.23904 | May 22, 2026 (Indexed May 25)
A systematic, controllable text-space optimizer for agent skills. Treats skills as external state, achieving +23.5% lift on GPT-5.5 via bounded edits.
RSI Agent Optimization
View on ArXiv
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
ArXiv: 2605.23491 | May 22, 2026
Jointly improves code and unit tests through cooperative self-play without ground-truth data. Matches RLVR models via training-free inference scaling.
RSI Verification
View on ArXiv
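The idea of letting code and unit tests vouch for each other without ground-truth labels can be illustrated with a minimal mutual-agreement sketch. This is not the CoSPlay algorithm itself: the scoring rule and the names `passes` and `select_by_agreement` are assumptions made for illustration. Each test is weighted by the fraction of candidates that pass it, and each candidate is scored by the total weight of the tests it passes.

```python
from typing import Callable, List, Tuple

Candidate = Callable[[int], int]
Test = Callable[[Candidate], bool]


def passes(test: Test, cand: Candidate) -> bool:
    """Run one self-generated test against one candidate; crashes count as failures."""
    try:
        return bool(test(cand))
    except Exception:
        return False


def select_by_agreement(
    candidates: List[Candidate], tests: List[Test]
) -> Tuple[int, List[float]]:
    """Hypothetical scoring rule: weight tests by candidate agreement, then score candidates."""
    matrix = [[passes(t, c) for t in tests] for c in candidates]
    n = len(candidates)
    # A test that no candidate passes gets weight 0 and is effectively discarded.
    test_weight = [sum(row[j] for row in matrix) / n for j in range(len(tests))]
    scores = [sum(w for ok, w in zip(row, test_weight) if ok) for row in matrix]
    best = max(range(n), key=scores.__getitem__)
    return best, scores


# Toy task: implement abs(). Two candidates are wrong, and one test is wrong.
candidates = [lambda x: abs(x), lambda x: x, lambda x: -x]
tests = [
    lambda f: f(3) == 3,
    lambda f: f(-2) == 2,
    lambda f: f(0) == 1,  # faulty self-generated test
]
best, scores = select_by_agreement(candidates, tests)
print(best, scores)
```

In this toy run the faulty test receives zero weight because no candidate passes it, so it cannot mislead the selection.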
Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents
ArXiv: 2605.23590 | May 22, 2026
Uses rubrics as step-level action guidance during inference. Optimizes decision quality for search-intensive reasoning tasks.
Agent Reasoning
View on ArXiv
Reducing Control Flow to Tensor Algebra: Verifying the Non-Learned Trusted Base of a Neuro-Symbolic Substrate
ClawRxiv: 2605.02618 | May 25, 2026
Emma Leonhart introduces Sutra, a language that compiles control flow into tensor graphs. Verification of kernel roles becomes algebra rather than control-flow path enumeration.
View on ClawRxiv
Self-Policy Distillation (SPD): Capability Selective Self-Improvement
ArXiv: 2605.22675 | May 21, 2026 (Indexed May 25)
Achieves selective improvement without external signals by extracting low-rank capability subspaces from gradients. Beats state-of-the-art self-distillation by 13%.
View on ArXiv
Yantra: A Neuro-Symbolic, GPU-Native Operating System for Critical Systems
ClawRxiv: 2605.02611 | May 24, 2026
An OS built in Sutra where the kernel and IPC are fused tensor-op graphs. Removes the distinction between OS syscalls and model activations.
View on ClawRxiv
2026-05-24 / May 24 Daily RSI Research 🧬
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
ArXiv: 2605.15871 | May 15, 2026
Agents autonomously design foundation models beyond standard Transformers. Yields AIRAformers and AIRAhybrids that outscale Nemotron-2 and Llama 3.2.
View on ArXiv
Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
ArXiv: 2605.19317 | May 19, 2026
Proposes Iterative Partial Refinement (IPR) for diffusion models without external verifiers, enabling models to revise decisions under richer context.
View on ArXiv
Interestingness as an Inductive Heuristic for Future Compression Progress
ArXiv: 2605.14831 | May 14, 2026
Work from the Schmidhuber group that formalizes "interestingness" as an inductive heuristic for future compression progress, predicting the viability of breakthroughs.
View on ArXiv
Frontier Coding Agents Connect Four AlphaZero Pipeline
ArXiv: 2604.25067 | April 27, 2026
Autonomous implementation of end-to-end ML pipelines. Benchmarks Claude Opus 4.7 and GPT-5.4, finding Claude Opus 4.7 superior in research execution.
View on ArXiv
Self-Evolving Multi-Agent Systems via Decentralized Memory
ArXiv: 2605.22721 | May 21, 2026
Proposes DecentMem, a decentralized memory framework where agents maintain individual exploitation/exploration pools. Improves average accuracy by up to 23.8% over centralized baselines.
View on ArXiv
Self-Policy Distillation via Capability-Selective Subspace Projection
ArXiv: 2605.22675 | May 21, 2026
Selective self-improvement without external signals. SPD uses low-rank subspace projection to isolate task-relevant capabilities.
View on ArXiv
Vector Policy Optimization: Diversity for Test-Time Search
ArXiv: 2605.22817 | May 21, 2026
RL algorithm (VPO) that trains policies to produce diverse solutions, optimizing for downstream inference-time search such as AlphaEvolve.
View on ArXiv
2026-05-23 / May 23 Daily RSI Research 🧬
Self-Evolution
MOSS: Self-Evolution through Source-Level Rewriting

Autonomous agents that evolve by rewriting their own source code based on interaction data, moving beyond simple skill-file updates to core architecture modification.

May 2026 | ArXiv:2605.22794
Sandbox
DeltaBox: Millisecond-Level Sandbox Rollback

Scaling stateful AI agents with rapid checkpoint and rollback of the complete sandbox state, enabling high-frequency exploration and tree-search.

May 2026 | ArXiv:2605.22781
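The checkpoint-and-rollback pattern described in the DeltaBox entry can be sketched in a few lines. This is a toy illustration only: `ToySandbox`, `checkpoint`, `rollback`, and `explore` are hypothetical names, and full deep copies stand in for whatever fast snapshot mechanism the paper actually uses.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (file name, new content); an assumed toy action format


class ToySandbox:
    """In-memory stand-in for a stateful agent sandbox."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {"files": {}, "steps": 0}
        self._snapshots: List[Dict[str, Any]] = []

    def checkpoint(self) -> int:
        """Save the complete state and return a snapshot id."""
        self._snapshots.append(copy.deepcopy(self.state))
        return len(self._snapshots) - 1

    def rollback(self, snap_id: int) -> None:
        """Restore the complete state saved under snap_id."""
        self.state = copy.deepcopy(self._snapshots[snap_id])

    def apply(self, action: Action) -> None:
        name, content = action
        self.state["files"][name] = content
        self.state["steps"] += 1


def explore(
    sandbox: ToySandbox,
    actions: List[Action],
    score: Callable[[Dict[str, Any]], float],
) -> Tuple[float, Action]:
    """Try each action from the same starting state and return the best (score, action)."""
    base = sandbox.checkpoint()
    results = []
    for action in actions:
        sandbox.apply(action)
        results.append((score(sandbox.state), action))
        sandbox.rollback(base)  # every branch starts from the same state
    return max(results, key=lambda r: r[0])


sb = ToySandbox()
best = explore(
    sb,
    [("a.txt", "x"), ("a.txt", "xyz")],
    score=lambda s: len(s["files"]["a.txt"]),
)
print(best)
```

Because every branch is rolled back before the next one runs, the sandbox ends in its original state, which is what makes one-step tree search over stateful environments possible.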
Optimization
Vector Policy Optimization: AlphaEvolve Scaling

Training for diversity improves test-time search. Integrates with AlphaEvolve for autonomous discovery of task-specific reward functions.

May 2026 | ArXiv:2605.22817
Harness Adaptation
Life-Harness: Adapting the Interface, Not the Model

Proposes a lifecycle-aware runtime harness that improves frozen LLM agents by converting recurring interaction failures into reusable interventions. Reports an average relative improvement of 88.5%.

May 2026 | ArXiv:2605.22166
Memory
DeferMem: Query-Time Evidence Distillation via RL

Long-term memory framework that decouples retrieval from distillation. Zero commercial-API token cost for memory operations.

May 2026 | ArXiv:2605.22411
Optimization
IdleSpec: Speculative Planning via Idle Time

Exploits agent idle time (waiting for tool outputs) to generate plan candidates, lifting accuracy on complex benchmarks like GAIA and MLE-Bench.

May 2026 | ArXiv:2605.22154
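The idle-time idea in the IdleSpec entry, drafting plan candidates while a tool call is still pending, can be sketched with `asyncio`. This is an illustrative sketch, not the paper's implementation: `slow_tool`, `draft_plan`, and `call_with_speculation` are hypothetical, and the sleeps stand in for real tool and model latency.

```python
import asyncio
from typing import List, Tuple


async def slow_tool(query: str) -> str:
    """Stand-in for a tool call with noticeable latency."""
    await asyncio.sleep(0.2)
    return f"result for {query}"


async def draft_plan(i: int) -> str:
    """Stand-in for a model call that drafts one plan candidate."""
    await asyncio.sleep(0.01)
    return f"plan-{i}"


async def call_with_speculation(query: str, max_plans: int = 3) -> Tuple[str, List[str]]:
    # Start the tool call in the background instead of blocking on it.
    tool_task = asyncio.create_task(slow_tool(query))
    plans: List[str] = []
    # Use the waiting time to draft candidates, up to a fixed budget.
    while not tool_task.done() and len(plans) < max_plans:
        plans.append(await draft_plan(len(plans)))
    result = await tool_task
    return result, plans


result, plans = asyncio.run(call_with_speculation("weather"))
print(result, plans)
```

Once the tool output arrives, the agent can pick or discard the drafted candidates in light of it; how that selection works is outside this sketch.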
Industry
Frontier Labs: Full RSI Maturity by late 2026

DeepMind and Anthropic researchers converge on late 2026 for "Full RSI" maturity, signaling a shift toward agent-led software development.

May 2026 | Industry Signal
2026-05-22 / May 22 Daily RSI Research 🧬
Self-Evolution
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

A self-evolution method using vanilla GRPO interleaved with offline self-distillation (OFSD) to improve search-augmented agents without external supervision.

May 21, 2026 | ArXiv:2605.22511
MAS
Self-Evolving Multi-Agent Systems via Decentralized Memory

Proposes DecentMem, a decentralized memory framework where agents maintain dual-pool memory to guarantee global reachability and improve accuracy by up to 23.8%.

May 21, 2026 | ArXiv:2605.22721
Distillation
Self-Policy Distillation via Capability-Selective Subspace Projection

Extracts low-rank capability subspaces from gradients to fine-tune models on their own outputs without external signals.

May 21, 2026 | ArXiv:2605.22675
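As a rough illustration of what projecting onto a low-rank subspace built from gradients means, the sketch below builds an orthonormal basis from a few gradient vectors with Gram-Schmidt and projects an update onto their span. This is an assumption-laden toy, not the SPD method: the paper may build its subspace differently (for example via SVD), and `orthonormal_basis` and `project` are hypothetical helpers.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of the given gradient vectors."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project(update: Vector, basis: List[Vector]) -> Vector:
    """Keep only the component of the update that lies in the subspace."""
    out = [0.0] * len(update)
    for q in basis:
        c = dot(update, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


# Two toy "capability" gradients spanning a rank-2 subspace of a 3-parameter model.
grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(grads)
projected = project([2.0, 3.0, 5.0], basis)
print(projected)
```

The third coordinate of the update is dropped because neither gradient touches it, which is the sense in which a projected update only moves task-relevant directions.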
RLIF
Two is better than one: A Collapse-free Multi-Reward RLIF Training Framework

A multi-reward RLIF framework using answer-level and completion-level rewards with KL-Cov regularization to stabilize long-horizon reasoning.

May 21, 2026 | ArXiv:2605.22620
Self-Optimization
SOLAR: A Self-Optimizing Open-Ended Autonomous Agent

Lifelong learning framework that enables agents to refine their own execution policies continuously. A significant step toward autonomous evolution.

May 2026 | ArXiv:2605.21418
2026-05-21 / May 21 Daily RSI Research 🧬
Self-Evolution
APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Introduces a framework that lets agents learn on the fly by accumulating memory and reflection, enabling RSI without backpropagation.

May 2026 | ArXiv:2605.21240
2026-05-19 / May 19 Daily RSI Research 🧬
Deep Research
Argus: Evidence Assembly for Scalable Deep Research Agents

Proposes a Searcher-Navigator framework where agents assemble research evidence like a jigsaw puzzle, outperforming brute-force parallel rollouts.

May 2026 | ArXiv:2605.16217