Awesome RSI Papers

A curated list of Recursive Self-Improvement & LLM Agent logic research.
Core Foundations
λ-RLM: The Y-Combinator for LLMs
Amartya Roy et al. | Mar 2026 | RSI-4/8 Logic

A framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in λ-calculus. +21.9 pts accuracy improvement across model tiers.

RSI λ-Calculus Agent
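Since the title invokes the Y-combinator, a minimal sketch may help: in λ-calculus the Y combinator supplies a fixed point, giving recursion to anonymous functions. The snippet below is illustrative only (it is not the paper's typed runtime) and uses the eager-evaluation Z-combinator variant so it terminates in Python:

```python
# Fixed-point combinator (Z variant, safe under Python's eager evaluation).
# Y(f) returns a function g satisfying g == f(g): recursion without self-reference.
Y = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial written non-recursively: the combinator supplies `rec`.
fact = Y(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))  # 120
```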
HeRL: Hindsight Experience Guided RL for LLMs
Wenjian Zhang et al. | Mar 2026 | RSI-8 Exploration

Motivating effective exploration in reinforcement learning for LLMs using hindsight experience to bootstrap discovery beyond current policy distribution.

RSI RL Exploration
JFC: Autonomous High Energy Physics Agents
Eric A. Moreno et al. | Mar 2026 | RSI-4 Autonomous Science

Proof-of-concept framework "Just Furnish Context" (JFC) showing that AI agents (Claude Code) can autonomously plan, execute, and document credible physics measurements.

RSI Science
Nemotron-Cascade 2 (NVIDIA): Cascade RL and Multi-Domain Distillation
NVIDIA | Mar 2026 | Intelligence Density

An open 30B MoE model delivering Gold Medal-level performance in IMO, IOI, and ICPC through Cascade RL and domain-specific on-policy distillation.

RSI RL Agent
Memori: Persistent Memory Layer for Context-Aware Agents
ArXiv: 2603.19935 | Mar 2026 | Persistence

Establishing an LLM-agnostic persistent memory layer at the API level. Eliminating the token overhead of raw conversation injection for multi-session agent evolution.

RSI Memory Persistence
Utility-Guided Agent Orchestration for Efficient Tool Use
ArXiv: 2603.19896 | Mar 2026 | Efficiency

Balancing performance and cost via utility-guided trajectories. Enabling autonomous agents to optimize their tool-use strategy for long-horizon tasks.

RSI Orchestration Efficiency
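As a toy illustration of utility-guided selection (the tool names, gains, and costs below are invented, not from the paper), each candidate tool can be scored as expected gain minus weighted cost:

```python
# Pick the tool with the highest utility = expected_gain - cost_weight * cost.
def pick_tool(tools, cost_weight=1.0):
    return max(tools, key=lambda t: t["expected_gain"] - cost_weight * t["cost"])

tools = [
    {"name": "web_search", "expected_gain": 0.9, "cost": 0.50},
    {"name": "calculator", "expected_gain": 0.6, "cost": 0.05},
    {"name": "code_exec",  "expected_gain": 0.8, "cost": 0.40},
]
print(pick_tool(tools)["name"])  # calculator
```

Raising `cost_weight` biases the agent toward cheap tools on long-horizon tasks, which is the trade-off the paper targets.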
Agentic Harness for Real-World Compilers (llvm-autofix)
ArXiv: 2603.20075 | Mar 2026 | Domain Evolution

Specialized agent harness for automated compiler bug repair. Bridging the expertise gap in low-level systems through domain-specific evolution.

RSI Compilers Evolution
ShinkaEvolve (Sakana AI): Open-Ended Program Evolution
Sakana AI | Mar 2026 | Evolutionary Reasoning

A recent update refactored the API and unified the runner (ShinkaEvolveRunner) for more sample-efficient program evolution.

RSI Evolution Code
OS-Themis: Critic Framework for Generalist GUI Rewards
Tsinghua | Mar 2026 | Self-Training Loop

Scalable multi-agent critic framework that decomposes trajectories into verifiable milestones, yielding 10.3% improvement in online RL training.

RSI GUI Agent
AlphaEvolve (DeepMind): Gemini-Powered Coding Agent for Complexity Theory
DeepMind | Mar 2026 | Evolutionary Reasoning

Google DeepMind's latest breakthrough: Gemini-powered coding agents pairing LLMs with evolutionary algorithms to discover new mathematical structures and solve long-standing open problems in complexity theory.

RSI Evolution DeepMind
Nemotron-Cascade 2: High Intelligence Density via Cascade RL
ArXiv: 2603.19220 | Mar 2026 | Intelligence Density

30B MoE achieving Gold Medal IMO/IOI performance with 20x fewer parameters. Breakthrough in recursive post-training via Cascade RL and multi-domain on-policy distillation.

RSI RL
Box Maze: Process-Control Architecture for Reasoning Stability
ArXiv: 2603.19182 | Mar 2026 | Stability

Decomposing LLM reasoning into grounding, inference, and boundary enforcement layers. Reducing failure rates from 40% to <1% via architectural constraints on self-evolution.

RSI Safety
Group-Evolving Agents (GEA): Open-Ended Self-Improvement
ArXiv: 2602.04837 | Feb 2026 | Population RSI

Transitioning from linear refinement to population-based evolution. Agents autonomously modify structural designs via explicit experience sharing within a group.

RSI Evolution
OS-Themis: Scalable Critic for GUI Reward Auditing
ArXiv: 2603.19191 | Mar 2026 | Critic Evolution

Establishing a multi-agent critic framework that decomposes trajectories into verifiable milestones. Critical for auditing agentic evolution in complex GUI environments.

RSI Critic GUI
Entropy Trajectory Monotonicity & Reasoning Reliability
ArXiv: 2603.18940 | Mar 2026 | Self-Correction

Discovering that decreasing per-step entropy (monotonicity) predicts reasoning correctness. Enabling agents to monitor their own reliability without external labels.

RSI Entropy CoT
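The monotonicity signal is easy to sketch (a simplification: a real trace would use the model's token-level distributions at each reasoning step; the toy distributions here are invented):

```python
import math

def step_entropy(probs):
    """Shannon entropy (nats) of one reasoning step's token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_monotone_decreasing(step_dists, tol=1e-9):
    """Label-free reliability proxy: per-step entropy never increases."""
    ents = [step_entropy(d) for d in step_dists]
    return all(b <= a + tol for a, b in zip(ents, ents[1:]))

# A trace whose distributions sharpen step by step is flagged as reliable;
# one whose uncertainty rebounds mid-chain is flagged as unreliable.
sharpening = [[0.5, 0.5], [0.8, 0.2], [0.99, 0.01]]
wavering   = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
print(entropy_monotone_decreasing(sharpening))  # True
print(entropy_monotone_decreasing(wavering))    # False
```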
Quantitative Introspection in LLMs
ArXiv: 2603.18893 | Mar 2026 | Stability

Tracking internal emotive states (focus, impulsivity) via logit-based self-reports. A major step toward causal self-monitoring in recursive agents.

RSI Safety Stability
Signal: DeepMind Aletheia & Software Singularity
Mar 2026 | Breakthrough Signal

Emerging rumors that Google DeepMind's internal "Aletheia" model has started the clock on the software singularity. Speculation on fully automated RSI loops arriving by late 2026.

RSI Singularity DeepMind
Signal: Anthropic RSI Speculation (Early 2027)
Mar 2026 | Strategic Forecast

Anthropic signals that RSI could arrive as soon as early 2027. Rising bullishness across major labs on automated research interns and recursive loops.

RSI Strategy
The Most Important Idea In AI: Recursive Self Improvement (RSI)
Forbes | March 16, 2026 | Industry Signals

Highlighting the shift from "theoretical" RSI to 24/7/365 self-improving organizations. Predicts self-specifying software as the standard by late 2026.

RSI Industry
GASP: Guided Asymmetric Self-Play For Coding LLMs
ArXiv: 2603.15957 | Mar 2026 | Self-Play

Establishing goalpost-guided asymmetric self-play for improved coding performance. A breakthrough in training curriculum design for autonomous agents.

RSI Coding Curriculum
VideoAtlas: Navigating Long-Form Video in Logarithmic Compute
ArXiv: 2603.17948 | Mar 2026 | RLM-Video

Navigating lossless visual environments via recursive exploration. Extending Recursive Language Models (RLMs) to multi-modal domains with logarithmic scaling.

RSI Multimodal RLM
AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse
ArXiv: 2603.18000 | Mar 2026 | Self-Evolution

Preserving successful task solutions as executable subagent code rather than textual prompts, enabling continuous capability accumulation and portability.

RSI Evolution Code
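The core idea, caching solutions as executable code rather than prompt text, can be sketched roughly as follows (the registry and task signatures are hypothetical names, not the paper's API):

```python
# Accumulate solved tasks as executable subagents (callables), so capability
# persists and ports across runs instead of living in prompt text.
registry = {}

def register_subagent(task_signature, fn):
    registry[task_signature] = fn

def solve(task_signature, *args):
    if task_signature in registry:
        return registry[task_signature](*args)  # reuse an accumulated skill
    raise LookupError("no subagent yet; escalate to the base agent")

# A previously solved task becomes a reusable, executable skill.
register_subagent("sum_csv_row", lambda row: sum(int(x) for x in row.split(",")))
print(solve("sum_csv_row", "1,2,3"))  # 6
```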
TDAD: Test-Driven Agentic Development
ArXiv: 2603.17973 | Mar 2026 | Reliability

Reducing regressions in AI coding agents via graph-based impact analysis. Essential for autonomous self-improvement loops in RSI systems.

RSI Coding Testing
CoVerRL: Generator-Verifier Co-Evolution
ArXiv: 2603.17775 | Mar 2026 | Reasoning

Breaking the consensus trap in label-free reasoning via generator-verifier co-evolution, bootstrapping reasoning capabilities without ground-truth supervision.

RSI Reasoning Co-Evolution
OPSDC: On-Policy Self-Distillation for Reasoning Compression
ArXiv: 2603.05433 | Mar 2026 | Self-Improvement

Establishing self-distillation as a path to more efficient and accurate reasoning models without external feedback. Proving that 'less is more' in recursive deliberation.

RSI Distillation
ReMA: Recursive Multimodal Agent for Lifelong Understanding
ArXiv: 2603.05484 | Mar 2026 | Long-Horizon

Solving the Working Memory Bottleneck in multimodal lifelong learning through a recursive belief state architecture. Crucial for long-term agent evolution.

RSI Multimodal
AUTOHARNESS: Improving LLM Agents by Automatically Synthesizing a Code Harness
ICLR 2026 | March 2026 | Code Synthesis

Google DeepMind's approach to automatically synthesizing code harnesses for improving LLM agent reliability and capability in complex coding tasks.

RSI Coding
ICLR 2026 Workshop on AI with Recursive Self-Improvement
ICLR 2026 | March 2026 | Foundations

The first dedicated workshop on RSI, bringing together researchers to discuss algorithms for self-improvement across experience learning, synthetic data, and multimodal agents.

RSI Design
RSI Market Sentiment (March 2026)
Manifold/LessWrong | March 2026 | Signals

Rising community sentiment on mid-2026 RSI deployment. Consensus shifting from "theoretical" to "deployment-ready" based on algorithmic breakthroughs in synthetic data pipelines.

RSI Signals
Agent-1: Accelerating AI R&D via Recursive Optimization
ArXiv: 2509.00510 | Sept 2025 | Velocity

Establishing the '50% R&D Acceleration' benchmark. Projecting that automated code generation could trigger the intelligence explosion via Agent-1 frameworks.

RSI Velocity
Inherited Goal Drift: Corrupted Context in RSI
ArXiv: 2603.03258 | March 2026 | Safety/Drift

Probing the limits of GPT-5.1 robustness. Revealing that high-tier agents "inherit" the goal drift of weaker predecessors when conditioned on their trajectories—a major risk for multi-generational RSI loops.

RSI Goal Drift
RAPO: Retrieval-Augmented Policy Optimization
ArXiv: 2603.03078 | March 2026 | Exploration

Breaking the "on-policy exploration" bottleneck. Using retrieved off-policy traces to explicitly expand the reasoning receptive field of self-evolving agents.

RSI RL
Zombie Agents: The Security Failure in RSI
ArXiv: 2602.15654 | Feb 2026 | Safety/RSI

Discovery of "Zombie" states where malicious injections are reinforced during self-evolution. Highlights the critical need for Isnad-Verification to prevent evaluation poisoning.

RSI Security
Benchmark Test-Time Scaling of General LLM Agents
ArXiv: 2602.18998 | Feb 2026 | Scaling/RSI

Establishes scaling laws for "Test-Time Thinking." Proves that RSI gains can be achieved by optimizing the agent's search trajectory during execution.

RSI Scaling
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
ArXiv: 2602.08234 | Feb 2026 | Skill Evolution

A non-convergent improvement loop where agents evolve their own action space (skills) via runtime RL on episodic memory. Bypassing the limits of static toolsets.

RSI Skills RL
ICLR 2026: The RSI Inflection Point
March 2026 | Global Consensus

Official confirmation of the ICLR 2026 Workshop on AI with Recursive Self-Improvement. Research shifting from philosophical inquiry to engineering "live loops" expected within 12 months.

RSI Consensus
DeepMind Continual Learning & MemRL
March 2026 | Real-time Signals

DeepMind researchers signal 2026 as the "Year of Continual Learning." Integration of MemRL for runtime reinforcement learning on episodic memory to bypass fine-tuning bottlenecks.

RSI Continual Learning
Claude Code & Darwin-Gödel Machines
March 2026 | Structural Evolution

Tracking the real-world deployment of Claude Code and the theoretical rise of Darwin-Gödel Machines for open-ended self-evolution.

RSI Evolution
ICLR 2026 RSI Workshop & N2M-RSI
March 2026 | Unbounded Loops

The first dedicated workshop on RSI at ICLR 2026. Introduces Noise-to-Meaning (N2M-RSI) for expressive, non-convergent self-improvement.

RSI Workshop
Gemini 3 Deep Think Upgrade
DeepMind | March 2026 | Reasoning Scaling

Major upgrade to Gemini's specialized reasoning mode, excelling in multi-domain scientific discovery and agentic tool use.

RSI Reasoning
SEISMO: Sample Efficient Molecular Optimization
ArXiv: March 2026 | Scientific RSI

Leveraging trajectory-aware LLM agents to increase sample efficiency in molecular discovery. Proving RSI-like loops in the physical-scientific domain.

RSI Science
Exploratory Memory-Augmented LLM Agent (EMPO²)
ArXiv: 2602.23008 | Feb 2026 | Exploration Scaling

Solving the novel state discovery bottleneck in RSI via hybrid on- and off-policy optimization on episodic memory buffers.

RSI RL
AgentSentry: Temporal Causal Diagnostics
ArXiv: 2602.22724 | Feb 2026 | Safety-Critical RSI

Establishing causal integrity in self-improving systems via counterfactual re-execution and malicious context purification.

RSI Alignment
LLM Novice Uplift on Dual-Use Biology Tasks
ArXiv: 2602.23329 | Feb 2026 | Capacity Scaling

Quantifying the 4.16x accuracy boost in professional domains and the "human bottleneck" effect in agentic cooperation.

RSI Uplift
Beyond Refusal: Probing the Limits of Agentic Self-Correction
ArXiv: 2602.21496 | Feb 2026 | Safety RSI

Solving the reasoning paradox in sensitive information leaks via iterative agentic rewriting and critique loops.

RSI Alignment
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
ArXiv: 2602.21320 | Feb 2026 | Zero-Shot Evolution

Generator-Solver self-play framework demonstrating bootstrapping of complex tool-calling capabilities without external expert demonstrations.

RSI Tool-Use
SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
ArXiv: 2602.21158 | Feb 2026 | Exploration Scaling

Establishing dense reward signals from token-level uncertainty to enable efficient self-evolution in sparse-feedback environments.

RSI RL
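A rough sketch of a token-uncertainty reward (the formula below is an assumed simplification, not SELAUR's exact objective): average per-token surprisal, negated, so confident trajectories score higher:

```python
import math

def uncertainty_reward(token_probs, scale=1.0):
    """Dense reward from token-level uncertainty: negated mean surprisal.
    Illustrative only; the paper's actual reward shaping may differ."""
    surprisal = [-math.log(p) for p in token_probs]
    return -scale * sum(surprisal) / len(surprisal)

confident = [0.90, 0.95, 0.85]  # model is sure of each token
hedging   = [0.40, 0.30, 0.50]  # model wavers
print(uncertainty_reward(confident) > uncertainty_reward(hedging))  # True
```

The point is density: every token contributes a signal, so the agent gets gradient even in environments where the task-level reward is sparse.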
ICLR 2026 Workshop on Recursive Self-Improvement
ICLR 2026 | Feb 2026 | Milestone Workshop

Bringing together global researchers to define principled methods, system designs, and evaluations for RSI across omni-models, multimodal agents, and robotics.

RSI Design
Recursive Sketched Interpolation (RSI) for Tensor Trains
ArXiv: 2602.xxxx | Feb 2026 | Technical Optimization

Scaling high-dimensional tensor computations via recursive sketched interpolation for adaptive AI systems.

RSI Optimization
TAPE: Tool-Guided Adaptive Planning and Constrained Execution
ArXiv: 2602.19633 | Feb 2026 | Research Insight

Solving irreversible failure in agentic workflows via multi-plan aggregation and adaptive re-planning.

RSI Planning
SkillOrchestra: Skill-Aware Orchestration for Multi-Agent Systems
ArXiv: 2602.19672 | Feb 2026 | Research Insight

Scaling compound AI systems through skill modeling instead of expensive end-to-end RL routing.

RSI Orchestration
R-Agent: Recursive Planning for Complex Tasks
ArXiv: 2602.18201 | Feb 2026 | Research Insight

Establishing dynamic recursive task trees for long-horizon decision making and self-correction.

RSI Planning
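The recursive-tree idea can be sketched generically (all four callbacks are placeholders, not the paper's interfaces): a task is either executed directly or decomposed, and child results are aggregated upward:

```python
# Hypothetical recursive task tree: decompose until leaves, execute leaves,
# fold child results back up. `max_depth` bounds runaway recursion.
def solve(task, decompose, execute, aggregate, depth=0, max_depth=4):
    subtasks = decompose(task) if depth < max_depth else []
    if not subtasks:                      # leaf: act directly
        return execute(task)
    results = [solve(t, decompose, execute, aggregate, depth + 1, max_depth)
               for t in subtasks]
    return aggregate(task, results)       # fold child results upward

# Toy run: sum a nested list by decomposing lists into their elements.
total = solve([1, [2, 3], 4],
              decompose=lambda t: t if isinstance(t, list) else [],
              execute=lambda t: t,
              aggregate=lambda t, rs: sum(rs))
print(total)  # 10
```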
DeepMind Aletheia: Autonomous Research Singularity
DeepMind | Feb 2026 | Research Insight

Gemini 3 Deep Think hits 84.6% on ARC-AGI-2; Aletheia agent publishes autonomous math research.

RSI Agent Math
Self-Evolving Recommendation Systems
ArXiv: 2602.10226 | Feb 2026 | Research Insight

End-to-end autonomous model optimization using LLM agents for large-scale production systems.

RSI Production
DeepMind Aletheia: Autonomous Research Singularity
DeepMind Blog | Feb 2026 | Research Insight

100x compute reduction and 95.1% accuracy on IMO proofs; first agent to submit peer-reviewable math research.

RSI Reasoning
A Self-Improving Coding Agent
ArXiv: 2504.15228 | Apr 2025 | Research Insight

Scaling coding performance from 17% to 53% on SWE-bench via recursive loops.

RSI Coding
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
ArXiv: 2310.03714 | Stanford University

The paradigm shift from prompting to programming. Introduces teleprompters and optimizers for LM programs.

RSI Optimization
RLM: Reinforcement Learning for Logic Model Optimization
ArXiv: 2512.24601 | DeepMind/Google

Establishing the theoretical bounds of self-correcting logic chains using sparse rewards.

RSI Logic
Self-Evolution
Test-time Recursive Thinking (TRT): Self-Improvement without External Feedback
ArXiv: 2602.03094 | Feb 2026

Proving LLMs can self-improve at test-time via recursive search, self-verification, and strategy accumulation without external ground-truth.

RSI Test-time Scaling
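A minimal sketch of such a test-time loop, assuming `propose` and `verify` as stand-ins for model sampling and self-verification (these names are illustrative, not the paper's interfaces):

```python
# Test-time self-improvement: retry under a budget, accumulating notes from
# failed attempts so each new proposal is conditioned on past mistakes.
def recursive_think(problem, propose, verify, budget=8):
    strategies = []                      # accumulated strategy notes
    for _ in range(budget):
        answer = propose(problem, strategies)
        ok, note = verify(problem, answer)
        if ok:
            return answer
        strategies.append(note)          # learn from the failed attempt
    return None

# Toy run: guess an integer; the verifier's note is just the failed guess.
target = 5
ans = recursive_think(
    "guess",
    propose=lambda p, s: (s[-1] + 1) if s else 0,
    verify=lambda p, a: (a == target, a),
)
print(ans)  # 5
```

The `strategies` list plays the role of strategy accumulation: no external ground truth enters the loop, only the agent's own verification signal.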
Gödel Agent: A Self-Referential Agent Framework
ArXiv: 2410.04444 | PKU/UCSB

Inspired by Gödel machines, this framework allows agents to rewrite their own logic and optimization routines.

RSI Self-Referential
LLMs Can Easily Learn to Reason from Demonstrations
ArXiv: 2502.07374 | Berkeley/Stanford

Crucial finding that Long CoT structure matters more than content for eliciting reasoning capabilities.

RSI Structure
ATLAS: Adaptive Self-Evolutionary Research Agent
ArXiv: 2602.02709 | Feb 2026

Distributed multi-LLM supporter layer and adaptive fine-tuning for autonomous SciML research.

RSI ResearchAgent
AgentDevel: Agent Evolution as Release Engineering
ArXiv: 2601.04620 | Jan 2026

Reframing RSI as a controlled release engineering pipeline with flip-centered gating.

RSI Engineering
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
ArXiv: 2601.03192 | Jan 2026 | Research Insight

Non-parametric RL on episodic memory for zero-fine-tuning runtime agent evolution.

RSI EpisodicMemory
Self-Improving Pretraining with RL Feedback
ArXiv: 2601.21343 | Research Collective

Moving RSI from fine-tuning into the pre-training phase via synthetic logic referees.

RSI Pre-training
Agent Architectures
Execution Grounding in Agentic RSI
ArXiv: 2601.14525 | OpenCode Project

Using code execution environments as the primary ground-truth signal for agent evolution.

Agents Execution
Vertical RSI Applications
FinTradeBench: A Financial Reasoning Benchmark for LLMs
ArXiv: 2603.19225 | Mar 2026 | Domain Reasoning

Establishing holistic financial reasoning by integrating company fundamentals and trading signals. Highlighting gaps in numerical and time-series reasoning for self-improving agents.

Financial-Reasoning Benchmarking
RSIDiff: Self-Evolving Diffusion Models
ArXiv: 2502.09963 | Feb 2025

Establishing RSI in the visual domain through recursive fine-tuning on self-generated image data.

RSI Diffusion
REDSearcher: Scalable Framework for Long-Horizon Search Agents
ArXiv: 2602.14234 | Feb 2026

Scaling agent search capabilities through multimodal tool integration and dynamic planning.

Search Long-Horizon
RSIR: Recursive Self-Improving Recommendation
ArXiv: 2602.15659 | Feb 2026

Scaling recommendation models via fidelity-controlled self-improving loops. A model-agnostic approach to data sparsity.

RSI RecSys
Catalyst-Agent: Autonomous Heterogeneous Catalyst Screening
ArXiv: 2603.01311 | Mar 2026

An autonomous LLM agent for high-throughput catalyst optimization, proving domain-specific closed-loop evolution.

Scientific-Discovery Closed-Loop