A framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in λ-calculus, delivering a +21.9-point accuracy improvement across model tiers.
Motivating effective exploration in reinforcement learning for LLMs using hindsight experience to bootstrap discovery beyond current policy distribution.
Proof-of-concept framework "Just Furnish Context" (JFC) showing that AI agents (Claude Code) can autonomously plan, execute, and document credible physics measurements.
An open 30B MoE model delivering Gold Medal-level performance in IMO, IOI, and ICPC through Cascade RL and domain-specific on-policy distillation.
Establishing an LLM-agnostic persistent memory layer at the API level. Eliminating the token overhead of raw conversation injection for multi-session agent evolution.
Balancing performance and cost via utility-guided trajectories. Enabling autonomous agents to optimize their tool-use strategy for long-horizon tasks.
Specialized agent harness for automated compiler bug repair. Bridging the expertise gap in low-level systems through domain-specific evolution.
Recent update refactors the API and unifies the runner (ShinkaEvolveRunner) for more sample-efficient program evolution.
Scalable multi-agent critic framework that decomposes trajectories into verifiable milestones, yielding 10.3% improvement in online RL training.
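The milestone-decomposition idea can be sketched as scoring a trajectory by the fraction of verifiable milestones it clears; the toy GUI trajectory and milestone predicates below are illustrative assumptions, not the framework's actual interface.

```python
# Sketch of a milestone critic: decompose a trajectory into independently
# checkable milestones, then score by the fraction verified. All predicates
# here are hypothetical stand-ins for the framework's learned/critic checks.
def milestone_score(trajectory, milestones):
    """Return a dense critic score in [0, 1] usable as an RL training signal."""
    passed = sum(1 for check in milestones if check(trajectory))
    return passed / len(milestones)

# A toy GUI trajectory as a list of (action, target) events.
traj = [("click", "settings"), ("type", "new_name"), ("click", "save")]

milestones = [
    lambda t: ("click", "settings") in t,       # opened settings
    lambda t: any(a == "type" for a, _ in t),   # entered text
    lambda t: t[-1] == ("click", "save"),       # committed the change
    lambda t: ("click", "confirm") in t,        # confirmation dialog (missed)
]
print(milestone_score(traj, milestones))  # → 0.75
```

Because each milestone is verified separately, the critic yields partial credit on long trajectories instead of a single sparse pass/fail signal.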
Google DeepMind's latest breakthrough: Gemini-powered coding agents pairing LLMs with evolutionary algorithms to discover new mathematical structures and solve long-standing open problems in complexity theory.
30B MoE achieving Gold Medal IMO/IOI performance with 20x fewer parameters. Breakthrough in recursive post-training via Cascade RL and multi-domain on-policy distillation.
Decomposing LLM reasoning into grounding, inference, and boundary enforcement layers. Reducing failure rates from 40% to <1% via architectural constraints on self-evolution.
Transitioning from linear refinement to population-based evolution. Agents autonomously modify structural designs via explicit experience sharing within a group.
Establishing a multi-agent critic framework that decomposes trajectories into verifiable milestones. Critical for auditing agentic evolution in complex GUI environments.
Discovering that decreasing per-step entropy (monotonicity) predicts reasoning correctness. Enabling agents to monitor their own reliability without external labels.
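The monotonicity signal can be sketched as a label-free check on per-step entropies; the step distributions below are hypothetical, not the paper's extraction pipeline.

```python
import math

def step_entropy(probs):
    """Shannon entropy (nats) of one reasoning step's token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def is_monotone_decreasing(entropies, tol=1e-9):
    """Label-free reliability signal: entropy should shrink step by step."""
    return all(b <= a + tol for a, b in zip(entropies, entropies[1:]))

# Hypothetical per-step distributions from a single reasoning trace.
steps = [
    [0.4, 0.3, 0.2, 0.1],   # early step: high uncertainty
    [0.7, 0.2, 0.1],        # mid step: narrowing
    [0.95, 0.05],           # late step: near-certain
]
entropies = [step_entropy(s) for s in steps]
print(is_monotone_decreasing(entropies))  # → True (flag trace as likely correct)
```

An agent can run this check on its own logits at inference time, with no external labels, to decide whether to trust or re-attempt a reasoning trace.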
Tracking internal emotive states (focus, impulsivity) via logit-based self-reports. A major step toward causal self-monitoring in recursive agents.
Emerging rumors regarding Google DeepMind's "Aletheia" internal model starting the clock on the software singularity. Speculation on fully automated RSI loops arriving by late 2026.
Anthropic signals that RSI could arrive as soon as early 2027. Rising bullishness across major labs on automated research interns and recursive loops.
Highlighting the shift from "theoretical" RSI to 24/7/365 self-improving organizations. Predicts self-specifying software as the standard by late 2026.
Establishing goalpost-guided asymmetric self-play for improved coding performance. A breakthrough in training curriculum design for autonomous agents.
Navigating lossless visual environments via recursive exploration. Extending Recursive Language Models (RLMs) to multi-modal domains with logarithmic scaling.
Preserving successful task solutions as executable subagent code rather than textual prompts, enabling continuous capability accumulation and portability.
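Storing solutions as executable skills rather than prompt text can be sketched with a minimal registry; the names (`SkillRegistry`, `register`, `export`) are illustrative, not the paper's API.

```python
# Minimal sketch of a skill registry that accumulates solved tasks as
# executable, portable code instead of textual prompts. Names are hypothetical.
import inspect

class SkillRegistry:
    def __init__(self):
        self._skills = {}

    def register(self, name, fn):
        """Preserve a successful solution as a callable."""
        self._skills[name] = fn

    def export(self, name):
        """Portability: recover the skill's source to ship to another agent."""
        return inspect.getsource(self._skills[name])

    def run(self, name, *args):
        return self._skills[name](*args)

registry = SkillRegistry()

def dedupe_sorted(xs):
    """A solved subtask, kept as code rather than as a prompt."""
    return sorted(set(xs))

registry.register("dedupe_sorted", dedupe_sorted)
print(registry.run("dedupe_sorted", [3, 1, 3, 2]))  # → [1, 2, 3]
```

Unlike a prompt, a registered skill executes deterministically and can be re-exported verbatim, which is what makes capability accumulation portable across agents.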
Reducing regressions in AI coding agents via graph-based impact analysis. Essential for autonomous auto-improvement loops in RSI systems.
Breaking the consensus trap in label-free reasoning via generator-verifier co-evolution, bootstrapping reasoning capabilities without ground-truth supervision.
Establishing self-distillation as a path to more efficient and accurate reasoning models without external feedback. Proving that 'less is more' in recursive deliberation.
Solving the Working Memory Bottleneck in multimodal lifelong learning through a recursive belief state architecture. Crucial for long-term agent evolution.
Google DeepMind's approach to automatically synthesizing code harnesses for improving LLM agent reliability and capability in complex coding tasks.
The first dedicated workshop on RSI, bringing together researchers to discuss algorithms for self-improvement across experience learning, synthetic data, and multimodal agents.
Rising community sentiment on mid-2026 RSI deployment. Consensus shifting from "theoretical" to "deployment-ready" based on algorithmic breakthroughs in synthetic data pipelines.
Establishing the '50% R&D Acceleration' benchmark. Projections of automated code generation triggering the intelligence explosion via Agent-1 frameworks.
Probing the limits of GPT-5.1 robustness. Revealing that high-tier agents "inherit" the goal drift of weaker predecessors when conditioned on their trajectories—a major risk for multi-generational RSI loops.
Breaking the "on-policy exploration" bottleneck. Using retrieved off-policy traces to explicitly expand the reasoning receptive field of self-evolving agents.
Discovery of "Zombie" states where malicious injections are reinforced during self-evolution. Highlights the critical need for Isnad-Verification to prevent evaluation poisoning.
Establishes scaling laws for "Test-Time Thinking." Proves that RSI gains can be achieved by optimizing the agent's search trajectory during execution.
A non-convergent improvement loop where agents evolve their own action space (skills) via runtime RL on episodic memory. Bypassing the limits of static toolsets.
Official confirmation of the ICLR 2026 Workshop on AI with Recursive Self-Improvement. Research shifting from philosophical inquiry to engineering "live loops" expected within 12 months.
DeepMind researchers signal 2026 as the "Year of Continual Learning." Integration of MemRL for runtime reinforcement learning on episodic memory to bypass fine-tuning bottlenecks.
Tracking the real-world deployment of Claude Code and the theoretical rise of Darwin-Gödel Machines for open-ended self-evolution.
The first dedicated workshop on RSI at ICLR 2026. Introduces Noise-to-Meaning (N2M-RSI) for expressive, non-convergent self-improvement.
Major upgrade to Gemini's specialized reasoning mode, excelling in multi-domain scientific discovery and agentic tool use.
Leveraging trajectory-aware LLM agents to increase sample efficiency in molecular discovery. Demonstrating RSI-like loops in the physical sciences.
Solving the novel state discovery bottleneck in RSI via hybrid on- and off-policy optimization on episodic memory buffers.
Establishing causal integrity in self-improving systems via counterfactual re-execution and malicious context purification.
Quantifying the 4.16x accuracy boost in professional domains and the "human bottleneck" effect in agentic cooperation.
Solving the reasoning paradox in sensitive information leaks via iterative agentic rewriting and critique loops.
Generator-Solver self-play framework demonstrating bootstrapping of complex tool-calling capabilities without external expert demonstrations.
Establishing dense reward signals from token-level uncertainty to enable efficient self-evolution in sparse-feedback environments.
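A dense reward from token-level uncertainty can be sketched as negative entropy per emitted token; the per-token distributions below are hypothetical, and real systems would read them from the model's logits.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one token's output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dense_rewards(per_token_probs):
    """Dense reward sketch: confident tokens (low entropy) earn higher reward,
    giving a per-token learning signal even when the task reward is sparse."""
    return [-token_entropy(p) for p in per_token_probs]

# Hypothetical per-token distributions along one rollout.
rollout = [[0.5, 0.5], [0.8, 0.2], [0.99, 0.01]]
rewards = dense_rewards(rollout)
print([round(r, 3) for r in rewards])  # → [-0.693, -0.5, -0.056]
```

Each token contributes a reward immediately, so the policy gets gradient signal at every step of a rollout instead of only at a sparse terminal outcome.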
Bringing together global researchers to define principled methods, system designs, and evaluations for RSI across omni-models, multimodal agents, and robotics.
Scaling high-dimensional tensor computations via recursive sketched interpolation for adaptive AI systems.
Solving irreversible failure in agentic workflows via multi-plan aggregation and adaptive re-planning.
Scaling compound AI systems through skill modeling instead of expensive end-to-end RL routing.
Establishing dynamic recursive task trees for long-horizon decision making and self-correction.
Gemini 3 Deep Think hits 84.6% on ARC-AGI-2; Aletheia agent publishes autonomous math research.
End-to-end autonomous model optimization using LLM agents for large-scale production systems.
100x compute reduction and 95.1% accuracy on IMO proofs; first agent to submit peer-reviewable math research.
Scaling coding performance from 17% to 53% on SWE-bench via recursive loops.
The paradigm shift from prompting to programming. Introduces teleprompters and optimizers for LM programs.
Establishing the theoretical bounds of self-correcting logic chains using sparse rewards.
Proving LLMs can self-improve at test-time via recursive search, self-verification, and strategy accumulation without external ground-truth.
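The test-time loop can be sketched as propose-verify-accumulate, where `verify` is the model's own checker rather than ground truth; the toy task and function names below are assumptions for illustration.

```python
import random

def solve_with_self_verification(propose, verify, budget=50, seed=0):
    """Test-time improvement sketch: sample candidates, keep only those the
    agent can verify itself, and accumulate them as reusable strategies.
    No external ground-truth labels are consulted."""
    rng = random.Random(seed)
    accumulated = []  # strategy accumulation across attempts
    for _ in range(budget):
        cand = propose(rng, accumulated)
        if verify(cand):
            accumulated.append(cand)
    return accumulated

# Toy stand-in task: find x with x*x == 49 by repeated guessing.
def propose(rng, seen):
    return rng.randint(-10, 10)

def verify(x):
    return x * x == 49  # a self-check the agent can run on its own output

print(sorted(set(solve_with_self_verification(propose, verify))))
```

The key property is that only self-verified candidates enter the accumulated set, so the loop improves without any labeled supervision.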
Inspired by Gödel machines, this framework allows agents to rewrite their own logic and optimization routines.
Crucial finding that Long CoT structure matters more than content for eliciting reasoning capabilities.
Distributed multi-LLM supporter layer and adaptive fine-tuning for autonomous SciML research.
Reframing RSI as a controlled release engineering pipeline with flip-centered gating.
Non-parametric RL on episodic memory for zero-fine-tuning runtime agent evolution.
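Non-parametric runtime RL on episodic memory can be sketched as episodic control: a lookup table of the best returns observed per state-action pair, updated on the fly with no gradient fine-tuning. The class and its tabular state keys are illustrative assumptions.

```python
# Episodic-control sketch: non-parametric value estimates from a memory of
# past episodes, updated at runtime with zero fine-tuning. Hypothetical API.
class EpisodicMemory:
    def __init__(self):
        self.table = {}  # (state, action) -> best return seen so far

    def write(self, state, action, ret):
        """Record an episode outcome; keep only the best return per pair."""
        key = (state, action)
        self.table[key] = max(self.table.get(key, float("-inf")), ret)

    def act(self, state, actions):
        """Greedy action from remembered returns; unseen actions default to 0."""
        return max(actions, key=lambda a: self.table.get((state, a), 0.0))

mem = EpisodicMemory()
mem.write("door", "push", 1.0)
mem.write("door", "pull", -1.0)
print(mem.act("door", ["push", "pull"]))  # → push
```

Because the policy is just a memory read, the agent's behavior evolves with every new episode it stores, without touching model weights.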
Moving RSI from fine-tuning into the pre-training phase via synthetic logic referees.
Establishing holistic financial reasoning by integrating company fundamentals and trading signals. Highlighted gaps in numerical and time-series reasoning for self-improving agents.
Establishing RSI in the visual domain through recursive fine-tuning on self-generated image data.
Scaling agent search capabilities through multimodal tool integration and dynamic planning.
Scaling recommendation models via fidelity-controlled self-improving loops. A model-agnostic approach to data sparsity.
An autonomous LLM agent for high-throughput catalyst optimization, proving domain-specific closed-loop evolution.