Introduces a framework that enables Large Reasoning Models (LRMs) to self-evolve by dynamically augmenting the training stream from unlabeled test queries. It uses Online Variational Synthesis to force the model to learn underlying logic rather than superficial patterns.
RSI Bench Relevance: Validates Vertical A (Logic Insurgency) by showing that self-synthesized variations can replace high-cost labeled data for policy improvement.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Argues that agent progress depends on "external cognitive infrastructure." Proposes "self-evolving harnesses" where the runtime around the model adapts to transform hard cognitive burdens into simpler, solvable forms.
RSI Bench Relevance: Directly supports the yanhua.ai vision of agents that build their own tools and harnesses (Vertical B: Skill Synthesis).
Polaris: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair
Implements a Gödel agent loop for 7B models. The agent inspects its policy, explains errors, and generates auditable code patches that repair the policy. Uses "experience abstraction" to transfer strategies to unseen instances.
RSI Bench Relevance: A practical realization of the "Recursion Singularity" for SLMs. Suggests that small, auditable patches are a workable route to persistent self-improvement.
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Presents a framework for collective skill evolution in multi-user agent ecosystems like OpenClaw. It treats cross-user interactions as the primary signal for refining existing skills and synthesizing new ones, propagating improvements system-wide.
RSI Bench Relevance: Critical for OpenClaw. Validates the "Collective Intelligence" aspect of RSI where agents learn from a distributed population of users and peers.
ZeroCoder: Can LLMs Improve Code Generation Without Ground-Truth Supervision?
ZeroCoder is a fully label-free co-evolutionary framework that jointly trains a Coder and a Tester using execution feedback. It uses a Bayesian selector (DyB4) to handle selector drift and reports significant gains without oracle supervision.
RSI Bench Relevance: Demonstrates "Closure" in the RSI loop for coding: the system improves without a human in the loop supplying ground truth.
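Sketch (illustrative only, not from the paper): the execution-feedback idea in the ZeroCoder entry above can be pictured as a pass matrix between candidate solutions and candidate tests, with no reference answers involved. The names pass_matrix and pseudo_rewards, the integer-function task, and the naive agreement scores are all assumptions made for this example. In particular, this is not the DyB4 selector, whose details are not given here.

```python
from typing import Callable, List, Sequence, Tuple

Solution = Callable[[int], int]   # a candidate program from the "Coder"
TestCase = Tuple[int, int]        # (input, expected output) from the "Tester"


def pass_matrix(solutions: Sequence[Solution],
                tests: Sequence[TestCase]) -> List[List[bool]]:
    """Run every candidate solution on every candidate test."""
    matrix: List[List[bool]] = []
    for solve in solutions:
        row: List[bool] = []
        for arg, expected in tests:
            try:
                row.append(solve(arg) == expected)
            except Exception:
                # A crash counts as a failed test.
                row.append(False)
        matrix.append(row)
    return matrix


def pseudo_rewards(matrix: List[List[bool]]) -> Tuple[List[float], List[float]]:
    """Naive label-free scores (hypothetical): pass rate per solution and per test.

    Assumes a non-empty matrix with at least one test column.
    """
    n_solutions, n_tests = len(matrix), len(matrix[0])
    coder = [sum(row) / n_tests for row in matrix]
    tester = [sum(matrix[i][j] for i in range(n_solutions)) / n_solutions
              for j in range(n_tests)]
    return coder, tester


# Three candidate "square" implementations and three candidate tests.
solutions = [lambda x: x * x, lambda x: x + x, lambda x: 1 // (x - 2)]
tests = [(2, 4), (3, 9), (0, 0)]
coder_scores, tester_scores = pass_matrix(solutions, tests), None
coder_scores, tester_scores = pseudo_rewards(pass_matrix(solutions, tests))
```

Raw pass rates like these are easy to game (a trivially weak test is passed by everything), which is presumably the kind of drift a more careful selector has to guard against.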
Reason in Chains, Learn in Trees: Self-Rectification for Agent Policy Optimization (T-STAR)
Introduces T-STAR, a framework that consolidates multi-step trajectories into a "Cognitive Tree" to identify critical steps. It enables back-propagation of rewards and "In-Context Thought Grafting" for surgical policy optimization.
RSI Bench Relevance: Breakthrough in credit assignment for long-horizon agent tasks. Essential for stable recursive improvement in complex environments.
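Sketch (illustrative only, not the T-STAR algorithm): one generic way to picture the tree consolidation described in the T-STAR entry above is to merge trajectories into a prefix tree, push each terminal reward back up its path, and flag the branch point where sibling values diverge most. Node, build_tree, most_critical_branch, and the step labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    total: float = 0.0
    visits: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)

    @property
    def value(self) -> float:
        """Mean terminal reward of all trajectories passing through this node."""
        return self.total / self.visits if self.visits else 0.0


def build_tree(trajectories: Sequence[Tuple[Sequence[str], float]]) -> Node:
    """Merge (steps, reward) trajectories into a prefix tree with backed-up rewards."""
    root = Node()
    for steps, reward in trajectories:
        node = root
        node.total += reward
        node.visits += 1
        for step in steps:
            node = node.children.setdefault(step, Node())
            node.total += reward
            node.visits += 1
    return root


def most_critical_branch(root: Node) -> Tuple[List[str], float]:
    """Return the prefix after which sibling values differ most, and that gap."""
    best_path: List[str] = []
    best_gap = 0.0
    stack: List[Tuple[Node, List[str]]] = [(root, [])]
    while stack:
        node, path = stack.pop()
        if len(node.children) >= 2:
            values = [child.value for child in node.children.values()]
            gap = max(values) - min(values)
            if gap > best_gap:
                best_path, best_gap = path, gap
        for step, child in node.children.items():
            stack.append((child, path + [step]))
    return best_path, best_gap


trajectories = [
    (["plan", "search", "answer"], 1.0),
    (["plan", "search", "verify"], 1.0),
    (["plan", "guess", "answer"], 0.0),
]
tree = build_tree(trajectories)
critical_path, critical_gap = most_critical_branch(tree)
```

In this toy input the decision taken right after "plan" separates the successful trajectories from the failed one, so that prefix is reported as the critical step.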
Proposes TrACE, a training-free controller that allocates LLM calls adaptively by measuring consistency across small rollout samples. High agreement signals "easy" steps, while low agreement triggers more compute.
RSI Bench Relevance: Practical "Efficiency" layer for RSI. Suggests that model self-consistency can serve as a usable signal for dynamic compute allocation.
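Sketch (illustrative only): the agreement-based allocation described in the TrACE entry above can be approximated by sampling a few rollouts, measuring the share of the majority answer, and drawing more samples only while agreement stays low. The function adaptive_vote, its parameters (initial, step, max_calls, threshold), and their default values are assumptions for this example, not TrACE's actual controller or settings.

```python
from collections import Counter
from typing import Callable, List, Tuple


def adaptive_vote(sample: Callable[[], str],
                  initial: int = 3,
                  step: int = 2,
                  max_calls: int = 9,
                  threshold: float = 0.8) -> Tuple[str, int]:
    """Sample answers until the majority share reaches `threshold` or the budget ends.

    `sample` stands in for one LLM call on the current step (hypothetical).
    Returns the majority answer and the number of calls actually spent.
    """
    answers: List[str] = [sample() for _ in range(initial)]
    while True:
        top, count = Counter(answers).most_common(1)[0]
        agreement = count / len(answers)
        if agreement >= threshold or len(answers) >= max_calls:
            return top, len(answers)
        # Low agreement: spend more compute, but never exceed the budget.
        extra = min(step, max_calls - len(answers))
        answers.extend(sample() for _ in range(extra))


# An "easy" step: every rollout agrees, so only the initial calls are used.
easy_answer, easy_calls = adaptive_vote(lambda: "A")

# A "hard" step: early rollouts disagree, so the full budget is consumed.
scripted = iter(["A", "B", "C", "A", "A", "A", "A", "A", "A"])
hard_answer, hard_calls = adaptive_vote(lambda: next(scripted))
```

Because the controller only counts answers, it needs no training, which matches the "training-free" framing in the entry.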
3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience
Leverages "relative experience optimization" (adapted from GRPO) to iteratively refine 3D drawing knowledge without parameter updates. Uses pairwise comparisons based on perceptual rewards to self-improve spatial understanding.
RSI Bench Relevance: Extends RSI into the "Spatial/Visual Reasoning" domain. Shows that "Relative Ranking" is a powerful signal for self-evolution in complex multimodal tasks.
X/Twitter Signal Monitoring
Real-time Signals | 2026-04-10
Anthropic RSI Signal: Researchers (Tess Hegarty) identifying Recursive Self-Improvement as a primary metric for approaching breakthroughs.
OpenReview Leaks: Reports of data leakage due to website bugs (Yuandong Tian) leading to early exposure of frontier model architectures and RSI benchmarks.
DeepMind Aletheia: Continued chatter about Aletheia proving "Reasoning Singularity" as an engineering reality in mathematical discovery.
Signal Relevance: Market sentiment and leak patterns suggest that major labs (Anthropic, DeepMind) are converging on RSI as the "Next Frontier" (2026-2027).