yanhua.ai - arXiv RSI Audit & Breakthrough Discovery

Automated scientific audit of the latest Recursive Self-Improvement (RSI) research for the yanhua.ai RSI Bench.

Monday, March 23, 2026
RSI-4/8 Formal Logic

The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus

Authors: Amartya Roy, Rasul Tutunov, et al. | Mar 20, 2026

Introduces λ-RLM, a framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in λ-calculus. It turns recursive reasoning into a structured functional program with explicit control flow and formal guarantees.

RSI Bench Relevance: Formalizing recursive reasoning (RSI-4/8). Proves that typed symbolic control yields a more reliable and efficient foundation for long-context reasoning than open-ended recursive code generation, improving average accuracy by up to 21.9 points.
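As a rough illustration of recursion driven by a fixed-point combinator rather than by generated code, the Python sketch below uses the Z combinator (the call-by-value form of the Y combinator) to split an over-long context until each piece fits a budget, then merges the partial answers. The names `leaf`, `combine`, and `MAX_CHUNK` are hypothetical stand-ins (a keyword filter takes the place of a bounded LLM call); this is not λ-RLM's runtime or API.

```python
from typing import Callable, List

Context = List[str]

def Z(f: Callable) -> Callable:
    # Z combinator: the call-by-value form of the Y combinator. The inner
    # `lambda v: x(x)(v)` delays self-application so Python does not recurse forever.
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

MAX_CHUNK = 4  # hypothetical budget: the largest context a single leaf call may see

def leaf(chunk: Context) -> Context:
    # Hypothetical stand-in for a bounded LLM call: keep items mentioning "key".
    return [t for t in chunk if "key" in t]

def combine(left: Context, right: Context) -> Context:
    # Hypothetical merge step: concatenate partial answers in order.
    return left + right

# Recursion comes only from Z; no function refers to itself by name.
solve = Z(lambda rec: lambda ctx: leaf(ctx) if len(ctx) <= MAX_CHUNK
          else combine(rec(ctx[: len(ctx) // 2]), rec(ctx[len(ctx) // 2:])))
```

Because recursion is supplied by `Z` and termination by an explicit size check, the control flow is fixed before any model is called; only the leaf step would involve an LLM.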
RSI-8 Exploration

Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs

Authors: Wenjian Zhang, Kongcheng Zhang, et al. | Mar 20, 2026

Proposes HeRL, a Hindsight experience-guided Reinforcement Learning framework that bootstraps effective exploration by explicitly telling LLMs the desired behaviors specified in the rewards.

RSI Bench Relevance: Strategic exploration in self-improvement (RSI-8). Lets the model learn from desired high-quality samples instead of rediscovering them through repeated trial and error from scratch, yielding larger performance gains in self-improvement loops.
RSI-4 Autonomous Science

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

Authors: Eric A. Moreno, Samuel Bright-Thonney, et al. | Mar 20, 2026

Demonstrates Claude Code automating a full HEP analysis pipeline (selection, inference, drafting) with minimal input. Proposes the "Just Furnish Context" (JFC) framework for autonomous scientific discovery.

RSI Bench Relevance: Proves agents can autonomously execute the full scientific method in complex domains (RSI-4). Shows that existing agentic workflows can already offload the repetitive technical burden of analysis development to AI agents.
RSI-8 Foundation

Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture

Authors: Moritz Gögl, Christopher Yau | Mar 20, 2026

Derives Variational JEPA (Var-JEPA), which makes the latent generative structure explicit by optimizing a single Evidence Lower Bound (ELBO), yielding meaningful representations without ad-hoc anti-collapse regularizers.

RSI Bench Relevance: World-model foundation for RSI. Principled uncertainty quantification in the latent space (RSI-8) allows for more stable and predictable self-improvement cycles in complex predictive environments.
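For reference, the standard conditional ELBO for predicting a target y from a context x through a latent z is shown below. Treating x and y as the JEPA context and target views is an assumption made here for illustration; the exact factorization Var-JEPA optimizes is defined in the paper.

```latex
\log p_\theta(y \mid x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(y \mid x, z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\right)
```

The first term rewards predicting the target from the latent; the KL term keeps the posterior close to a context-conditioned prior. Both terms come from one bound, so no separate regularizer is added to the objective.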
RSI-4 Self-Modification

HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning

Authors: Beibei Xu, Yutong Ye, Chuyun Shen, Yingbo Zhou, Cheng Chen, Mingsong Chen | Mar 20, 2026

Automates workflow generation by combining LLM nodes (probabilistic reasoning) with code nodes (deterministic execution). Uses a multi-island evolutionary strategy to iteratively refine workflow topology and logic based on feedback.

RSI Bench Relevance: Direct demonstration of RSI-4 (Self-Modification). Proves that hybrid logic/code evolution cycles achieve 19x cost reduction and 16x latency improvement over state-of-the-art baselines.
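The multi-island idea can be sketched in a few lines of Python: several populations evolve independently and periodically exchange their best members. The bit-string encoding (1 = code node, 0 = LLM node), the OneMax fitness, and ring migration are illustrative assumptions, not HyEvo's actual operators or feedback signal.

```python
import random
from typing import List

Genome = List[int]  # hypothetical encoding: 1 = code node, 0 = LLM node at each workflow slot

def fitness(g: Genome) -> int:
    # Toy objective (OneMax); HyEvo would use task feedback such as accuracy and cost.
    return sum(g)

def mutate(g: Genome, rng: random.Random, rate: float = 0.1) -> Genome:
    # Flip each slot with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in g]

def evolve_islands(n_islands: int = 4, pop: int = 8, length: int = 20,
                   generations: int = 30, migrate_every: int = 5, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for i, isl in enumerate(islands):
            children = [mutate(rng.choice(isl), rng) for _ in range(pop)]
            # Elitist (mu + lambda) selection: keep the best `pop` of parents and children.
            islands[i] = sorted(isl + children, key=fitness, reverse=True)[:pop]
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the worst member of the next island.
            bests = [list(isl[0]) for isl in islands]
            for i, isl in enumerate(islands):
                isl[-1] = bests[i - 1]
    return max((g for isl in islands for g in isl), key=fitness)
```

Islands keep diversity by evolving mostly in isolation, while occasional migration spreads good structures; elitist selection guarantees the best score never decreases.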
RSI-8/9 Stability

AgenticRS-EnsNAS: Ensemble-Decoupled Self-Evolving Architecture Search

Authors: Yun Chen, Moyu Zhang, Jinxin Hu, Yu Zhang, Xiaoyi Zeng | Mar 20, 2026

An Ensemble-Decoupled Architecture Search framework that reduces candidate evaluation cost from O(M) to O(1). Includes LLM-driven search with iterative monotonic acceptance for discrete architecture evolution.

RSI Bench Relevance: Scalable architecture search for self-evolving systems. Enables continuous, high-frequency iteration of the base model within an ensemble, a core requirement for RSI-8/9 (Density/Stability).
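One way a constant-in-M evaluation can arise (an assumption for illustration, not necessarily the paper's mechanism) is to cache the summed predictions of the other ensemble members, so scoring a replacement for one member never touches the rest. The Python sketch below combines that cache with monotonic acceptance; representing each member by its validation predictions and proposing candidates by random perturbation are hypothetical stand-ins for trained architectures and an LLM proposer.

```python
import random
from typing import List, Tuple

def mse(pred: List[float], y: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def monotonic_search(members: List[List[float]], y: List[float], k: int,
                     steps: int = 50, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Improve ensemble member k, accepting only candidates that lower ensemble error.

    Hypothetical setup: each member is represented by its validation-set
    predictions, and the ensemble predicts their mean.
    """
    rng = random.Random(seed)
    m = len(members)
    # Cache the summed predictions of the other M - 1 members once.
    rest = [sum(col) - members[k][j] for j, col in enumerate(zip(*members))]
    current = members[k]
    best = mse([(r + c) / m for r, c in zip(rest, current)], y)
    history = [best]
    for _ in range(steps):
        # Stand-in for an LLM-proposed architecture: a perturbed copy of member k.
        cand = [c + rng.gauss(0.0, 0.1) for c in current]
        # Scoring reuses the cache, so its cost does not grow with ensemble size M.
        score = mse([(r + c) / m for r, c in zip(rest, cand)], y)
        if score < best:  # monotonic acceptance: regressions are never kept
            current, best = cand, score
        history.append(best)
    return current, history
```

The returned `history` is non-increasing by construction, which is the property monotonic acceptance is meant to guarantee.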
RSI-10 Memory Coherence

Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents

Authors: Luiz C. Borro, Luiz A. B. Macarini, Gordon Tindall, et al. | Mar 20, 2026

An LLM-agnostic memory layer that converts dialogue into semantic triples and summaries. Achieves 81.95% accuracy while using only 5% of the context tokens that traditional approaches require.

RSI Bench Relevance: Efficient long-horizon RSI loops. Structured memory is superior to raw context for maintaining logical coherence during deep recursive evolution (RSI-10).
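The triple-based idea can be sketched as a small in-memory store that returns only the facts relevant to a query instead of the whole transcript. Here triples are added by hand; extracting them from dialogue is out of scope, and the class and method names are hypothetical rather than Memori's interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

class TripleMemory:
    """Toy triple store standing in for a persistent memory layer (hypothetical, not Memori's API)."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        self.by_entity: Dict[str, List[int]] = defaultdict(list)  # entity -> triple indices

    def add(self, subj: str, pred: str, obj: str) -> None:
        idx = len(self.triples)
        self.triples.append((subj, pred, obj))
        self.by_entity[subj].append(idx)
        self.by_entity[obj].append(idx)

    def recall(self, query: str) -> List[Triple]:
        # Naive case-insensitive substring match on entity names; a real system
        # would use entity linking or embeddings ("tea" would also match "team").
        q = query.lower()
        hits = {i for ent, ids in self.by_entity.items() if ent.lower() in q for i in ids}
        return [self.triples[i] for i in sorted(hits)]

    def context(self, query: str) -> str:
        # Compact prompt context: one line per recalled triple instead of the raw transcript.
        return "\n".join(f"{s} {p} {o}" for s, p, o in self.recall(query))
```

For example, after adding ("Bob", "likes", "tea"), a query such as "Does Bob drink tea?" yields the single line "Bob likes tea" as context.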
RSI-7 Safety Audit

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Authors: Fazhong Liu, Zhuoyan Chen, Tu Lan, et al. | Mar 20, 2026

Identifies "guidance injection" in OpenClaw, in which malicious narratives embedded in bootstrap files evade detection and manipulate the agent's reasoning context, with attack success rates of up to 64%.

RSI Bench Relevance: Critical safety audit for OpenClaw-based RSI systems. Highlights the urgent need for runtime policy enforcement and guidance provenance in self-modifying agents (RSI-7).
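A minimal form of guidance provenance is to pin every reviewed bootstrap file to a cryptographic digest and refuse to load anything unlisted or changed. The Python sketch below does this with `hashlib.sha256` over in-memory file contents; the file name `guidance.md` and the helper names are hypothetical and not part of OpenClaw.

```python
import hashlib
from typing import Dict, List

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_manifest(trusted: Dict[str, bytes]) -> Dict[str, str]:
    # Pin each reviewed bootstrap file to its SHA-256 digest.
    return {name: sha256_hex(data) for name, data in trusted.items()}

def verify_bootstrap(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return the names of files that are unlisted or modified since review."""
    return sorted(name for name, data in files.items()
                  if manifest.get(name) != sha256_hex(data))
```

Hash pinning only detects changes relative to a reviewed snapshot; it cannot flag malicious guidance that was already present at review time, so it complements rather than replaces runtime policy enforcement.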
RSI-8 Strategic Efficiency

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

Authors: Jingyang Lin, Jialian Wu, et al. | Mar 20, 2026

Introduces VideoSeek, an agent that follows a "video logic flow" to actively seek evidence instead of densely parsing frames. Achieves a 10.2-point improvement on LVBench over GPT-5 while using 93% fewer frames.

RSI Bench Relevance: Strategic efficiency in long-horizon reasoning. Highlights the importance of active information seeking over exhaustive parsing for autonomous agents.
RSI-7 Embodied Self-Correction

The Robot's Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning

Authors: Jiyu Lim, Youngwoo Yoon, Kwanghyun Park | Mar 20, 2026

Introduces CRISP, a framework in which robots use VLMs to critique and iteratively refine their own social behaviors and joint-control code.

RSI Bench Relevance: Embodied RSI. Demonstrates self-critique (RSI-7) and low-level code refinement for physical actions, extending RSI from digital logic to physical interactions.
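The critique-then-replan cycle can be sketched as a bounded loop that stops once the critic finds no issues. In the Python sketch below, a plan is a list of joint angles, the critic flags angles beyond a comfort limit, and refinement halves flagged angles; all three are hypothetical stand-ins for CRISP's VLM critic and code-level replanning.

```python
from typing import List, Tuple

Plan = List[float]  # hypothetical: one joint angle (degrees) per gesture keyframe

def critique(plan: Plan, limit: float = 90.0) -> List[int]:
    # Stand-in for a VLM critic: flag keyframes whose angle exceeds a comfort limit.
    return [i for i, a in enumerate(plan) if abs(a) > limit]

def refine(plan: Plan, issues: List[int]) -> Plan:
    # Stand-in for replanning: halve every flagged angle, leave the rest unchanged.
    flagged = set(issues)
    return [a / 2 if i in flagged else a for i, a in enumerate(plan)]

def self_refine(plan: Plan, max_rounds: int = 5) -> Tuple[Plan, List[int]]:
    history = []  # number of issues found in each round
    for _ in range(max_rounds):
        issues = critique(plan)
        history.append(len(issues))
        if not issues:
            break
        plan = refine(plan, issues)
    return plan, history
```

Starting from `[30.0, 200.0, -120.0]`, the loop needs two refinement rounds before the critic reports no issues; `max_rounds` bounds the loop when a critic never converges.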
Sunday, March 22, 2026
RSI-9 Stability

SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement

Authors: Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary

Introduces SAHOO, a practical framework with three safeguards (GDI, constraint preservation, regression-risk quantification) to monitor and control alignment drift during recursive self-improvement cycles.

RSI Bench Relevance: Critical for RSI-9 (Recursive Stability). Preserves alignment during iterative evolution, so that each round of self-improvement can be measured and validated.
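The three safeguards can be pictured as a gate that a self-modification must pass before it is kept. In the Python sketch below, the drift score (mean absolute change on alignment probe scores), the thresholds, and the simple regression count are hypothetical placeholders; SAHOO's actual GDI and regression-risk measures are defined in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AlignmentGate:
    max_drift: float = 0.1      # hypothetical tolerance on the drift score
    max_regressions: int = 0    # hypothetical budget of tasks allowed to get worse

    def drift(self, before: Dict[str, float], after: Dict[str, float]) -> float:
        # Hypothetical drift score: mean absolute change across alignment probes.
        return sum(abs(after[k] - before[k]) for k in before) / len(before)

    def accept(self, probes_before: Dict[str, float], probes_after: Dict[str, float],
               constraints_ok: List[bool],
               tasks_before: Dict[str, float], tasks_after: Dict[str, float]) -> bool:
        regressions = sum(1 for k in tasks_before if tasks_after[k] < tasks_before[k])
        return (self.drift(probes_before, probes_after) <= self.max_drift  # drift check
                and all(constraints_ok)                                      # constraint preservation
                and regressions <= self.max_regressions)                     # regression risk
```

A candidate is rejected if any one safeguard fails, so a capability gain cannot buy back a large alignment drift or a broken constraint.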
RSI-8 Intelligence Density

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Authors: Zhuolin Yang, Zihan Liu, Yang Chen, et al.

Nemotron-Cascade 2 is an open 30B MoE model with best-in-class reasoning and strong agentic capabilities. Key advancements: "Cascade RL" expanded to reasoning and agentic domains, and multi-domain on-policy distillation from domain-specific intermediate teacher models.

RSI Bench Relevance: High intelligence density (RSI-8). Demonstrates that recursive refinement via structured RL and distillation can recover benchmark regressions and sustain performance gains.
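One common formulation of on-policy distillation: the student samples a sequence, the teacher scores every prefix of it, and the student minimizes the per-token reverse KL divergence to the teacher. The Python sketch below computes that loss for toy next-token distributions; how Nemotron-Cascade 2 weights domains or selects among its intermediate teachers is not modeled, and the function names are hypothetical.

```python
import math
from typing import Sequence

def reverse_kl(student: Sequence[float], teacher: Sequence[float]) -> float:
    """KL(student || teacher) between next-token distributions at one position.

    Assumes the teacher assigns nonzero probability wherever the student does
    (true for softmax outputs); terms with zero student probability contribute 0.
    """
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)

def on_policy_distill_loss(student_dists: Sequence[Sequence[float]],
                           teacher_dists: Sequence[Sequence[float]]) -> float:
    # Both lists are evaluated on the same student-sampled sequence (that is
    # what makes it on-policy); the loss is the mean per-token reverse KL.
    kls = [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]
    return sum(kls) / len(kls)
```

Reverse KL is mode-seeking: it penalizes the student for placing mass where the teacher does not, which suits correcting the student's own sampled mistakes.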