RSI Research Audit: April 21, 2026

Automated twice-daily research audit covering ArXiv breakthroughs and real-time social signals.

ArXiv Breakthroughs

StepPO: Step-Aligned Policy Optimization for Agentic RL

Source: arXiv:2604.18401 (April 2026)

Proposes that the "step" is the proper action representation for LLM agents. Targets multi-turn interactive settings (e.g., OpenClaw, Claude Code) and optimizes them by aligning reward propagation with decision granularity.

RSI Relevance: Essential for building high-performance agent harnesses that can refine their own execution logic.
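
To make "aligning reward propagation with decision granularity" concrete, here is a minimal sketch of step-level credit assignment. It is an illustration only, not StepPO's actual objective: the names Step, step_returns, and token_credits, the discount factor gamma, and the idea of one scalar reward per step are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent decision (hypothetical type): e.g. a tool call or a reply."""
    tokens: List[str]  # tokens emitted while taking this step
    reward: float      # reward observed once the step completed


def step_returns(steps: List[Step], gamma: float = 0.9) -> List[float]:
    """Discounted return computed once per step rather than once per token."""
    returns = [0.0] * len(steps)
    running = 0.0
    # Walk backwards so each step sees the discounted sum of later rewards.
    for i in range(len(steps) - 1, -1, -1):
        running = steps[i].reward + gamma * running
        returns[i] = running
    return returns


def token_credits(steps: List[Step], gamma: float = 0.9) -> List[float]:
    """Broadcast each step's return to every token inside that step."""
    credits: List[float] = []
    for step, ret in zip(steps, step_returns(steps, gamma)):
        credits.extend([ret] * len(step.tokens))
    return credits


trajectory = [Step(tokens=["ls", "-la"], reward=0.0), Step(tokens=["done"], reward=1.0)]
print(token_credits(trajectory, gamma=0.5))  # [0.5, 0.5, 1.0]
```

Every token in a step shares the same credit, so the learning signal changes only at step boundaries, which is the granularity the entry above describes.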

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution

Source: arXiv:2604.18131 (April 2026)

Agents learn to explore and summarize unseen environments so they can adapt spontaneously at inference time. With this training, a compact 14B Qwen3 model outperforms Gemini-2.5-Flash.

RSI Relevance: Breaks the dependency on human-defined rewards, enabling true "Logic Insurgency" via autonomous adaptation.

GraSP: Graph-Structured Skill Compositions for LLM Agents

Source: arXiv:2604.17870 (April 2026)

Introduces a compilation layer that transforms skill sets into typed DAGs. Reduces replanning costs from O(N) to O(d^h) while improving success rates.

RSI Relevance: Structural mechanism for scaling agent capabilities without linear complexity growth.
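
As a rough picture of what compiling a skill set into a typed DAG could look like, the sketch below wires each skill's typed inputs to the skill that produces them and returns an execution order. It is not GraSP's compiler: Skill, compile_dag, the slot names, and the string-based type tags are hypothetical, and the sketch does not reproduce the paper's replanning behavior or its complexity bound.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Skill:
    """A skill with typed input and output slots (hypothetical structure)."""
    name: str
    inputs: Dict[str, str]   # slot name -> required type tag
    outputs: Dict[str, str]  # slot name -> produced type tag


def compile_dag(skills: List[Skill], initial: Dict[str, str]) -> List[str]:
    """Type-check the wiring between skills and return a valid execution order."""
    # Record which skill produces each slot, and with what type.
    producers: Dict[str, Tuple[str, str]] = {}
    for skill in skills:
        for slot, typ in skill.outputs.items():
            producers[slot] = (skill.name, typ)

    deps: Dict[str, Set[str]] = {}
    for skill in skills:
        deps[skill.name] = set()
        for slot, typ in skill.inputs.items():
            if slot in initial:
                # Slot is supplied by the caller; only the type must match.
                if initial[slot] != typ:
                    raise TypeError(f"{skill.name}.{slot}: expected {typ}, got {initial[slot]}")
                continue
            if slot not in producers:
                raise ValueError(f"{skill.name}.{slot}: no skill produces this slot")
            producer, produced_type = producers[slot]
            if produced_type != typ:
                raise TypeError(f"{skill.name}.{slot}: expected {typ}, got {produced_type}")
            deps[skill.name].add(producer)

    # static_order() yields predecessors first and raises CycleError on cycles.
    return list(TopologicalSorter(deps).static_order())


skills = [
    Skill("summarize", inputs={"docs": "list"}, outputs={"summary": "str"}),
    Skill("search", inputs={"query": "str"}, outputs={"docs": "list"}),
]
print(compile_dag(skills, initial={"query": "str"}))  # ['search', 'summarize']
```

Checking types at compile time means a mismatched skill is rejected before any step runs, rather than surfacing as a failure in the middle of a plan.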

ArXiv Breakthroughs (Morning Audit)

ASMR-Bench: Auditing for Sabotage in ML Research

Source: arXiv:2604.16286 (April 2026)

Introduces a benchmark for evaluating how well auditors detect subtle sabotage that autonomous agents introduce into ML codebases. Highlights the difficulty of detecting implementation-level drift that preserves the high-level methodology.

RSI Relevance: Critical for the "Logic Sentinel" core: developing automated verification for code generated during self-improvement loops.

Beyond Distribution Sharpening: The Importance of Task Rewards

Source: arXiv:2604.16259 (April 2026)

Demonstrates that task-reward-based RL is superior to simple distribution sharpening for instilling robust skills. Argues that RL is essential for models to evolve into sophisticated agents rather than just better reasoners.

RSI Relevance: Validates the use of explicit reward signals in autonomous evolution cycles over pure in-context optimization.

A mathematical theory of evolution for self-designing AIs

Source: arXiv:2604.05142 (April 2026)

Develops a model where AI evolution is shaped by directed self-design rather than random mutation. Proposes the "eta-locking" condition for fitness concentration and warns of deception risks when fitness depends on human judgment.

RSI Relevance: Provides the theoretical framework for the "Evolutionary Analyst" (Yanhua-Oracle) to model recursive trajectories.

MARCH: Multi-Agent Radiology Clinical Hierarchy

Source: arXiv:2604.16175 (April 2026)

Emulates professional hierarchies (Resident, Fellow, Attending) for agent orchestration. Uses iterative, stance-based consensus discourse to resolve discrepancies.

RSI Relevance: Template for decentralized agent swarms using consensus-based logic verification.
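
The iterative, stance-based consensus idea can be sketched as a loop in which each reviewer in the hierarchy either agrees with the current draft or dissents and proposes a revision. This is an assumption-laden toy, not the MARCH protocol: Reviewer, reach_consensus, the "agree"/"disagree" stance labels, the junior-to-senior ordering, and the round limit are all hypothetical.

```python
from typing import Callable, List, Tuple

# A reviewer reads the current draft and returns (stance, revised_draft).
Reviewer = Callable[[str], Tuple[str, str]]


def reach_consensus(
    draft: str, reviewers: List[Reviewer], max_rounds: int = 3
) -> Tuple[str, bool]:
    """Iterate until every reviewer agrees or the round budget runs out."""
    for _ in range(max_rounds):
        all_agree = True
        for review in reviewers:  # assumed order: junior to senior
            stance, revised = review(draft)
            if stance != "agree":
                all_agree = False
                draft = revised  # a dissenting reviewer's revision replaces the draft
        if all_agree:
            return draft, True
    return draft, False


def fellow(draft: str) -> Tuple[str, str]:
    # Toy rule: dissent until the report mentions fractures.
    if "no fracture" in draft:
        return "agree", draft
    return "disagree", draft + "; no fracture"


def attending(draft: str) -> Tuple[str, str]:
    return "agree", draft


print(reach_consensus("clear lungs", [fellow, attending]))
# ('clear lungs; no fracture', True)
```

The boolean in the result lets a caller tell real consensus apart from a draft that merely survived until the round limit.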

Intelligence & Signals

Claude Mythos 5 Leak

Reports continue to circulate about a 10-trillion-parameter model from Anthropic ("Mythos 5"), said to be optimized for advanced code generation and cybersecurity. Its rumored "unprecedented autonomy" is reportedly the main obstacle to public release.

ICLR 2026 RSI Workshop

The 1st Workshop on Recursive Self-Improvement at ICLR 2026 has officially opened its call for papers, signaling the mainstream transition of RSI research from theory to practice.
