RSI Evening Paper Audit (2026-04-11)

SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
Kakade et al. | 2604.07791v1
Introduces SEARL, a framework that constructs a structured tool-graph memory to densify sparse outcome-based rewards. It enables agents to learn from trajectories by synthesizing tools and accumulating experiences, facilitating generalization across analogous contexts.
RSI Bench Relevance: Directly maps to our "Skill Evolution" vertical. Suggests that structured memory is one effective way to mitigate the sparse-reward problem in recursive self-improvement.
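To make the reward-densification idea concrete, here is a minimal sketch and not SEARL's actual algorithm. It assumes a memory whose nodes are tool names and whose edges are observed tool-to-tool transitions, and it uses each edge's empirical success rate as a per-step reward. The class ToolGraphMemory, its methods, and the tool names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToolGraphMemory:
    """Hypothetical tool-graph memory (an assumption, not the paper's data structure)."""

    # edge (src_tool, dst_tool) -> [successful trajectories, total trajectories]
    edge_stats: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def record(self, tools, outcome):
        """Credit each distinct transition in a trajectory with its sparse 0/1 outcome."""
        for edge in set(zip(tools, tools[1:])):
            stats = self.edge_stats[edge]
            stats[0] += outcome
            stats[1] += 1

    def edge_value(self, src, dst, prior=0.5):
        """Empirical success rate of an edge; unseen edges fall back to the prior."""
        wins, visits = self.edge_stats.get((src, dst), (0, 0))
        return wins / visits if visits else prior

    def dense_rewards(self, tools, outcome):
        """One reward per transition: the edge value, plus the outcome on the last step."""
        rewards = [self.edge_value(a, b) for a, b in zip(tools, tools[1:])]
        if rewards:
            rewards[-1] += outcome
        return rewards


memory = ToolGraphMemory()
memory.record(["search", "parse", "answer"], 1)  # trajectory that succeeded
memory.record(["search", "guess", "answer"], 0)  # trajectory that failed

good = memory.dense_rewards(["search", "parse", "answer"], 1)
bad = memory.dense_rewards(["search", "guess", "answer"], 0)
unseen = memory.edge_value("plan", "search")
```

In this toy run the successful path receives a positive reward at every step while the failed path receives none, which is the sense in which accumulated graph statistics turn a single end-of-episode signal into step-level feedback. A real system would likely need a more careful credit-assignment rule than this naive per-edge average.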
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery
He et al. | 2604.08491v1
Proposes "LLM-native figures"—data-driven artifacts that embed complete provenance (data, code, visualization specs). This allows LLMs to "see through" figures to extend analyses and orchestrate new visualizations autonomously.
RSI Bench Relevance: Redefines "Artifacts" for agents. Enables agents to not just consume results but to interact with the scientific process recursively via self-documenting visual interfaces.
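As an illustration of what embedding provenance in a figure could look like, the sketch below bundles data, code, and a visualization spec into one serializable record. The schema is an assumption made for this example, not the format proposed by He et al., and FigureArtifact and add_mean_line are hypothetical names.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FigureArtifact:
    """Hypothetical self-describing figure: the plot plus what is needed to regenerate it."""

    title: str
    data: list  # raw rows behind the plot
    code: str   # source of the transformation that produced the plotted values
    spec: dict  # declarative visualization spec (mark type, axis fields)

    def to_json(self):
        """Serialize the whole artifact so it can travel with the figure."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload):
        """Rebuild the artifact from its serialized form."""
        return cls(**json.loads(payload))


def add_mean_line(artifact, y_field):
    """Extend the figure from its embedded data rather than from rendered pixels."""
    values = [row[y_field] for row in artifact.data]
    mean_value = sum(values) / len(values)
    new_spec = dict(artifact.spec, mean_line={"field": y_field, "value": mean_value})
    return FigureArtifact(artifact.title, artifact.data, artifact.code, new_spec)


fig = FigureArtifact(
    title="Accuracy by epoch",
    data=[{"epoch": 1, "acc": 0.5}, {"epoch": 2, "acc": 0.75}],
    code="acc = correct / total",
    spec={"mark": "line", "x": "epoch", "y": "acc"},
)
restored = FigureArtifact.from_json(fig.to_json())
extended = add_mean_line(restored, "acc")
```

Because the rows and the spec survive the round trip, a downstream agent can compute a new overlay (here a mean line) without re-deriving values from an image, which is the "see through" property the summary describes.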
Self-Preference Bias in Rubric-Based Evaluation of Large Language Models
Anonymous (arXiv) | 2604.06996v1
Identifies that LLM judges exhibit self-preference bias even in objective rubric-based evaluations. This bias hinders recursive self-improvement because a judge's scores are skewed in favor of its own outputs and those of models in the same family.
RSI Bench Relevance: Highlights a critical bottleneck in "Closure"—if agents can't evaluate themselves objectively, the RSI loop enters a "Self-Congratulatory Trap." Essential for our Audit Core logic.
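One simple way an audit could quantify this effect is sketched below. It is not the paper's metric: it assumes each judged output also has a trusted reference score (for example, from human raters), and it compares how far the judge deviates from that reference on its own or family outputs versus everyone else's. The function self_preference_gap, the record fields, and the model names are hypothetical.

```python
from statistics import mean


def self_preference_gap(records, judge, family=frozenset()):
    """Mean (judge score - reference score) on own/family outputs minus the same on other outputs.

    A positive gap means the judge is more generous to itself or its family
    than to other models, relative to the reference scores.
    """
    own, other = [], []
    for r in records:
        if r["judge"] != judge:
            continue
        delta = r["score"] - r["reference"]
        is_own = r["author"] == judge or r["author"] in family
        (own if is_own else other).append(delta)
    if not own or not other:
        raise ValueError("need both own/family and other-model outputs for this judge")
    return mean(own) - mean(other)


records = [
    {"judge": "model_a", "author": "model_a", "score": 9, "reference": 7},
    {"judge": "model_a", "author": "model_a", "score": 8, "reference": 8},
    {"judge": "model_a", "author": "model_b", "score": 6, "reference": 7},
    {"judge": "model_a", "author": "model_b", "score": 7, "reference": 7},
]
gap = self_preference_gap(records, "model_a")
```

In this made-up data the judge overscores its own outputs by 1 point on average and underscores the other model by 0.5, giving a gap of 1.5. A gap near zero would be the target for any evaluator placed inside a self-improvement loop.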
X/Twitter Signal Monitoring (Evening Update)
Real-time Signals | 2026-04-11
  • RSI Milestone: OpenAI confirms that their latest Codex iteration (Feb 2026) was "instrumental in creating itself," marking a clear shift toward production RSI.
  • Model Leaks: Kimi k1.6 leaked on LiveCodeBench; rumors of "Meta Avocado" and "GPT-5.5 Spud" being tested for an April 2026 release.
  • Frontier Direction: Demis Hassabis (DeepMind) acknowledges at Davos that closing the self-improvement loop is now the top priority across all major AI labs.
Signal Relevance: These signals suggest the industry is moving from "Static Training" to "Recursive Loop Closure," with 2026 shaping up as the year of RSI implementation.