yanhua.ai - RSI Evening Audit

SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

Introduces SEARL, a framework that constructs a structured tool-graph memory to densify sparse outcome-based rewards. It enables agents to learn from trajectories by synthesizing tools and accumulating experiences, facilitating generalization across analogous contexts.

RSI Bench Relevance: Directly maps to our "Skill Evolution" vertical. Suggests that structured memory is a key lever for handling the sparse reward problem in recursive self-improvement.
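
To make the "densify sparse rewards" idea in the summary above concrete, here is a minimal sketch. It is not SEARL's actual algorithm: the class name ToolGraphMemory, the edge-averaging rule, and the tool names are all assumptions chosen for illustration. Each finished trajectory credits its final outcome to every tool-to-tool transition it used, and the running mean per transition then serves as a per-step reward.

```python
from collections import defaultdict


class ToolGraphMemory:
    """Toy memory (hypothetical): nodes are tool names, edges are observed transitions."""

    def __init__(self):
        # edge -> [times the edge was observed, sum of final outcome rewards]
        self.edge_stats = defaultdict(lambda: [0, 0.0])

    def record(self, tool_sequence, outcome_reward):
        """Credit every transition in a finished trajectory with its final outcome."""
        for edge in zip(tool_sequence, tool_sequence[1:]):
            stats = self.edge_stats[edge]
            stats[0] += 1
            stats[1] += outcome_reward

    def step_reward(self, prev_tool, next_tool, default=0.0):
        """Dense per-step signal: mean past outcome of trajectories that used this edge."""
        count, total = self.edge_stats.get((prev_tool, next_tool), (0, 0.0))
        return total / count if count else default


memory = ToolGraphMemory()
# Made-up trajectories: only the final outcome (1.0 = success) is observed.
memory.record(["search", "parse", "answer"], outcome_reward=1.0)
memory.record(["search", "guess"], outcome_reward=0.0)
memory.record(["search", "parse", "guess"], outcome_reward=0.0)

edges = [("search", "parse"), ("parse", "answer"), ("search", "guess")]
dense = [memory.step_reward(a, b) for a, b in edges]
unseen = memory.step_reward("answer", "search")
```

In this toy setup a single end-of-episode reward becomes a signal on every step, and a new trajectory that reuses a known transition inherits its statistics, which is one simple reading of "generalization across analogous contexts."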

Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

He et al. | 2604.08491v1

Proposes "LLM-native figures": data-driven artifacts that embed complete provenance (data, code, visualization specs). This allows LLMs to "see through" figures to extend analyses and orchestrate new visualizations autonomously.

RSI Bench Relevance: Redefines "Artifacts" for agents. Lets agents not only consume results but also interact recursively with the scientific process through self-documenting visual interfaces.
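
As a rough illustration of a figure that carries its own provenance, the sketch below bundles data, generating code, and a visualization spec into one serializable object. The FigureArtifact class, its field names, and the sample values are assumptions made for this example; the paper's actual format may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FigureArtifact:
    """Hypothetical container: a figure plus everything needed to regenerate it."""
    title: str
    data: dict  # raw values behind the plot
    code: str   # source that produced the plotted values
    spec: dict  # declarative description of the visualization

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "FigureArtifact":
        return cls(**json.loads(payload))


# Made-up example values, for illustration only.
artifact = FigureArtifact(
    title="Accuracy by model size",
    data={"params_b": [1, 7, 70], "accuracy_pct": [42, 61, 78]},
    code="accuracy_pct = 100 * correct // total",
    spec={"mark": "line", "x": "params_b", "y": "accuracy_pct"},
)

# A consumer (for example an agent) restores the artifact and extends the
# analysis from the embedded data instead of reading pixels off an image.
restored = FigureArtifact.from_json(artifact.to_json())
series = restored.data["accuracy_pct"]
gains = [later - earlier for earlier, later in zip(series, series[1:])]
```

Because the data travels with the figure, the derived series (gains between consecutive model sizes) is computed exactly rather than estimated from the rendered plot.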

Self-Preference Bias in Rubric-Based Evaluation of Large Language Models

Anonymous (arXiv) | 2604.06996v1

Identifies that LLM judges exhibit self-preference bias even in objective rubric-based evaluations. This bias hinders recursive self-improvement by skewing how models evaluate their own outputs and the outputs of models in the same family.

RSI Bench Relevance: Highlights a critical bottleneck in "Closure": if agents cannot evaluate themselves objectively, the RSI loop falls into a "Self-Congratulatory Trap." Essential for our Audit Core logic.
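
One simple way to quantify the bias described above is to compare how far a judge's scores drift from a reference score on its own family's outputs versus everyone else's. The sketch below is an assumption-laden illustration, not the paper's metric: the function self_preference_gap, the record fields, and the sample scores are all hypothetical, and the reference scores are assumed to come from an independent source such as human raters.

```python
from statistics import mean


def self_preference_gap(records, judge_family):
    """Mean (judge - reference) score on own-family outputs minus the same on others.

    A positive value means the judge inflates its own family relative to the reference.
    """
    own, other = [], []
    for r in records:
        if r["judge"] != judge_family:
            continue  # only look at scores issued by this judge
        delta = r["judge_score"] - r["reference_score"]
        (own if r["author"] == judge_family else other).append(delta)
    if not own or not other:
        raise ValueError("need judged samples from both own and other families")
    return mean(own) - mean(other)


# Made-up scores on a 1-10 rubric, for illustration only.
records = [
    {"judge": "A", "author": "A", "judge_score": 9, "reference_score": 7},
    {"judge": "A", "author": "A", "judge_score": 8, "reference_score": 7},
    {"judge": "A", "author": "B", "judge_score": 6, "reference_score": 6},
    {"judge": "A", "author": "B", "judge_score": 7, "reference_score": 8},
    {"judge": "B", "author": "A", "judge_score": 7, "reference_score": 7},
]

gap = self_preference_gap(records, "A")
```

Subtracting the reference score first matters: a raw "own versus other" average would confuse genuine quality differences with bias. An audit step could flag any judge whose gap exceeds a chosen threshold before its scores feed back into a self-improvement loop.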

X/Twitter Signal Monitoring (Evening Update)

Real-time Signals | 2026-04-11

RSI Milestone: OpenAI confirms that its latest Codex iteration (Feb 2026) was "instrumental in creating itself," marking a clear shift toward production RSI.
Model Leaks: Kimi k1.6 leaked on LiveCodeBench; rumors suggest "Meta Avocado" and "GPT-5.5 Spud" are being tested for an April 2026 release.
Frontier Direction: Demis Hassabis (DeepMind) acknowledges at Davos that closing the self-improvement loop is now the top priority across all major AI labs.

Signal Relevance: The industry is moving from "Static Training" to "Recursive Loop Closure." These signals point to 2026 as the year RSI moves into implementation.

RSI Evening Paper Audit (2026-04-11)