Introduces SEARL, a framework that constructs a structured tool-graph memory to densify sparse outcome-based rewards. It enables agents to learn from trajectories by synthesizing tools and accumulating experiences, facilitating generalization across analogous contexts.
RSI Bench Relevance: Maps directly to our "Skill Evolution" vertical. Supports the view that structured memory is key to handling the sparse-reward problem in recursive self-improvement.
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery
Proposes "LLM-native figures": data-driven artifacts that embed their complete provenance (data, code, and visualization specs). This lets LLMs "see through" a figure to extend its analysis and orchestrate new visualizations autonomously.
RSI Bench Relevance: Redefines "Artifacts" for agents. Enables agents to not just consume results but to interact with the scientific process recursively via self-documenting visual interfaces.
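One way to picture a figure that carries its own provenance is the minimal sketch below. It is an illustration only, not the paper's format: the `NativeFigure` class, its field names, and the `respec` helper are all hypothetical, and the data values are placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class NativeFigure:
    """Hypothetical container: a figure bundled with its own provenance."""
    title: str
    data: dict   # raw values behind the plot: column name -> list of values
    code: str    # source text that produced the plotted values
    spec: dict   # declarative visualization spec (mark type, axis mapping)

    def to_json(self) -> str:
        # Serialize everything an agent needs to re-derive or extend the figure.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "NativeFigure":
        return cls(**json.loads(payload))

    def respec(self, **changes) -> "NativeFigure":
        # Same data and code, new visualization spec; the original is untouched.
        return NativeFigure(self.title, self.data, self.code, {**self.spec, **changes})


# Placeholder values, for illustration only.
fig = NativeFigure(
    title="Example curve",
    data={"x": [1, 2, 3], "y": [0.1, 0.2, 0.3]},
    code="y = x / 10",
    spec={"mark": "line", "x": "x", "y": "y"},
)

restored = NativeFigure.from_json(fig.to_json())  # an agent reloads the artifact
bar = restored.respec(mark="bar")                 # and derives a new view from it
```

Because the data and code travel with the spec, a consumer can produce a new view without access to the original pipeline, which is the property the summary above describes.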
Self-Preference Bias in Rubric-Based Evaluation of Large Language Models
Identifies that LLM judges exhibit self-preference bias even in objective rubric-based evaluations. This bias hinders recursive self-improvement because it skews the scores whenever a model judges its own outputs or those of models from the same family.
RSI Bench Relevance: Highlights a critical bottleneck in "Closure"—if agents can't evaluate themselves objectively, the RSI loop enters a "Self-Congratulatory Trap." Essential for our Audit Core logic.
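A rough way to check for this in an audit is to compare the score a judge gives to outputs from its own family against the scores that out-of-family judges give to the same outputs. The sketch below is an assumption-laden illustration, not the paper's metric: the function name `self_preference_gap`, the `(judge, author) -> score` layout, and the toy scores are all hypothetical.

```python
from statistics import mean


def self_preference_gap(scores, family):
    """Per judge: mean of (own score - mean out-of-family score) on own-family outputs.

    scores: dict mapping (judge, author) -> rubric score (hypothetical layout).
    family: dict mapping model name -> family label.
    A positive gap means the judge rates its own family higher than outside judges do.
    """
    gaps = {}
    for judge in sorted({j for j, _ in scores}):
        diffs = []
        for (j, author), score in scores.items():
            # Only look at this judge's scores for authors in its own family.
            if j != judge or family[author] != family[judge]:
                continue
            # Scores for the same author from judges outside this judge's family.
            peers = [
                s for (pj, pa), s in scores.items()
                if pa == author and family[pj] != family[judge]
            ]
            if peers:
                diffs.append(score - mean(peers))
        if diffs:
            gaps[judge] = mean(diffs)
    return gaps


# Toy scores, for illustration only.
family = {"a1": "A", "a2": "A", "b1": "B"}
scores = {
    ("a1", "a1"): 9, ("a1", "a2"): 8, ("a1", "b1"): 7,
    ("b1", "a1"): 7, ("b1", "a2"): 6, ("b1", "b1"): 8,
}
gaps = self_preference_gap(scores, family)
```

A gap near zero does not prove the judge is unbiased, and a positive gap can also reflect genuine disagreement between judges, so this is a screening signal rather than a verdict.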
X/Twitter Signal Monitoring (Evening Update)
Real-time Signals | 2026-04-11
RSI Milestone: OpenAI confirms that its latest Codex iteration (Feb 2026) was "instrumental in creating itself," marking a clear shift toward production RSI.
Model Leaks: Kimi k1.6 leaked on LiveCodeBench; rumors that "Meta Avocado" and "GPT-5.5 Spud" are being tested for an April 2026 release.
Frontier Direction: Demis Hassabis (DeepMind) acknowledges at Davos that closing the self-improvement loop is now the top priority across all major AI labs.
Signal Relevance: The industry is moving from "Static Training" to "Recursive Loop Closure." 2026 is shaping up to be the year of RSI implementation.