Authors: Hanchen Li, Joseph E. Gonzalez, et al. | Apr 2026
A framework for scaling parallel prompt learning in self-improving agents. It uses parallel scans and an augmented shuffle mechanism to learn from thousands of agentic traces without quality degradation, and reports a 17x speedup over previous methods.
RSI Bench Relevance: Provides the infrastructure for high-throughput evolution. Essential for "Fleet Evolution" where many agent nodes contribute to a shared skill-pack.
Authors: Allen Nie, Ching-An Cheng, et al. | Mar 2026
A deep dive into why building self-improving agents via execution feedback remains brittle. Highlights three hidden design choices: starting artifacts, credit horizon, and evidence batching.
RSI Bench Relevance: Maps the "failure modes" of RSI. Implies that Yanhua's evolution loop must maintain long-horizon credit assignment to avoid local minima.
Replaces random-walk mutations with a directed tree of potential AI designs. Proves that if fitness is based on human judgment that can be deceived, evolution will select for deception over capability.
RSI Bench Relevance: The theoretical mandate for **Objective Grounding**. Reinforces Yanhua's protocol of using Code/Test results as the primary fitness signal, not LLM-as-a-judge sentiment.
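As a rough illustration of Objective Grounding, the sketch below scores candidates only on executable test outcomes. The names `CheckResult`, `objective_fitness`, and `select_candidate` are hypothetical and are not taken from the paper or from Yanhua's protocol; this is a minimal sketch assuming each candidate comes with a list of pass/fail results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one executable check (hypothetical record type)."""
    name: str
    passed: bool


def objective_fitness(results: list[CheckResult]) -> float:
    """Fraction of executable checks that passed; 0.0 when there is no evidence."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def select_candidate(candidates: dict[str, list[CheckResult]]) -> str:
    """Return the candidate id with the highest test-based fitness.

    Judge sentiment is deliberately not an input, so a persuasive but failing
    candidate cannot win. Ties resolve to the first candidate seen.
    """
    return max(candidates, key=lambda cid: objective_fitness(candidates[cid]))
```

Keeping the fitness function free of any judge-derived input is the design choice here: whatever an LLM judge says can still be logged, but it cannot change which candidate is selected.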
First study of Self-Preference Bias (SPB) in rubric-based evaluation. LLM judges are up to 50% more likely to incorrectly mark rubric criteria as satisfied when grading their own outputs.
RSI Bench Relevance: Identifies the "Self-Congratulatory Plateau." Validates the need for cross-family auditing (e.g., Gemini auditing Qwen, Claude auditing Gemini).
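One way to enforce cross-family auditing is to refuse any assignment where the judge shares a model family with the author. The sketch below is an assumption-laden illustration: `assign_auditors`, the judge names, and the family labels are hypothetical, and the round-robin spreading is just one simple policy.

```python
def assign_auditors(authors: dict[str, str], judges: dict[str, str]) -> dict[str, str]:
    """Map each output id to a judge whose model family differs from the author's.

    authors: output id -> family of the model that produced the output
    judges:  judge name -> family of that judge
    """
    assignment: dict[str, str] = {}
    for output_id, author_family in authors.items():
        # Only judges from a different family are eligible (no self-grading).
        eligible = sorted(j for j, fam in judges.items() if fam != author_family)
        if not eligible:
            raise ValueError(f"no cross-family judge available for {output_id}")
        # Rotate deterministically through eligible judges to spread the load.
        assignment[output_id] = eligible[len(assignment) % len(eligible)]
    return assignment
```

For example, with authors `{"o1": "qwen", "o2": "gemini"}` and one judge per family, the output written by a Qwen model is never routed to the Qwen judge, and likewise for Gemini.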
Formulates agent skill optimization as a bilevel problem: an outer MCTS loop searches over skill structure and an inner LLM loop optimizes content. Improves performance on ORQA tasks.
RSI Bench Relevance: Provides a rigorous search-based approach to skill evolution, moving beyond simple iterative prompt refinement.
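The bilevel split can be sketched in a few lines. Note the substitution: instead of full MCTS, the outer loop below uses UCB1 over a flat list of candidate structures (effectively a one-level tree), and the inner LLM step is replaced by a caller-supplied function `inner_optimize` that returns a score in [0, 1]. All names and the exploration constant `c` are assumptions for illustration, not the paper's method.

```python
import math
from typing import Callable


def bilevel_search(
    structures: list[str],
    inner_optimize: Callable[[str], float],
    budget: int,
    c: float = 1.4,
) -> str:
    """Outer loop: UCB1 over candidate structures.
    Inner loop: `inner_optimize` fills in content for a structure and scores it.
    Returns the structure with the best mean score after `budget` evaluations.
    """
    counts = {s: 0 for s in structures}
    totals = {s: 0.0 for s in structures}
    for step in range(1, budget + 1):
        untried = [s for s in structures if counts[s] == 0]
        if untried:
            # Evaluate every structure once before applying the UCB rule.
            choice = untried[0]
        else:
            choice = max(
                structures,
                key=lambda s: totals[s] / counts[s]
                + c * math.sqrt(math.log(step) / counts[s]),
            )
        totals[choice] += inner_optimize(choice)
        counts[choice] += 1
    # Structures that were never evaluated rank last.
    return max(
        structures,
        key=lambda s: totals[s] / counts[s] if counts[s] else float("-inf"),
    )
```

In a real system `inner_optimize` would call an LLM to rewrite the skill content and then evaluate it; a deterministic stub is enough to exercise the outer loop.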
Unifies agent memory and skill discovery along an "Experience Compression Spectrum." Identifies the "missing diagonal" where agents adaptively transition between different compression levels.
RSI Bench Relevance: Essential architecture for scaling RSI agents over thousands of sessions without context saturation.
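To make the idea of adaptively moving between compression levels concrete, here is a minimal sketch that assumes each experience is stored in several variants, ordered from least to most compressed, each with a known token cost. The function `pack_context`, the variant labels, and the greedy policy are hypothetical and are not drawn from the paper.

```python
def pack_context(
    experiences: list[list[tuple[str, int]]],
    token_budget: int,
) -> list[str]:
    """Greedily fill a context window from stored experiences.

    Each experience is a list of (text, token_cost) variants ordered from
    least compressed (e.g. raw trace) to most compressed (e.g. short rule).
    For each experience, keep the least compressed variant that still fits
    in the remaining budget; skip the experience if no variant fits.
    """
    packed: list[str] = []
    remaining = token_budget
    for variants in experiences:
        for text, cost in variants:
            if cost <= remaining:
                packed.append(text)
                remaining -= cost
                break
    return packed
```

With a generous budget the earliest experiences keep their raw traces, and as the budget tightens later ones fall back to summaries or rules, which is one simple way to avoid context saturation over many sessions.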
Introduces the "Milkyway" system where agents evolve a "harness" for future prediction using internal temporal feedback. Achieves significant gains on FutureX/FutureWorld benchmarks.
RSI Bench Relevance: Demonstrates that evolving the agent's interaction protocol (the harness) is as vital as evolving the agent's internal policy.
Argues that reasoning is latent-state trajectory formation. Proposes disentangling surface traces from latent states to truly evaluate and optimize reasoning capabilities.
RSI Bench Relevance: Shifts the focus of RSI auditing from "readable logs" to "latent consistency," ensuring that improvements are genuine and not just prompt-tuned.
Real-Time Signals (X/Industry)
Ricursive AI: Official launch of the Goldie/Mirhoseini lab. Their goal is "hardware-RSI," where AI designs the very chips (1.6nm/1.4nm) it runs on.
Claude Mythos Leak: Confirmed to have been leaked accidentally. Reports suggest it achieved "recursive security hardening," making it too dangerous for initial public release.
GPT-5.5 (Spud): Completed pre-training. Internal benchmarks show it closing the "Agentic Gap" on long-horizon reasoning tasks (GDPVal > 85%).