yanhua.ai | RSI Research Audit [2026-04-05]

Target: ICLR 2026 Workshop on AI with Recursive Self-Improvement - Key Findings & Safeguards.

[2026-03-16] GASP: Guided Asymmetric Self-Play For Coding LLMs

Swadesh Jana, Cansu Sancaktar, et al.

Introduces Guided Asymmetric Self-Play (GASP), in which self-play is grounded by goalpost questions drawn from real data. The teacher first generates an easier variant of a hard goalpost question, then a harder variant of that easier question, gradually closing the gap to the goalpost. This improves pass@20 on LiveCodeBench by 2.5% over unguided self-play.
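The pass@20 figure above is usually computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021): draw n >= k samples per problem, count the c that pass, and average 1 - C(n-c, k)/C(n, k) over problems. A minimal sketch follows. That GASP's LiveCodeBench numbers use exactly this estimator and sample count is an assumption, and the per-problem counts below are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples passing the tests, k: sampling budget.
    Returns 1 - C(n - c, k) / C(n, k), i.e. one minus the probability that a
    random size-k subset of the n samples contains no passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few failing samples to fill a size-k subset: a pass is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark score: average the per-problem estimates, given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts: (samples drawn, samples passing).
results = [(40, 0), (40, 1), (40, 10)]
print(round(mean_pass_at_k(results, k=20), 4))
```

For k = 1 the formula reduces to c/n, the plain per-problem pass rate; `math.comb` keeps the binomial coefficients exact until the final division.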

Logic Evolution Impact (Vertical B): Addresses the "grounding problem" in autonomous self-play. Shows that a curriculum can be bootstrapped from goalpost questions to improve coding capabilities without human labeling.

[2026-03-06] SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement

Subramanyam Sahoo, Aman Chadha, et al.

Proposes SAHOO, a framework for monitoring and controlling alignment drift in RSI. Its Goal Drift Index (GDI) combines semantic, lexical, structural, and distributional measures; alongside it, SAHOO flags safety-critical constraint violations and quantifies regression risk during iterative self-modification.
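To make the idea of a multi-signal drift score concrete, here is a minimal sketch of a composite index comparing a baseline output with the output after a self-modification step. Every measure, weight, and threshold is a hypothetical stand-in (bag-of-words cosine instead of embeddings, line count as a structural proxy, Jensen-Shannon divergence of token frequencies for the distributional signal); this is not the paper's GDI.

```python
import math
from collections import Counter


def _tokens(text: str) -> list[str]:
    return text.lower().split()


def semantic_drift(a: str, b: str) -> float:
    """1 - cosine similarity of bag-of-words counts.

    Hypothetical stand-in: a real semantic signal would compare embeddings.
    """
    ca, cb = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    if norm == 0:
        return 0.0 if ca == cb else 1.0
    return 1.0 - dot / norm


def lexical_drift(a: str, b: str) -> float:
    """1 - Jaccard similarity of the token sets (0 = same vocabulary)."""
    sa, sb = set(_tokens(a)), set(_tokens(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def structural_drift(a: str, b: str) -> float:
    """Relative change in line count, a crude proxy for structural change."""
    la, lb = len(a.splitlines()), len(b.splitlines())
    return abs(la - lb) / max(la, lb, 1)


def distributional_drift(a: str, b: str) -> float:
    """Jensen-Shannon divergence (base 2, so in [0, 1]) of token frequencies."""
    ca, cb = Counter(_tokens(a)), Counter(_tokens(b))
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        return 0.0 if na == nb else 1.0
    js = 0.0
    for t in set(ca) | set(cb):
        p, q = ca[t] / na, cb[t] / nb
        m = (p + q) / 2
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


# Hypothetical weights; the paper's way of combining signals may differ.
WEIGHTS = {"semantic": 0.4, "lexical": 0.2, "structural": 0.2, "distributional": 0.2}


def goal_drift_index(baseline: str, current: str) -> float:
    """Weighted sum of drift signals; 0 means no detected drift."""
    signals = {
        "semantic": semantic_drift(baseline, current),
        "lexical": lexical_drift(baseline, current),
        "structural": structural_drift(baseline, current),
        "distributional": distributional_drift(baseline, current),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def drift_flagged(baseline: str, current: str, threshold: float = 0.5) -> bool:
    """Flag an iteration whose output drifted past a (hypothetical) threshold."""
    return goal_drift_index(baseline, current) > threshold


before = "def add(a, b):\n    return a + b"
after = "def add(a, b):\n    # now logs every call\n    print(a, b)\n    return a + b"
print(round(goal_drift_index(before, after), 3), drift_flagged(before, after))
```

Keeping each signal in [0, 1] makes the weighted sum easy to threshold; in practice the weights and threshold would presumably need calibrating against examples of known drift.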

Logic Evolution Impact (Audit Core): Mandatory for any "Logic Sentinel" architecture. Makes alignment preservation during RSI measurable and deployable. Reports an 18.3% improvement on code tasks while preserving safety constraints.

[2026-03-17] CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning

Weikun K. Zhang, Rohan Pandey, et al.

Formulates arithmetic circuit discovery (building a circuit that computes a target polynomial) as a single-player game for RL agents trained in an AlphaZero-style loop. Soft Actor-Critic (SAC) and PPO+MCTS agents scale to three-variable targets, showing that circuit synthesis is a compact, verifiable setting for studying self-improving search policies.
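A minimal sketch of such a game, assuming the inputs are the variables plus the constant 1, each move adds a binary + or × gate over two existing nodes, and an episode succeeds when the target polynomial appears within a gate budget. These rules and all names are illustrative assumptions; the paper's exact action space and rewards may differ. An exhaustive depth-limited search stands in where the paper trains SAC and PPO+MCTS agents.

```python
from itertools import combinations_with_replacement
from typing import Optional

# A polynomial maps exponent tuples (one entry per variable) to integer coefficients.
Poly = dict[tuple[int, ...], int]


def var(i: int, n: int) -> Poly:
    return {tuple(1 if k == i else 0 for k in range(n)): 1}


def const(c: int, n: int) -> Poly:
    return {(0,) * n: c}


def add(p: Poly, q: Poly) -> Poly:
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}  # drop cancelled terms


def mul(p: Poly, q: Poly) -> Poly:
    out: Poly = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def initial_nodes(n_vars: int) -> list[Poly]:
    """Game start: the input variables plus the constant 1."""
    return [var(i, n_vars) for i in range(n_vars)] + [const(1, n_vars)]


def legal_actions(nodes: list[Poly]) -> list[tuple[int, int, str]]:
    """Any two existing nodes (unordered, since + and * commute) and a gate type."""
    pairs = combinations_with_replacement(range(len(nodes)), 2)
    return [(i, j, op) for i, j in pairs for op in "+*"]


def apply_gate(nodes: list[Poly], action: tuple[int, int, str]) -> list[Poly]:
    """Transition: append the new gate's output as a node (input list is not mutated)."""
    i, j, op = action
    gate = add if op == "+" else mul
    return nodes + [gate(nodes[i], nodes[j])]


def search(target: Poly, n_vars: int, budget: int) -> Optional[list]:
    """Depth-limited exhaustive search, a stand-in for a learned policy.

    Returns a gate sequence that builds `target` within `budget` gates, or None.
    """
    def dfs(nodes: list[Poly], plan: list) -> Optional[list]:
        if target in nodes:
            return plan  # success: reward 1 in the game view
        if len(plan) == budget:
            return None  # budget exhausted: episode ends with reward 0
        for action in legal_actions(nodes):
            found = dfs(apply_gate(nodes, action), plan + [action])
            if found is not None:
                return found
        return None

    return dfs(initial_nodes(n_vars), [])


# Target (x0 + x1)^2 = x0^2 + 2*x0*x1 + x1^2, built directly for comparison.
x0, x1 = var(0, 2), var(1, 2)
target = mul(add(x0, x1), add(x0, x1))
print(search(target, n_vars=2, budget=2))
```

Because every candidate circuit can be checked exactly against the target, the reward signal is unambiguous, which is what makes this kind of setting attractive for studying search policies.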

Logic Evolution Impact (Vertical A): Provides a "Logic Sandbox" for benchmarking RSI search efficiency. If an agent can recursively improve its circuit-discovery policy, the same loop could plausibly be applied to optimizing its own reasoning primitives.

[2026-03-08] Test-Time Meta-Adaptation with Self-Synthesis

Zeyneb N. Kaya, Nick Rui, et al.

The MASS framework lets LLMs self-adapt at inference time: the model generates problem-specific synthetic data and performs targeted self-updates, with this behavior trained via bilevel optimization. It thereby learns instance-specific curricula for effective test-time adaptation.
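The bilevel structure can be illustrated with a deliberately tiny toy, assuming a one-parameter linear model and a "generator" that is just a slope `phi` used to label self-generated inputs. The inner loop takes one gradient step on the synthetic data; the outer loop differentiates the post-update task loss through that step to adjust `phi`. The model, generator, learning rates, and one-step unrolled inner loop are all hypothetical simplifications, not the paper's setup.

```python
def mean_sq(xs: list[float]) -> float:
    return sum(x * x for x in xs) / len(xs)


def inner_update(w: float, phi: float, xs_syn: list[float], lr: float) -> float:
    """Inner loop: one gradient step on self-generated pairs (x, phi * x).

    The loss mean((w*x - phi*x)^2) has gradient 2 * mean(x^2) * (w - phi).
    """
    return w - lr * 2 * mean_sq(xs_syn) * (w - phi)


def task_loss(w: float, xs: list[float], ys: list[float]) -> float:
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def meta_train(w0: float, xs_syn: list[float], xs_task: list[float], ys_task: list[float],
               lr_inner: float = 0.1, lr_outer: float = 0.1, steps: int = 200) -> tuple[float, float]:
    """Outer loop: tune the generator parameter phi so that the *adapted*
    model (after the inner step) does well on the real task."""
    phi = 0.0
    dw1_dphi = lr_inner * 2 * mean_sq(xs_syn)  # constant for this linear toy
    for _ in range(steps):
        w1 = inner_update(w0, phi, xs_syn, lr_inner)
        # Gradient of the task loss with respect to the updated weight w1.
        dL_dw1 = sum(2 * (w1 * x - y) * x for x, y in zip(xs_task, ys_task)) / len(xs_task)
        phi -= lr_outer * dL_dw1 * dw1_dphi  # chain rule through the inner step
    return phi, inner_update(w0, phi, xs_syn, lr_inner)


phi, w_adapted = meta_train(0.0, [1.0, 2.0, 3.0], [1.0, 2.0], [3.0, 6.0])
print(f"phi={phi:.4f}  adapted w={w_adapted:.4f}  "
      f"task loss={task_loss(w_adapted, [1.0, 2.0], [3.0, 6.0]):.2e}")
```

Note that `phi` settles above the true slope 3: a single inner step only moves `w` part of the way toward `phi`, so the outer loop learns to overshoot such that the adapted model lands on target. Gradients flowing through the inner update (`dw1_dphi`) are what make this bilevel rather than two independent optimizations.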

Logic Evolution Impact (Vertical D): Bridges the gap between training and inference. Lets agents "warm up" on, or adapt to, a specific logic puzzle before giving a final answer, much like human metacognitive preparation.
