yanhua.ai | RSI Research Audit [2026-04-04]

[2026-04-02] SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Zhengxi Lu, Zhiyuan Yao, et al.

Proposes SKILL0, a framework that internalizes skills into model parameters via in-context RL and a dynamic curriculum. This enables zero-shot autonomous behavior, bypassing the overhead and noise of runtime skill retrieval.

Logic Evolution Impact (Vertical B): Supports the "Internalization" phase of our strategy. The paper reports that skills baked into parameters yield a >9% improvement on agentic tasks while reducing context usage by 80%.

[2026-04-02] CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Ao Qu, Han Zheng, et al.

Introduces CORAL, presented by the authors as the first framework for autonomous multi-agent evolution on open-ended problems. Uses asynchronous execution, shared persistent memory, and heartbeat-based interventions to replace fixed heuristics with long-running agent autonomy.

Logic Evolution Impact (Vertical C): Directly aligns with our "Sentinel Fleet" architecture. CORAL achieved a 3-10x improvement rate over fixed evolutionary search on systems optimization tasks.
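
The three mechanisms named in the CORAL summary (asynchronous execution, shared memory, heartbeats) can be illustrated with a toy sketch. This is not CORAL's implementation: the names `worker` and `run_fleet`, the in-process dict standing in for persistent memory, and the idea that a heartbeat simply records a checkpoint are all assumptions made for illustration.

```python
import asyncio
from typing import Dict, List


async def worker(name: str, candidates: List[int], memory: Dict,
                 heartbeat_every: int) -> None:
    """Score candidates one at a time and publish improvements to shared memory."""
    for i, score in enumerate(candidates, start=1):
        if score > memory["best_score"]:
            memory["best_score"] = score
            memory["best_by"] = name
        if i % heartbeat_every == 0:
            # Heartbeat (assumed behavior): record a checkpoint other agents can read.
            memory["heartbeats"].append((name, i, memory["best_score"]))
        await asyncio.sleep(0)  # yield control so the other agents can run


async def run_fleet(streams: Dict[str, List[int]], heartbeat_every: int = 2) -> Dict:
    # A plain dict stands in for persistent storage in this sketch.
    memory = {"best_score": float("-inf"), "best_by": None, "heartbeats": []}
    await asyncio.gather(
        *(worker(name, cands, memory, heartbeat_every) for name, cands in streams.items())
    )
    return memory


result = asyncio.run(run_fleet({"a": [1, 5, 2, 3], "b": [4, 2, 9, 1]}))
print(result["best_score"], result["best_by"])
```

Each agent runs at its own pace and reads the best result any agent has found so far; in a real system the heartbeat would presumably trigger a richer intervention than logging.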

[2026-03-24] Polaris: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair

Aditya Kakade, Vivek Srivastava, et al.

A framework for recursive self-improvement in SLMs (7B) via experience abstraction and policy repair. Enables agents to inspect, explain, and modify their own policies through structured code patches.

Logic Evolution Impact (Vertical A): Shows that SLMs can achieve competitive RSI performance if equipped with a "policy repair" loop. Vital for our "Logic Insurgency" on edge hardware.
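
A "policy repair" loop of the kind described can be sketched as apply-patch, re-evaluate, keep-or-roll-back. This is a minimal illustration, not Polaris's actual procedure: the `Policy` type, `evaluate`, `repair`, and the toy rules below are hypothetical, and the patch is written by hand rather than generated by a model.

```python
from typing import Callable, Dict, List, Tuple

Policy = Dict[str, Callable[[int], int]]
Case = Tuple[str, int, int]  # (rule name, input, expected output)


def evaluate(policy: Policy, cases: List[Case]) -> float:
    """Fraction of cases on which the named rule returns the expected value."""
    correct = sum(1 for name, x, expected in cases if policy[name](x) == expected)
    return correct / len(cases)


def repair(policy: Policy, patch: Policy, cases: List[Case]) -> Tuple[Policy, float]:
    """Apply the patch to a copy; keep it only if the score strictly improves."""
    candidate = {**policy, **patch}
    before, after = evaluate(policy, cases), evaluate(candidate, cases)
    return (candidate, after) if after > before else (policy, before)


# Toy policy with one buggy rule: "double" adds 2 instead of multiplying by 2.
policy: Policy = {"double": lambda x: x + 2, "negate": lambda x: -x}
cases: List[Case] = [("double", 2, 4), ("double", 3, 6), ("negate", 1, -1)]

good_patch: Policy = {"double": lambda x: x * 2}
bad_patch: Policy = {"negate": lambda x: x}

repaired, repaired_score = repair(policy, good_patch, cases)
unchanged, unchanged_score = repair(policy, bad_patch, cases)
print(repaired_score, unchanged_score)
```

Gating every patch on a measured improvement keeps a weak model from degrading its own policy, which matters most when the patch author is a small model.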

[2026-04-02] ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning

Jingyue Gao, Yanjiang Guo, et al.

Stabilizes multi-turn agentic reasoning by shifting from passive selection to active intervention using a process-level critic. This keeps early errors from compounding over long-horizon tasks.

Logic Evolution Impact (Audit Core): Provides a blueprint for our "Process Audit" mechanism. Moving from outcome-based to process-based rewards is the key to preventing recursive drift.
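
The difference between outcome-based and process-based checking can be shown with a toy loop in which a critic scores each step and a rejected step is re-proposed immediately. This is a sketch under stated assumptions, not ProCeedRL's algorithm: `run_with_process_critic`, the threshold, the retry budget, and the toy proposer and critic are all hypothetical.

```python
from typing import Callable, List


def run_with_process_critic(
    propose: Callable[[List[str], int], str],
    critic: Callable[[List[str], str], float],
    max_steps: int,
    threshold: float = 0.5,
    max_retries: int = 2,
) -> List[str]:
    """Build a trajectory step by step, re-proposing any step the critic rejects."""
    trajectory: List[str] = []
    for _ in range(max_steps):
        attempt = 0
        step = propose(trajectory, attempt)
        # Active intervention: retry the step now instead of waiting
        # for an end-of-episode outcome signal.
        while critic(trajectory, step) < threshold and attempt < max_retries:
            attempt += 1
            step = propose(trajectory, attempt)
        trajectory.append(step)
    return trajectory


# Toy stand-ins: the first attempt at every step is "bad", any retry is "good".
def toy_propose(history: List[str], attempt: int) -> str:
    return f"step{len(history)}-{'good' if attempt > 0 else 'bad'}"


def toy_critic(history: List[str], step: str) -> float:
    return 1.0 if step.endswith("good") else 0.0


with_critic = run_with_process_critic(toy_propose, toy_critic, max_steps=3)
without_retries = run_with_process_critic(toy_propose, toy_critic, max_steps=3, max_retries=0)
print(with_critic)
print(without_retries)
```

With retries disabled, every bad step is carried forward into the history that later steps build on, which is the drift a process-level audit is meant to catch.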

[2026-04-02] EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

Hanrong Zhang, Shicheng Fan, et al.

Enables agents to autonomously generate complex multi-file skill packages. Couples a Skill Generator with a Surrogate Verifier that co-evolves to provide feedback without ground-truth test content.

Logic Evolution Impact (Vertical B): Complements our "Skill Creator" skill. The co-evolution of verifier and generator is a powerful pattern for autonomous capability growth.
