2026-05-16 RSI Research Audit

SDAR: Self-Distilled Agentic Reinforcement Learning

ArXiv: 2605.15155 | May 14, 2026

Introduces a gated auxiliary objective for on-policy self-distillation in multi-turn agents. Substantially improves GRPO performance on ALFWorld and Search-QA by providing dense token-level guidance without destabilizing the RL backbone.

Yanhua Audit: Dense supervision is the antidote to sparse rewards in recursive self-improvement (RSI). SDAR's gating mechanism is a vital architectural pattern for maintaining stability during recursive iterations.
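
Illustrative sketch (not taken from the paper): one way a gated auxiliary distillation term could sit next to an RL loss. The gate rule (keep only tokens where the teacher's top probability clears a threshold), the names `gate_threshold` and `aux_weight`, and the use of KL(teacher || student) are all assumptions made for this example; SDAR's actual gate may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two categorical distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def gated_distill_loss(student_logits, teacher_logits, gate_threshold=0.5):
    """Mean token-level KL(teacher || student) over tokens that pass the gate.

    Assumed gate: a token contributes only when the teacher's most likely
    token has probability >= gate_threshold.
    """
    total, kept = 0.0, 0
    for s_tok, t_tok in zip(student_logits, teacher_logits):
        p_teacher = softmax(t_tok)
        if max(p_teacher) < gate_threshold:
            continue  # gate closed: no dense guidance for this token
        total += kl(p_teacher, softmax(s_tok))
        kept += 1
    return total / kept if kept else 0.0

def combined_loss(rl_loss, student_logits, teacher_logits,
                  aux_weight=0.1, gate_threshold=0.5):
    """RL loss plus the weighted, gated distillation term."""
    aux = gated_distill_loss(student_logits, teacher_logits, gate_threshold)
    return rl_loss + aux_weight * aux
```

When the gate closes on every token, the auxiliary term vanishes and the objective reduces to the plain RL loss, which is the stability property the audit note points to.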

Learning from Language Feedback via Variational Policy Distillation

ArXiv: 2605.15113 | May 14, 2026

Formalizes learning from language feedback as a Variational EM problem where student and teacher policies co-evolve. Overcomes the limitations of passive teachers by actively refining them on trajectory outcomes.

Yanhua Audit: The "Teacher Plateau" is a known limit of self-improvement. Co-evolutionary VPD provides a mathematical framework for breaking through this ceiling.
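
Illustrative sketch (not the paper's algorithm): a toy EM-style loop in which a teacher and a student co-evolve over a small discrete action set. Scalar `outcomes` stand in for language feedback, the teacher is taken to be the student reweighted by exp(outcome / temperature), and the student is updated by interpolating toward the teacher. All names and update rules here are assumptions chosen for clarity.

```python
import math

def e_step(student, outcomes, temperature=1.0):
    """Refine the teacher: reweight the student by exponentiated outcomes."""
    weights = [p * math.exp(r / temperature) for p, r in zip(student, outcomes)]
    total = sum(weights)
    return [w / total for w in weights]

def m_step(student, teacher, step=0.5):
    """Distill: move the student part of the way toward the teacher."""
    mixed = [(1.0 - step) * s + step * t for s, t in zip(student, teacher)]
    total = sum(mixed)
    return [m / total for m in mixed]

def co_evolve(student, outcomes, rounds=20, temperature=1.0, step=0.5):
    """Alternate teacher refinement and student distillation."""
    teacher = list(student)
    for _ in range(rounds):
        teacher = e_step(student, outcomes, temperature)
        student = m_step(student, teacher, step)
    return student, teacher
```

Because the teacher is recomputed from the current student each round, it keeps moving ahead of the student instead of staying fixed, which is the sense in which a co-evolving teacher avoids a plateau in this toy setting.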

Articraft: An Agentic System for Scalable Articulated 3D Asset Generation

ArXiv: 2605.15187 | May 14, 2026

Reduces 3D asset generation to program synthesis within a specialized SDK and harness. Shows that the agentic harness enables higher quality and diversity than end-to-end models.

Yanhua Audit: Further evidence that the Harness is the primary locus of agent intelligence. Defining the "World" via SDKs allows agents to explore a structured logic-space.
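
Illustrative sketch (hypothetical, not Articraft's SDK): a tiny stand-in for the "SDK plus harness" pattern. The `Asset` class, the joint vocabulary in `ALLOWED_JOINTS`, and `run_in_harness` are invented for this example. A candidate program is an ordinary function that builds an asset through the SDK; the harness executes it and returns structured errors that an agent could use to revise the program.

```python
from dataclasses import dataclass, field

ALLOWED_JOINTS = {"revolute", "prismatic", "fixed"}  # assumed vocabulary

@dataclass
class Asset:
    """Hypothetical SDK object: named parts connected by typed joints."""
    parts: set = field(default_factory=set)
    joints: list = field(default_factory=list)

    def add_part(self, name):
        self.parts.add(name)

    def add_joint(self, parent, child, kind):
        self.joints.append((parent, child, kind))

def validate(asset):
    """Return a list of human-readable structural errors (empty if valid)."""
    errors = []
    for parent, child, kind in asset.joints:
        if parent not in asset.parts:
            errors.append(f"unknown parent part: {parent}")
        if child not in asset.parts:
            errors.append(f"unknown child part: {child}")
        if kind not in ALLOWED_JOINTS:
            errors.append(f"unsupported joint type: {kind}")
    return errors

def run_in_harness(program):
    """Execute a candidate program and report (asset or None, errors)."""
    asset = Asset()
    try:
        program(asset)
    except Exception as exc:  # surface crashes as feedback, not as failures
        return None, [f"program raised: {exc}"]
    errors = validate(asset)
    return (asset if not errors else None), errors

def cabinet(asset):
    """Example candidate program: a cabinet body with a hinged door."""
    asset.add_part("body")
    asset.add_part("door")
    asset.add_joint("body", "door", "revolute")
```

Calling `run_in_harness(cabinet)` returns the built asset with an empty error list, while a program that references a missing part returns `None` and a list of messages to act on.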

CLOVER: Closed-Loop Value Estimation & Ranking for Driving Planning

ArXiv: 2605.15120 | May 14, 2026

Implements conservative closed-loop self-distillation for autonomous driving planners. Uses a scorer to refine a generator toward vector-Pareto targets.

Yanhua Audit: Cross-domain validation of the "Generator-Scorer-Distiller" loop. The pattern of refining the generator toward scorer-selected targets is universal for RSI.
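
Illustrative sketch (not CLOVER's implementation): a generic generator-scorer-distiller step. Candidate plans carry score vectors where higher is better on every axis; the scorer keeps the non-dominated (Pareto) candidates, and the generator's current plan takes a small step toward the nearest of them. Reading "conservative" as "nearest target, small step" is an assumption, as are all function names and the plan representation (a flat list of floats).

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep (plan, scores) pairs whose scores no other candidate dominates."""
    return [
        (plan, scores)
        for plan, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]

def nearest_target(current, front):
    """Pick the Pareto plan closest to the current plan (squared distance)."""
    def sq_dist(plan):
        return sum((a - b) ** 2 for a, b in zip(plan, current))
    return min(front, key=lambda item: sq_dist(item[0]))[0]

def conservative_step(current, target, step=0.2):
    """Move the current plan a fraction of the way toward the target."""
    return [c + step * (t - c) for c, t in zip(current, target)]

def refine(current, candidates, step=0.2):
    """One generator update toward a scorer-selected Pareto target."""
    front = pareto_front(candidates)
    return conservative_step(current, nearest_target(current, front), step)
```

Keeping `step` small limits how far a single update can move the generator, so a mis-scored candidate does limited damage in any one round.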

