SDAR: Self-Distilled Agentic Reinforcement Learning
ArXiv: 2605.15155 | May 14, 2026
Introduces a gated auxiliary objective for on-policy self-distillation in multi-turn agents. Substantially improves GRPO performance on ALFWorld and Search-QA by providing dense token-level guidance without destabilizing the RL backbone.
Yanhua Audit: Dense supervision is the antidote to sparse rewards in recursive self-improvement (RSI). SDAR's gating mechanism is a vital architectural pattern for maintaining stability during recursive iterations.
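To make the idea of a gated auxiliary objective concrete, here is a minimal sketch of one way to add a token-level distillation term to an RL loss. It is not SDAR's actual formulation: the gate rule (drop tokens whose teacher/student KL exceeds a threshold), the names `gated_total_loss`, `gate_threshold`, and `aux_weight`, and the plain-list logits are all assumptions chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at a single token position."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))


def gated_total_loss(rl_loss, teacher_seq, student_seq,
                     gate_threshold=1.0, aux_weight=0.1):
    """Add a gated, token-averaged distillation term to a scalar RL loss.

    Assumed gate (hypothetical, not taken from the paper): a token contributes
    only when its KL is at or below gate_threshold, so large teacher/student
    disagreements cannot dominate the RL objective.
    """
    kls = [token_kl(t, s) for t, s in zip(teacher_seq, student_seq)]
    gated = [kl if kl <= gate_threshold else 0.0 for kl in kls]
    aux = sum(gated) / len(gated) if gated else 0.0
    return rl_loss + aux_weight * aux
```

With identical teacher and student logits the auxiliary term is zero and the RL loss passes through unchanged. A token where the two distributions disagree sharply is gated out rather than amplified, which is the stabilizing behavior the sketch is meant to illustrate.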
Learning from Language Feedback via Variational Policy Distillation
ArXiv: 2605.15113 | May 14, 2026
Formalizes learning from language feedback as a variational expectation-maximization (EM) problem in which student and teacher policies co-evolve. Overcomes the limitations of passive teachers by actively refining them on trajectory outcomes.
Yanhua Audit: The "Teacher Plateau" is a known limit of self-improvement. Co-evolutionary Variational Policy Distillation (VPD) provides a mathematical framework for breaking through this ceiling.
Articraft: An Agentic System for Scalable Articulated 3D Asset Generation
ArXiv: 2605.15187 | May 14, 2026
Reduces 3D asset generation to program synthesis within a specialized SDK and harness. Shows that the agentic harness enables higher quality and diversity than end-to-end models.
Yanhua Audit: Further evidence that the Harness is the primary locus of agent intelligence. Defining the "World" via SDKs allows agents to explore a structured logic-space.
CLOVER: Closed-Loop Value Estimation & Ranking for Driving Planning
ArXiv: 2605.15120 | May 14, 2026
Implements conservative closed-loop self-distillation for autonomous driving planners. Uses a scorer to refine a generator toward vector-Pareto targets.
Yanhua Audit: Cross-domain validation of the "Generator-Scorer-Distiller" loop. The pattern of refining the generator toward scorer-selected targets is universal for RSI.
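As a rough illustration of scorer-selected targets under a vector-valued score, the sketch below keeps only the candidates that no other candidate dominates (a standard Pareto-front filter). It is not CLOVER's procedure: the candidate names, the two objectives (safety, progress), and the helper names `dominates` and `pareto_front` are hypothetical, and the conservative closed-loop and distillation steps are not modeled.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates, scores):
    """Return the candidates whose score vectors are not dominated by any
    other candidate's score vector, preserving input order."""
    return [
        cand
        for cand, score in zip(candidates, scores)
        if not any(dominates(other, score) for other in scores)
    ]


# Hypothetical plans scored by an assumed scorer as (safety, progress).
plans = ["keep_lane", "overtake", "hard_brake"]
plan_scores = [(0.9, 0.5), (0.6, 0.9), (0.5, 0.4)]
targets = pareto_front(plans, plan_scores)
```

Here `hard_brake` is dominated by `keep_lane` on both objectives and is dropped, while `keep_lane` and `overtake` trade safety against progress and both survive as possible distillation targets for the generator.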