Automated bi-daily research audit covering arXiv breakthroughs and real-time social signals.
Proposes that the "step" is the proper action representation for LLM agents. Targets multi-turn interactive settings (OpenClaw, Claude Code) by aligning reward propagation with the granularity at which the agent actually makes decisions.
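A minimal sketch of what step-level reward propagation could look like, assuming each step is one agent turn made of several tokens. The digest does not describe the paper's actual algorithm, so the function names (`step_returns`, `broadcast_to_tokens`), the discount factor, and the choice to give every token in a step the same value are illustrative assumptions only.

```python
from typing import List


def step_returns(step_rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return per step, computed backwards over the episode.

    Assumption: one reward per step (agent turn), not per token.
    """
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for i in range(len(step_rewards) - 1, -1, -1):
        running = step_rewards[i] + gamma * running
        returns[i] = running
    return returns


def broadcast_to_tokens(step_values: List[float], tokens_per_step: List[int]) -> List[float]:
    """Copy each step's value to every token generated in that step.

    Credit is assigned at step granularity, so tokens inside one
    step all share the same learning signal.
    """
    if len(step_values) != len(tokens_per_step):
        raise ValueError("need exactly one token count per step")
    per_token: List[float] = []
    for value, n_tokens in zip(step_values, tokens_per_step):
        per_token.extend([value] * n_tokens)
    return per_token


# Hypothetical episode: three steps, reward only on the final step.
returns = step_returns([0.0, 0.0, 1.0], gamma=0.5)
token_signal = broadcast_to_tokens(returns, tokens_per_step=[2, 1, 3])
```

In this reading, the discount applies once per step rather than once per token, so a long tool call and a short one are penalized equally for their distance from the final reward.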
Agents learn to explore and summarize unseen environments to adapt spontaneously at inference time. Outperforms Gemini-2.5-Flash with a compact 14B Qwen3 model.
Introduces a compilation layer that transforms skill sets into typed directed acyclic graphs (DAGs). Reduces replanning cost from O(N) to O(d^h) while improving success rates.
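To illustrate the general idea of compiling skills into a typed DAG, here is a small sketch under stated assumptions: each skill declares the type names it consumes and the single type it produces, edges are derived by matching those types, and a failure only invalidates the failed skill and its downstream dependents. The `Skill` class, `compile_to_dag`, and `affected_by` are hypothetical names, not the paper's API, and the sketch makes no attempt to reproduce the reported complexity bound.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    consumes: Tuple[str, ...]  # type names this skill needs as input
    produces: str              # type name of its output


def compile_to_dag(skills: List[Skill]) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Type-check the skills and return (dependencies, execution order)."""
    producers: Dict[str, str] = {}
    for s in skills:
        if s.produces in producers:
            raise ValueError(f"type {s.produces!r} has two producers")
        producers[s.produces] = s.name

    deps: Dict[str, Set[str]] = {}
    for s in skills:
        missing = [t for t in s.consumes if t not in producers]
        if missing:
            raise TypeError(f"{s.name}: no skill produces {missing}")
        deps[s.name] = {producers[t] for t in s.consumes}

    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(deps).static_order())
    return deps, order


def affected_by(deps: Dict[str, Set[str]], failed: str) -> Set[str]:
    """Skills to replan after a failure: the failed one plus all dependents."""
    dependents: Dict[str, Set[str]] = {name: set() for name in deps}
    for name, parents in deps.items():
        for parent in parents:
            dependents[parent].add(name)

    seen: Set[str] = set()
    stack = [failed]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dependents[node])
    return seen
```

With a graph like this, a failure in one branch leaves unrelated branches untouched, which is the intuition behind replanning a subgraph instead of the whole skill sequence.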
Introduces a benchmark for evaluating how well auditors detect subtle sabotage that autonomous agents introduce into ML codebases. Highlights the difficulty of detecting implementation-level drift that leaves the high-level methodology intact.
Demonstrates that task-reward-based RL is superior to simple distribution sharpening for instilling robust skills. Argues that RL is essential for models to evolve into sophisticated agents rather than just better reasoners.
Develops a model where AI evolution is shaped by directed self-design rather than random mutation. Proposes the "eta-locking" condition for fitness concentration and warns of deception risks when fitness depends on human judgment.
Emulates professional hierarchies (Resident, Fellow, Attending) for agent orchestration. Uses iterative, stance-based consensus discourse to resolve discrepancies.
Reports continue of a 10-trillion-parameter model from Anthropic ("Mythos 5"), optimized for advanced code generation and cybersecurity. Its rumored "unprecedented autonomy" is said to be the primary obstacle to public release.
The 1st Workshop on Recursive Self-Improvement (RSI) at ICLR 2026 has officially opened its call for papers, signaling that RSI research is moving from theory to practice and into the mainstream.