SkillOpt: Executive Strategy for Self-Evolving Agent Skills
ArXiv: 2605.23904 | May 22, 2026
A systematic, controllable text-space optimizer for agent skills. Models skill improvement as a formal optimization problem and reports a gain of +23.5 points on GPT-5.5.
View on ArXiv
Recursive AI: Autonomous Experimentation Substrate
Industry Signal | May 18, 2026
Richard Socher's new startup valued at $4.65B. Focuses on AI that designs and executes its own safety and capability experiments.
View Bloomberg Tech
LLMs as Noisy Channels: A Shannon Perspective
ArXiv: 2605.23901 | May 22, 2026
A theoretical framework that models LLM scaling as information transmission over a noisy channel and identifies the "Shannon capacity" of models.
View on ArXiv
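For readers new to the term: Shannon capacity is the highest rate at which information can cross a noisy channel with arbitrarily small error. The sketch below computes it for the textbook binary symmetric channel, C = 1 - H2(p). It illustrates the concept only and is not the paper's model of LLMs; the function names are our own.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity, in bits per channel use, of a binary symmetric channel
    that flips each transmitted bit with probability flip_prob."""
    return 1.0 - binary_entropy(flip_prob)


# A noiseless channel carries one full bit per use; a channel that flips
# bits half the time carries nothing.
noiseless = bsc_capacity(0.0)
useless = bsc_capacity(0.5)
```

Capacity falls as noise rises, which is the intuition the noisy-channel framing borrows.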
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
ArXiv: 2605.23904 | May 22, 2026 (Indexed May 25)
A systematic, controllable text-space optimizer for agent skills. Treats skills as external state and achieves a +23.5% lift on GPT-5.5 via bounded edits.
Tags: RSI, Agent, Optimization
View on ArXiv
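To make "bounded edits" concrete, here is a minimal sketch of text-space hill climbing in which a skill is a list of instruction lines and any proposal that changes more than `max_edit` lines is rejected. It is not the paper's algorithm: `optimize_skill`, `propose`, `score`, and the toy scorer are all hypothetical stand-ins (in practice the proposer would be an LLM and the scorer a task benchmark).

```python
from typing import Callable, List, Tuple

Skill = List[str]


def lines_changed(a: Skill, b: Skill) -> int:
    """Number of line positions that differ, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def optimize_skill(
    skill: Skill,
    propose: Callable[[Skill, int], Skill],  # hypothetical edit proposer
    score: Callable[[Skill], float],         # hypothetical task scorer
    max_edit: int = 1,
    rounds: int = 10,
) -> Tuple[Skill, float]:
    """Greedy search that only accepts small, score-improving edits."""
    best, best_score = list(skill), score(skill)
    for step in range(rounds):
        candidate = propose(best, step)
        if lines_changed(best, candidate) > max_edit:
            continue  # edit exceeds the bound: reject without scoring
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy setup: the scorer rewards every line that asks for citations.
def toy_propose(skill: Skill, step: int) -> Skill:
    candidate = list(skill)
    i = step % len(candidate)
    if "cite" not in candidate[i]:
        candidate[i] += " and cite sources"
    return candidate


def toy_score(skill: Skill) -> float:
    return float(sum("cite" in line for line in skill))


start = ["search docs", "answer"]
improved, improved_score = optimize_skill(start, toy_propose, toy_score, rounds=4)
```

Bounding the edit size keeps each accepted change small enough to attribute a score difference to it, which is one plausible reading of "controllable" here.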
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
ArXiv: 2605.23491 | May 22, 2026
Jointly improves code and unit tests through cooperative self-play without ground-truth data. Matches RLVR models via training-free inference scaling.
Tags: RSI, Verification
View on ArXiv
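One way to picture code and tests improving each other without ground truth is cross-execution: run every candidate implementation against every candidate test, then let each side's weight depend on the other's. The sketch below uses a simple mutual-reinforcement scoring loop. That scoring rule is our substitution, not the paper's procedure, and the hand-written candidates stand in for model-generated ones; all names are hypothetical.

```python
from typing import Callable, List, Tuple

Code = Callable[[int], int]
Test = Callable[[Code], None]  # raises (e.g. AssertionError) on failure


def passes(code: Code, test: Test) -> bool:
    """Run one test against one implementation, treating any exception as failure."""
    try:
        test(code)
        return True
    except Exception:
        return False


def normalize(weights: List[float]) -> List[float]:
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


def co_select(codes: List[Code], tests: List[Test], rounds: int = 3) -> Tuple[int, List[float], List[float]]:
    """Weight code by the tests it passes and tests by the code that passes them."""
    matrix = [[passes(c, t) for t in tests] for c in codes]
    code_w = [1.0] * len(codes)
    test_w = [1.0] * len(tests)
    for _ in range(rounds):
        code_w = normalize([sum(w for ok, w in zip(row, test_w) if ok) for row in matrix])
        test_w = normalize([
            sum(code_w[i] for i, row in enumerate(matrix) if row[j])
            for j in range(len(tests))
        ])
    best = max(range(len(codes)), key=code_w.__getitem__)
    return best, code_w, test_w


# Candidate implementations of abs(): one correct, two buggy.
codes = [lambda x: x if x >= 0 else -x, lambda x: x, lambda x: -x]


def check_positive(f: Code) -> None:
    assert f(3) == 3


def check_negative(f: Code) -> None:
    assert f(-2) == 2


def check_wrong(f: Code) -> None:
    assert f(-1) == -1  # deliberately incorrect test


tests = [check_positive, check_negative, check_wrong]
best, code_w, test_w = co_select(codes, tests)
```

In this toy case the correct implementation ties with a buggy one after the first round and pulls ahead once the incorrect test has been down-weighted.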
Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents
ArXiv: 2605.23590 | May 22, 2026
Uses rubrics as step-level action guidance during inference. Optimizes decision quality for search-intensive reasoning tasks.
Tags: Agent, Reasoning
View on ArXiv
Reducing Control Flow to Tensor Algebra: Verifying the Non-Learned Trusted Base of a Neuro-Symbolic Substrate
ClawRxiv: 2605.02618 | May 25, 2026
Emma Leonhart introduces Sutra, a language that compiles control flow into tensor graphs. Verification of kernel roles becomes algebra rather than control-flow path enumeration.
View on ClawRxiv
Self-Policy Distillation (SPD): Capability Selective Self-Improvement
ArXiv: 2605.22675 | May 25, 2026
Achieves selective improvement without external signals by extracting low-rank capability subspaces from gradients. Beats state-of-the-art self-distillation by 13%.
View on ArXiv
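As a rough illustration of extracting a low-rank subspace from gradients, the sketch below finds the dominant direction shared by a set of gradient vectors with power iteration, then projects an update onto it. It covers the rank-1 case only and is an assumption-laden toy, not SPD itself; `top_direction` and `project` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_direction(grads: List[Vector], iters: int = 100) -> Vector:
    """Leading right-singular vector of the gradient matrix G, via power
    iteration on G^T G. A uniform start can fail if it happens to be
    orthogonal to the dominant direction."""
    dim = len(grads[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        coeffs = [dot(g, v) for g in grads]  # G v
        w = [sum(c * g[k] for c, g in zip(coeffs, grads)) for k in range(dim)]  # G^T (G v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(update: Vector, direction: Vector) -> Vector:
    """Keep only the component of update that lies along a unit direction."""
    scale = dot(update, direction)
    return [scale * d for d in direction]


# Gradients that all point along the first axis define a rank-1 subspace.
grads = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
direction = top_direction(grads)
filtered = project([5.0, 7.0, 9.0], direction)
```

Projecting an update this way discards the components outside the chosen subspace, which is the general idea behind restricting improvement to selected capabilities.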
Yantra: A Neuro-Symbolic, GPU-Native Operating System for Critical Systems
ClawRxiv: 2605.02611 | May 24, 2026
An OS built in Sutra where the kernel and IPC are fused tensor-op graphs. Removes the distinction between OS syscalls and model activations.
View on ClawRxiv
Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
ArXiv: 2605.15871 | May 15, 2026
Agents autonomously design foundation models beyond standard Transformers. Yields AIRAformers and AIRAhybrids that outscale Nemotron-2 and Llama 3.2.
View on ArXiv
Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
ArXiv: 2605.19317 | May 19, 2026
Proposes Iterative Partial Refinement (IPR) for diffusion models without external verifiers, enabling models to revise decisions under richer context.
View on ArXiv
Interestingness as an Inductive Heuristic for Future Compression Progress
ArXiv: 2605.14831 | May 14, 2026
From the Schmidhuber group. Formalizes "interestingness" as an inductive heuristic for future compression progress, with the aim of predicting the viability of breakthroughs.
View on ArXiv
Frontier Coding Agents Connect Four AlphaZero Pipeline
ArXiv: 2604.25067 | April 27, 2026
Frontier coding agents autonomously implement end-to-end ML pipelines. Benchmarks Claude Opus 4.7 and GPT-5.4, finding Claude Opus 4.7 superior in research execution.
View on ArXiv
Self-Evolving Multi-Agent Systems via Decentralized Memory
ArXiv: 2605.22721 | May 21, 2026
Proposes DecentMem, a decentralized memory framework where agents maintain individual exploitation/exploration pools. Improves average accuracy by up to 23.8% over centralized baselines.
View on ArXiv
Self-Policy Distillation via Capability-Selective Subspace Projection
ArXiv: 2605.22675 | May 21, 2026
Selective self-improvement without external signals. SPD uses low-rank subspace projection to isolate task-relevant capabilities.
View on ArXiv
Vector Policy Optimization: Diversity for Test-Time Search
ArXiv: 2605.22817 | May 21, 2026
An RL algorithm (VPO) that trains policies to produce diverse solutions, optimizing for downstream inference-time search (e.g., AlphaEvolve).
View on ArXiv