Abstract: This work provides empirical insights into self-play LLM agents by analyzing co-evolution, curriculum dynamics, and scaling behavior. Through a self-evolving loop, the approach surpasses fully supervised tool-calling baselines under the same setting.
Key Insight: A Generator-Solver self-play framework bootstraps complex tool-calling capabilities without any external expert demonstrations. The "Zero Data" approach indicates that environment feedback alone is sufficient for these capabilities to emerge, provided the loop is structured correctly.
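A minimal sketch of how such a Generator-Solver loop could be wired, under loudly stated assumptions. The tools are toy integer functions. The Generator builds each task by executing a hidden tool chain, so every target is solvable and checkable. The Solver learns only from whether the environment's execution result matches the target, and a simple success-rate threshold stands in for curriculum dynamics. All names (`TOOLS`, `execute`, `Generator`, `Solver`, `self_play`) and constants (the 0.8 threshold, the 0.1 weight bump) are hypothetical illustrations, not the paper's method.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy tools standing in for real tool-calling APIs.
TOOLS: Dict[str, Callable[[int], int]] = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "negate": lambda x: -x,
}


def execute(start: int, plan: List[str]) -> int:
    """Environment: run a sequence of tool calls and return the final value."""
    value = start
    for name in plan:
        value = TOOLS[name](value)
    return value


class Generator:
    """Proposes (start, target) tasks; difficulty is the hidden chain length."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.depth = 1  # curriculum: number of tool calls in the hidden chain

    def propose(self) -> Tuple[int, int]:
        start = self.rng.randint(0, 9)
        chain = [self.rng.choice(list(TOOLS)) for _ in range(self.depth)]
        # Target comes from actually executing the chain, so the task is verifiable.
        return start, execute(start, chain)

    def adapt(self, success_rate: float, threshold: float = 0.8) -> None:
        # Hypothetical curriculum rule: harder tasks once the Solver is reliable.
        if success_rate >= threshold:
            self.depth += 1


class Solver:
    """Samples tool plans; tool weights are reinforced only by environment reward."""

    def __init__(self, rng: random.Random, attempts: int = 30, max_len: int = 4) -> None:
        self.rng = rng
        self.attempts = attempts
        self.max_len = max_len
        self.weights = {name: 1.0 for name in TOOLS}

    def solve(self, start: int, target: int) -> Optional[List[str]]:
        names = list(self.weights)
        w = [self.weights[n] for n in names]
        for _ in range(self.attempts):
            length = self.rng.randint(1, self.max_len)
            plan = self.rng.choices(names, weights=w, k=length)
            if execute(start, plan) == target:  # reward comes from the environment only
                for name in plan:
                    self.weights[name] += 0.1
                return plan
        return None


def self_play(rounds: int = 5, tasks_per_round: int = 20, seed: int = 0) -> List[Tuple[int, float]]:
    """Run the Generator-Solver loop; return (depth, success_rate) for each round."""
    rng = random.Random(seed)
    gen, solver = Generator(rng), Solver(rng)
    history: List[Tuple[int, float]] = []
    for _ in range(rounds):
        depth = gen.depth
        solved = sum(solver.solve(*gen.propose()) is not None for _ in range(tasks_per_round))
        rate = solved / tasks_per_round
        history.append((depth, rate))
        gen.adapt(rate)
    return history
```

For example, `self_play(rounds=6)` returns one `(depth, success_rate)` pair per round, and the depth only rises after a round at or above the threshold. No demonstrations appear anywhere: correctness is established by running the tools, not by comparing against expert traces, which is one way to read the "environment feedback is sufficient" claim.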
Relevance to RSI: Eliminates the "human demo bottleneck" for agent capability scaling. It suggests that agents can discover novel uses for tools that humans haven't documented yet.