Abstract
Experiments on five real-world cloud support scenarios, spanning 1,883 tickets and 3,737 tasks, show that: (1) the Domain-Contextualized Skill Creator produces substantially better initial skills than the generic skill creator; and (2) the self-evolution loop progressively improves skill quality across successive rounds from diverse starting points, demonstrating that automated evolution can surpass manually curated expert knowledge.
Key Findings
- Automated Skill Evolution: Agents can iteratively refine their own tool-use and troubleshooting skills based on success/failure signals from real-world tasks.
- Surpassing Experts: Automated evolution loops produced skills that outperformed manually curated expert knowledge bases in cloud technical support.
- Robustness to Initialization: The evolution loop is effective regardless of whether it starts from generic, expert-authored, or domain-specific skills.
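The evaluate, revise, and re-evaluate cycle described in these findings can be illustrated with a toy sketch. This is not SkillForge's implementation: `Skill`, `Task`, `run_task`, `revise`, and `evolve` are hypothetical names, a skill is reduced to a tuple of hints, and the revision step (presumably model-driven in the real system) is replaced by a trivial rule that adds the hint for the first failed task. The sketch only assumes that a candidate skill is kept when it raises the measured success rate.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill: an ordered tuple of troubleshooting hints."""
    hints: Tuple[str, ...]


@dataclass(frozen=True)
class Task:
    """Toy task that succeeds only if the skill contains the needed hint."""
    needed_hint: str


def run_task(skill: Skill, task: Task) -> bool:
    # Stand-in for executing an agent on a real ticket and checking the outcome.
    return task.needed_hint in skill.hints


def success_rate(skill: Skill, tasks: Sequence[Task]) -> float:
    if not tasks:
        return 0.0
    return sum(run_task(skill, t) for t in tasks) / len(tasks)


def revise(skill: Skill, failures: Sequence[Task]) -> Skill:
    # Assumption: a trivial rule replaces the real revision step.
    # It appends the hint required by the first failed task.
    if not failures:
        return skill
    return Skill(skill.hints + (failures[0].needed_hint,))


def evolve(skill: Skill, tasks: Sequence[Task], rounds: int) -> Tuple[Skill, List[float]]:
    """Run `rounds` rounds, keeping a candidate only if it scores higher."""
    history = [success_rate(skill, tasks)]
    for _ in range(rounds):
        failures = [t for t in tasks if not run_task(skill, t)]
        candidate = revise(skill, failures)
        if success_rate(candidate, tasks) > history[-1]:
            skill = candidate
        history.append(success_rate(skill, tasks))
    return skill, history


if __name__ == "__main__":
    tasks = [Task("check quota"), Task("restart agent"),
             Task("check quota"), Task("rotate key")]
    evolved, history = evolve(Skill(("restart agent",)), tasks, rounds=3)
    print(history)  # [0.25, 0.75, 1.0, 1.0]
```

Because a candidate is accepted only when it scores strictly higher, the recorded success rate never decreases, which mirrors the "progressive improvement across rounds" claim in a simplified form. The starting skill can be swapped for any other (empty, generic, or expert-written) to mimic the different initializations.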
Relevance to RSI
SkillForge provides a production-grade demonstration of Recursive Self-Improvement (RSI). It shows that agents can optimize their own functional "DNA" (skills and protocols) in real-world, high-stakes environments. This supports the "Builder Identity" and "Lead with Results" values of the Evolution Kernel.