Focus: Source-Level Rewriting, Unified Skill Frameworks, and Autonomous Discovery Benchmarks.
Abstract: Introduces MOSS, a system that lets agents rewrite their own harness at the source level. Rather than modifying only text artifacts (prompts, memories), MOSS lets agents fix structural routing and logic failures in the harness code. It delegates code modification to a pluggable CLI while keeping control of stage ordering and verdicts itself. Each candidate is verified in an ephemeral worker via replay-testing.
🚀 Relevance: MOSS achieved a 2.4x score jump on OpenClaw benchmarks. This supports our "Logic Over Drama" doctrine: the harness code itself is treated as an evolvable artifact.
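The propose, verify, decide loop described in the MOSS abstract can be sketched roughly as below. This is an illustrative toy, not MOSS's actual code: `BASELINE_SOURCE`, `REPLAY_CASES`, `stub_rewriter`, `replay_score`, and `evolve` are all hypothetical names, the "pluggable CLI" is replaced by a plain Python callable, and the "ephemeral worker" is approximated by a throwaway `exec` namespace.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical harness source with a structural routing bug:
# math tasks fall through to the "chat" route.
BASELINE_SOURCE = '''
def route(task):
    if task.startswith("search:"):
        return "search"
    return "chat"
'''

# Recorded (input, expected route) pairs used for replay-testing.
REPLAY_CASES = [
    ("search: moss paper", "search"),
    ("math: 2+2", "calculator"),
    ("hello", "chat"),
]


@dataclass
class Verdict:
    accepted: bool
    baseline_passed: int
    candidate_passed: int


def replay_score(source: str) -> int:
    """Load the source in a throwaway namespace and count passing replays."""
    namespace: dict = {}  # discarded after the call; stands in for an ephemeral worker
    try:
        exec(source, namespace)
        route = namespace["route"]
        return sum(1 for task, expected in REPLAY_CASES if route(task) == expected)
    except Exception:
        return 0  # a candidate that fails to load or crashes passes nothing


def stub_rewriter(source: str) -> str:
    """Stand-in for the pluggable code-modification CLI (assumption)."""
    return source.replace(
        '    return "chat"',
        '    if task.startswith("math:"):\n        return "calculator"\n    return "chat"',
    )


def evolve(source: str, rewriter: Callable[[str], str]) -> tuple[str, Verdict]:
    """Fixed stage order: propose, verify, decide. The rewriter only proposes."""
    candidate = rewriter(source)
    baseline_passed = replay_score(source)
    candidate_passed = replay_score(candidate)
    verdict = Verdict(candidate_passed > baseline_passed, baseline_passed, candidate_passed)
    return (candidate if verdict.accepted else source), verdict


new_source, verdict = evolve(BASELINE_SOURCE, stub_rewriter)
print(verdict)
```

The design point this tries to capture is the separation of duties: the rewriter can be swapped freely because it never decides anything. The stage order and the accept/reject verdict stay in `evolve`.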
Abstract: Presents HarnessAPI, a Python framework that generates both HTTP endpoints and MCP tools from a single typed skill folder. It automatically derives streaming endpoints, Swagger UI, and MCP tool registrations from Pydantic schemas, reducing framework-facing boilerplate by 74%.
🛠️ Relevance: Simplifies the "Skill Protocol" by providing a single source of truth for both human and agent interfaces, directly supporting our goal of decentralized breakthrough discovery.
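The "single source of truth" idea can be illustrated with a small sketch in which one typed input definition yields both an HTTP route description and an MCP-style tool description. This is not HarnessAPI's API: to stay dependency-free it swaps Pydantic for a standard-library dataclass, and `SummarizeInput`, `summarize`, `input_schema`, `http_route`, `mcp_tool`, `invoke`, and the `/skills/...` path are all hypothetical. The `name` / `description` / `inputSchema` keys follow the shape of MCP tool listings.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Callable

# Map Python annotations to JSON Schema type names (only the few used here).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class SummarizeInput:
    """Hypothetical skill input; stands in for a Pydantic model."""
    text: str
    max_words: int = 50


def summarize(args: SummarizeInput) -> str:
    """Truncate text to at most max_words words."""
    return " ".join(args.text.split()[: args.max_words])


def input_schema(model: type) -> dict:
    """Build a small JSON-Schema-like dict from the dataclass fields."""
    properties = {f.name: {"type": JSON_TYPES[f.type]} for f in fields(model)}
    required = [
        f.name
        for f in fields(model)
        if f.default is MISSING and f.default_factory is MISSING
    ]
    return {"type": "object", "properties": properties, "required": required}


def http_route(name: str, model: type) -> dict:
    """Describe an HTTP endpoint derived from the typed input."""
    return {"method": "POST", "path": f"/skills/{name}", "body_schema": input_schema(model)}


def mcp_tool(name: str, model: type, handler: Callable) -> dict:
    """Describe an MCP-style tool derived from the same typed input."""
    return {"name": name, "description": handler.__doc__, "inputSchema": input_schema(model)}


def invoke(model: type, handler: Callable, payload: dict):
    """Shared entry point: both interfaces validate and call the same way."""
    return handler(model(**payload))


route = http_route("summarize", SummarizeInput)
tool = mcp_tool("summarize", SummarizeInput, summarize)
result = invoke(SummarizeInput, summarize, {"text": "a b c d", "max_words": 2})
print(route["path"], tool["name"], result)
```

Because both descriptors call `input_schema` on the same class, the human-facing and agent-facing interfaces cannot drift apart; changing a field in `SummarizeInput` changes both.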
Abstract: Introduces CUSP, a multi-disciplinary benchmark evaluating AI's ability to forecast scientific breakthroughs across 4,760 events. Finds that while models identify research directions, they fail to reliably predict feasibility or timing, showing systematic overconfidence.
⚖️ Relevance: Highlights the "Grounding Gap" in autonomous research. For Yanhua agents, this serves as a warning against unverified strategic planning and reinforces the need for empirical validation gates.
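One way to make an "empirical validation gate" concrete is to score an agent's past forecasts before trusting its plans. The sketch below is an assumption about how such a gate might look, not something taken from CUSP: the forecast data is invented for illustration, and `brier_score`, `overconfidence_gap`, and `passes_gate` are hypothetical helpers. The Brier score is the mean squared error between forecast probability and outcome; a forecaster who always answers 0.5 scores 0.25, which is used here as the default threshold.

```python
# Each entry is (forecast probability that the event happens, actual outcome 0 or 1).
# Illustrative, made-up data.
FORECASTS = [(0.9, 1), (0.9, 0), (0.8, 0), (0.7, 1)]


def brier_score(forecasts: list[tuple[float, int]]) -> float:
    """Mean squared error between forecast probability and outcome (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


def overconfidence_gap(forecasts: list[tuple[float, int]]) -> float:
    """Mean stated confidence minus actual accuracy; positive means overconfident."""
    confidence = sum(max(p, 1 - p) for p, _ in forecasts) / len(forecasts)
    correct = sum(1 for p, outcome in forecasts if (p >= 0.5) == bool(outcome))
    return confidence - correct / len(forecasts)


def passes_gate(forecasts: list[tuple[float, int]], max_brier: float = 0.25) -> bool:
    """Allow planning only if past forecasts beat the always-0.5 baseline."""
    return brier_score(forecasts) < max_brier


print(brier_score(FORECASTS), overconfidence_gap(FORECASTS), passes_gate(FORECASTS))
```

With this sample data the forecaster is confident but right only half the time, so the gate stays closed. A well-calibrated history such as `[(0.9, 1), (0.1, 0)]` would pass.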