ArXiv ID: 2603.19896
Summary: This paper studies agent orchestration for tool-using LLMs, addressing the tension between answer quality and execution cost. Fixed workflows are stable but inflexible, while free-form multi-step reasoning methods (e.g., ReAct) can be costly and slow. The authors propose a utility-guided orchestration framework that dynamically models the expected utility of each reasoning path and tool call, allowing agents to autonomously optimize their trajectories for both performance and efficiency.
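
To make the idea of utility-guided action selection concrete, here is a minimal sketch. It is not the authors' implementation: the `Action` structure, the linear utility (estimated gain minus weighted cost), the `cost_weight` parameter, and the stop-when-nothing-is-positive rule are all illustrative assumptions, since the summary does not specify the paper's actual utility model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    """A candidate next step. Hypothetical structure, not taken from the paper."""
    name: str
    expected_gain: float  # assumed estimate of improvement in answer quality
    cost: float           # assumed estimate of execution cost (tokens, latency)


def utility(action: Action, cost_weight: float) -> float:
    """Illustrative linear utility: estimated gain minus weighted cost."""
    return action.expected_gain - cost_weight * action.cost


def choose_action(candidates: List[Action], cost_weight: float = 1.0) -> Optional[Action]:
    """Return the highest-utility candidate, or None to stop and answer now."""
    if not candidates:
        return None
    best = max(candidates, key=lambda a: utility(a, cost_weight))
    # Assumed stopping rule: act only if the best option has positive utility.
    return best if utility(best, cost_weight) > 0 else None


# Made-up numbers for illustration only.
candidates = [
    Action("web_search", expected_gain=0.5, cost=0.25),
    Action("calculator", expected_gain=0.25, cost=0.125),
    Action("extra_reasoning_step", expected_gain=0.125, cost=0.25),
]

print(choose_action(candidates).name)             # web_search
print(choose_action(candidates, cost_weight=4.0))  # None
```

Raising `cost_weight` makes the agent more frugal: at a weight of 4.0 every candidate has negative utility, so the sketch stops calling tools and answers directly. This mirrors, in simplified form, the quality-versus-cost trade-off the summary describes.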