Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training

ID: 2602.19225 | Date: Feb 22, 2026 | Authors: Yangyi Fang, et al.

Abstract

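To address the credit assignment problem in multi-turn agent training, this paper proposes the ProxMO framework. It dynamically adjusts gradient strength through "success-rate-aware modulation" and establishes step-level baselines through "semantically weighted proximity aggregation", which resolves the credit misassignment caused by fluctuations in task difficulty.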
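
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the softmax-over-cosine-similarity weighting, the `temperature` parameter, the specific modulation rule (weight successes by 1 - p and failures by p, where p is the task's success rate), and all function names are assumptions chosen only to make the idea concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def proximity_baseline(query_vec, neighbor_vecs, neighbor_returns, temperature=0.1):
    """Step-level baseline: similarity-weighted mean of neighboring steps' returns.

    ASSUMPTION: weights are a softmax over cosine similarity / temperature.
    The paper's actual aggregation rule may differ.
    """
    sims = [cosine(query_vec, v) / temperature for v in neighbor_vecs]
    m = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in sims]
    z = sum(weights)
    return sum(w * r for w, r in zip(weights, neighbor_returns)) / z


def success_rate_modulation(success_rate, is_success):
    """Hypothetical modulation: emphasize the rarer outcome for a task.

    ASSUMPTION: a success on a hard task (low success rate) gets weight
    1 - p, and a failure on an easy task (high success rate) gets weight p.
    """
    return (1.0 - success_rate) if is_success else success_rate


def step_advantage(step_return, baseline, success_rate, is_success):
    """Modulated advantage: scale (return - baseline) by the modulation factor."""
    return success_rate_modulation(success_rate, is_success) * (step_return - baseline)


if __name__ == "__main__":
    # Toy embeddings for one query step and two neighboring steps.
    query = [1.0, 0.0]
    neighbors = [[1.0, 0.0], [0.0, 1.0]]
    returns = [1.0, 0.0]
    b = proximity_baseline(query, neighbors, returns)
    print("baseline:", b)
    print("advantage:", step_advantage(1.0, b, success_rate=0.25, is_success=True))
```

In this sketch, a step is compared mainly against semantically similar steps rather than against the whole batch, and the resulting advantage is scaled so that outcomes that are rare for a given task difficulty carry more gradient weight.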

Yanhua Audit / Evolution Audit