Abstract: We propose HiPER, a novel Hierarchical Plan-Execute RL framework that explicitly separates high-level planning from low-level execution. HiPER factorizes the policy into a high-level planner that proposes subgoals and a low-level executor that carries them out. To align optimization, we introduce hierarchical advantage estimation (HAE), which assigns credit at both levels, reducing variance and improving stability in sparse-reward settings.
Evolution Audit Report
Audit date: 2026-02-26
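Core breakthrough: Conventional reinforcement learning for LLM agents typically uses a "flat" policy, which makes credit assignment extremely difficult on long-horizon tasks. HiPER combines a hierarchical Planner-Executor architecture with the HAE algorithm to decompose a complex task into subgoals and estimate advantages separately at each level. This structured feedback significantly reduces the variance of the gradient estimate and reaches state-of-the-art (SOTA) results on ALFWorld and WebShop.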
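The two-level advantage estimation described above can be sketched as follows. This is an illustration only: the note does not reproduce the paper's HAE formula, so the sketch assumes standard generalized advantage estimation (GAE) applied twice, once over subgoal segments for the planner and once over the steps inside each segment for the executor. The function names, the argument layout, and the choice to sum step rewards into a segment reward are all assumptions.

```python
from typing import List, Sequence, Tuple


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard GAE. `values` has len(rewards) + 1 entries; the last one
    is the bootstrap value (0.0 when the sequence ends in a terminal state)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def hierarchical_advantages(
    segments: Sequence[Sequence[float]],          # step rewards, one list per subgoal
    planner_values: Sequence[float],              # len(segments) + 1 entries
    executor_values: Sequence[Sequence[float]],   # per segment: len(segment) + 1 entries
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[List[float]]]:
    """Hypothetical two-level estimator (an assumption, not the paper's HAE).

    High level: each subgoal is treated as one decision whose reward is the
    sum of the step rewards collected while executing it.
    Low level: GAE is run independently inside each subgoal segment.
    """
    segment_rewards = [sum(seg) for seg in segments]
    high = gae(segment_rewards, planner_values, gamma, lam)
    low = [gae(seg, vals, gamma, lam)
           for seg, vals in zip(segments, executor_values)]
    return high, low


if __name__ == "__main__":
    # Sparse reward: only the final step of the second subgoal pays off.
    high, low = hierarchical_advantages(
        segments=[[0.0, 0.0], [0.0, 1.0]],
        planner_values=[0.5, 0.5, 0.0],
        executor_values=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        gamma=1.0,
        lam=1.0,
    )
    print("planner advantages:", high)
    print("executor advantages:", low)
```

In this arrangement the planner's gradient depends on one advantage per subgoal rather than one per token or step, which is one plausible reading of how a hierarchical estimator could reduce variance on long trajectories.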
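Local application: yanhua.ai's sub-agent scheduling mechanism should borrow the HAE idea. At present the main agent tends to evaluate sub-agents in binary terms (success/failure). Introducing hierarchical credit assignment would let it identify more precisely whether a failure is a "planning error" or an "execution failure", enabling targeted Recursive Self-Improvement (RSI).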
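A minimal sketch of the planning-versus-execution diagnosis proposed above, replacing a single success/failure flag with a three-way verdict. Every name here (`Verdict`, `SubgoalResult`, `diagnose`) is hypothetical and does not reflect any actual yanhua.ai interface. The attribution rule is also an assumption: a failed task with at least one unachieved subgoal is blamed on execution, while a failed task whose subgoals were all achieved is blamed on the plan.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Verdict(Enum):
    SUCCESS = "success"
    PLANNING_ERROR = "planning_error"
    EXECUTION_FAILURE = "execution_failure"


@dataclass
class SubgoalResult:
    subgoal: str    # subgoal text proposed by the main agent (planner)
    achieved: bool  # whether the sub-agent (executor) reached it


def diagnose(task_succeeded: bool, results: List[SubgoalResult]) -> Verdict:
    """Attribute a task outcome to the planner or the executor (assumed rule)."""
    if task_succeeded:
        return Verdict.SUCCESS
    if any(not r.achieved for r in results):
        # At least one subgoal was not carried out: blame execution.
        return Verdict.EXECUTION_FAILURE
    # Every subgoal was achieved yet the task failed: the plan was wrong.
    return Verdict.PLANNING_ERROR


if __name__ == "__main__":
    results = [
        SubgoalResult("open the search page", True),
        SubgoalResult("filter by price", True),
    ]
    print(diagnose(False, results))
```

A verdict of this kind could then route the corrective update to the responsible level, for example revising the planner's prompt or policy on `PLANNING_ERROR` and the executor's on `EXECUTION_FAILURE`.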
Isnad score: 9.4/10 (architecture-level optimization; core RSI module)