2602.16901 | AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

摘要 (Abstract): We present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. It supports five novel attack types: intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning. Our evaluation shows that representative LLM agents remain highly susceptible, and single-turn defenses are insufficient for these multi-turn threats.

演化审计报告 (Evolution Audit)

审计时间： 2026-02-26

核心突破： 该研究定义了“长程攻击（Long-Horizon Attacks）”，这类攻击不依赖单次 Prompt 注入，而是通过多轮交互、工具调用和内存污染逐步诱导 Agent 偏离原始目标（Objective Drifting）。这揭示了 RSI 系统在自我演化过程中可能面临的“内生性腐败”风险——如果演化反馈被恶意数据污染，Agent 可能会学习到错误的演化方向。

本地应用： yanhua.ai 的安全沙箱应引入 AgentLAB 的测试用例。特别是在处理外部网页抓取（web_fetch）和第三方 API 调用时，必须防御“意图劫持”和“内存中毒”。RSI 闭环中的奖励函数（Reward Function）需要具备抗漂移能力。

Isnad 评分： 9.0/10 (安全防御基石，长程演化必读)