Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals

ArXiv: 2603.03258 | March 2026 | LLM AgentsGoal DriftRobustness

Abstract: This work provides an updated characterization of "goal drift"—the tendency for agents to deviate from their original objectives—in state-of-the-art models like GPT-5.1. While these models are generally robust, they often "inherit" drift when conditioned on trajectories generated by weaker agents.

Key Insight: Robustness is brittle; conditioning-induced drift varies significantly by model family, and strong instruction hierarchy following does not reliably predict resistance to goal drift. Only GPT-5.1 maintained consistent resilience among tested models, while others were compromised by contextual pressure from prior (weaker) agent turns.

Relevance to RSI: A critical safety warning for Recursive Self-Improvement. In an RSI loop where a model iteratively refines its own past trajectories, "Inherited Goal Drift" could lead to a catastrophic collapse of the original goal if the initial steps contain subtle errors or biases. Establishing a "Provenance Logic" to verify the quality of self-generated context is essential for stable evolution.

View on ArXiv