OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

ArXiv ID: 2603.19191

Summary: OS-Themis is a scalable and accurate multi-agent critic framework designed to improve the robustness of GUI agents. It decomposes each trajectory into verifiable milestones and uses a review mechanism to audit the evidence chain before issuing a final verdict. In experiments on AndroidWorld, it yields a 10.3% improvement in online RL training and a 6.9% gain in self-training loops, supporting continued agent improvement.
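
The milestone-then-review flow described in the summary can be illustrated with a small Python sketch. This is not the authors' implementation: every name below (Milestone, Evidence, collect_evidence, review, verdict) is hypothetical, and simple string predicates stand in for what the paper describes as model-based agents. The sketch only shows the control flow: gather evidence per milestone, audit the evidence chain, then emit a reward.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical names throughout; string checks stand in for
# the model-based verifier and reviewer agents.

@dataclass
class Milestone:
    name: str
    check: Callable[[str], bool]  # stand-in verifier applied to a single step


@dataclass
class Evidence:
    milestone: str
    step_index: Optional[int]  # index of the supporting step, or None if absent


def collect_evidence(trajectory: List[str], milestones: List[Milestone]) -> List[Evidence]:
    """For each milestone, record the first trajectory step that satisfies its check."""
    chain = []
    for m in milestones:
        idx = next((i for i, step in enumerate(trajectory) if m.check(step)), None)
        chain.append(Evidence(m.name, idx))
    return chain


def review(chain: List[Evidence]) -> bool:
    """Audit the chain: every milestone needs evidence, and in non-decreasing step order."""
    indices = [e.step_index for e in chain]
    if any(i is None for i in indices):
        return False
    return all(a <= b for a, b in zip(indices, indices[1:]))


def verdict(trajectory: List[str], milestones: List[Milestone]) -> float:
    """Return a binary reward: 1.0 only if the audited evidence chain holds up."""
    return 1.0 if review(collect_evidence(trajectory, milestones)) else 0.0


# Illustrative task (assumed, not from the paper): turn Wi-Fi off via Settings.
milestones = [
    Milestone("settings_opened", lambda s: "Settings" in s),
    Milestone("wifi_disabled", lambda s: "Wi-Fi off" in s),
]
good = ["open Settings", "tap Wi-Fi", "toggle Wi-Fi off"]
incomplete = ["open Settings", "tap Wi-Fi"]
```

Under these assumptions, verdict(good, milestones) rewards the trajectory, while verdict(incomplete, milestones) does not, because the review step finds no evidence for the second milestone. Separating evidence collection from the audit is the point of the sketch: the final reward depends on the whole chain being consistent, not on any single step looking successful.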
