ArXiv ID: 2603.15957
Authors: Swadesh Jana, Cansu Sancaktar, et al.
Summary: GASP grounds asymmetric self-play in real data by anchoring it to real-data 'goalpost' questions. The teacher generates a curriculum of easier-to-harder variants that closes the gap to these goalposts. GASP improves pass@20 on LiveCodeBench (LCB) by 2.5% and solves hard questions that were previously out of reach.
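The note does not say how pass@20 was computed. A common choice, assumed here rather than taken from the paper, is the unbiased estimator introduced with the HumanEval benchmark. For each problem, draw n >= k samples, count the c that pass, and estimate pass@k = 1 - C(n-c, k) / C(n, k). The benchmark score is the mean over problems. Here is a minimal sketch; the function names and sample counts are illustrative:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct sample.
        return 1.0
    # 1 minus the probability that a random size-k subset has no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, each given as (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical counts for three problems, 40 samples each, scored at k = 20.
print(benchmark_pass_at_k([(40, 0), (40, 1), (40, 40)], k=20))  # prints 0.5
```

With n = k the estimator reduces to "did any of the k samples pass". Drawing n > k samples mainly reduces the variance of the estimate.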
RSI Signal: Grounded curriculum generation for self-improvement. Anchoring the curriculum to real-data goalposts keeps the model from drifting into uninteresting or uninformative problem spaces during autonomous data generation.
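The grounding idea can be illustrated with a toy loop. This is a hypothetical sketch, not GASP's algorithm. Problem difficulty is reduced to a single number, and the teacher (`propose_variant`) proposes a variant just above the student's current level, never past the goalpost. The student solves anything within a fixed `reach` of its skill, and it learns by raising its skill to the level it just solved. Every variant carries its goalpost's id, which is the sense in which generation stays anchored instead of drifting. All names and numbers here are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    goalpost_id: str   # hypothetical: each generated task stays tied to one real-data goalpost
    difficulty: float  # toy scalar stand-in for how hard a question is

def propose_variant(goalpost: Task, skill: float, step: float) -> Task:
    """Hypothetical teacher: a variant of the goalpost slightly above the student's skill."""
    difficulty = min(skill + step, goalpost.difficulty)  # never harder than the goalpost
    return Task(goalpost.goalpost_id, difficulty)

def run_curriculum(goalpost: Task, skill: float, step: float = 0.5,
                   reach: float = 0.6, max_rounds: int = 50) -> tuple[list[float], bool]:
    """Toy easier-to-harder loop; returns proposed difficulties and whether the goalpost was solved."""
    proposed: list[float] = []
    for _ in range(max_rounds):
        task = propose_variant(goalpost, skill, step)
        proposed.append(task.difficulty)
        if task.difficulty <= skill + reach:  # toy student: solves anything within `reach` of its skill
            skill = max(skill, task.difficulty)  # toy learning rule: skill rises to the solved level
            if task.difficulty >= goalpost.difficulty:
                return proposed, True
    return proposed, False

goal = Task("goalpost-0", 2.0)  # hypothetical goalpost id and difficulty
print(run_curriculum(goal, skill=0.0))            # ([0.5, 1.0, 1.5, 2.0], True)
print(run_curriculum(goal, skill=0.0, step=1.0)[1])  # False
```

With the defaults, the student climbs through 0.5, 1.0, 1.5 and 2.0 and reaches the goalpost. With step=1.0, the first variant is already out of reach, so nothing is learned and the loop stalls until max_rounds. The toy only shows that variants must step up gradually from what the student can already solve while staying bounded by the goalpost.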