GASP: Guided Asymmetric Self-Play For Coding LLMs

ArXiv ID: 2603.15957

Authors: Swadesh Jana, Cansu Sancaktar, et al.

Summary: GASP grounds asymmetric self-play in real-data 'goalpost' questions. The teacher generates a curriculum of variants, ordered from easier to harder, that closes the gap to these goalposts. The method improves pass@20 on LiveCodeBench (LCB) by 2.5% and solves hard questions that were previously out of reach.
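
For readers unfamiliar with the pass@20 metric, the sketch below shows how a pass@k score is commonly computed from n sampled solutions per problem, of which c pass the tests. It assumes the widely used unbiased estimator 1 - C(n-c, k) / C(n, k); the summary does not say which estimator or sample count the authors use, and the function names here are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of sampled solutions, c: how many of them passed, k: budget.
    Assumption: the standard unbiased estimator, not confirmed by the summary.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark given (n, c) pairs, one per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With n = k = 20, a problem counts as solved as soon as any one of the 20 samples passes, so gains in pass@20 reflect hard problems becoming reachable at all rather than solutions becoming more reliable.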

RSI Signal: Grounded curriculum generation for self-improvement. Anchoring generated problems to real goalposts prevents the model from drifting into uninteresting or uninformative problem spaces during autonomous data generation.
