Solving the reasoning paradox in sensitive information leaks via iterative agentic rewriting and critique loops.
Generator-Solver self-play framework that bootstraps complex tool-calling capabilities without external expert demonstrations.
Establishing dense reward signals from token-level uncertainty to enable efficient self-evolution in sparse-feedback environments.
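As a toy illustration of the dense-reward idea above, the sketch below (all names hypothetical, not from the cited work) maps each generated token's probability to a per-step reward via negative surprisal, giving a learning signal at every token instead of one sparse end-of-episode score; a real system would read these probabilities from the model's own logits.

```python
import math

def dense_rewards_from_confidence(token_probs):
    # Reward each token with its log-probability (negative surprisal):
    # confident tokens score near 0, uncertain tokens increasingly negative.
    return [math.log(max(p, 1e-9)) for p in token_probs]

def discounted_return(rewards, gamma=0.99):
    # Fold the dense per-token rewards into a single discounted return.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical per-token probabilities from one generated answer.
probs = [0.95, 0.60, 0.99, 0.30]
rewards = dense_rewards_from_confidence(probs)
```

The uncertain token (p = 0.30) is penalized far more than the confident ones, so the policy gets step-level feedback even when the final-answer reward is sparse or absent.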
Bringing together global researchers to define principled methods, system designs, and evaluations for RSI across omni-models, multimodal agents, and robotics.
Scaling high-dimensional tensor computations via recursive sketched interpolation for adaptive AI systems.
Solving irreversible failure in agentic workflows via multi-plan aggregation and adaptive re-planning.
Scaling compound AI systems through skill modeling instead of expensive end-to-end RL routing.
Establishing dynamic recursive task trees for long-horizon decision making and self-correction.
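A minimal sketch of the recursive-task-tree pattern, under toy assumptions (the `execute` and `decompose` stand-ins are hypothetical placeholders for an agent's tool call and planner): a goal is attempted directly, and on failure the node self-corrects by decomposing into subtasks and recursing.

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    goal: str
    children: list = field(default_factory=list)
    done: bool = False

def solve(node, decompose, execute, max_depth=3, depth=0):
    # Try the goal directly; on failure, re-plan by decomposing into
    # subtasks and recursing, correcting errors at finer granularity.
    if execute(node.goal):
        node.done = True
        return True
    if depth >= max_depth:
        return False
    node.children = [TaskNode(g) for g in decompose(node.goal)]
    node.done = all(solve(c, decompose, execute, max_depth, depth + 1)
                    for c in node.children)
    return node.done

# Toy stand-ins: a "tool" that only handles short goals, and a planner
# that splits a goal string in half.
execute = lambda g: len(g) <= 4
decompose = lambda g: [g[:len(g) // 2], g[len(g) // 2:]]

root = TaskNode("abcdefgh")
```

Here the root goal is too long to execute directly, so it is split into two subgoals that each succeed, and completion propagates back up the tree.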
Gemini 3 Deep Think hits 84.6% on ARC-AGI-2; Aletheia agent publishes autonomous math research.
End-to-end autonomous model optimization using LLM agents for large-scale production systems.
100x compute reduction and 95.1% accuracy on IMO proofs; first agent to submit peer-reviewable math research.
Scaling coding performance from 17% to 53% on SWE-bench via recursive self-improvement loops.
The paradigm shift from prompting to programming. Introduces teleprompters and optimizers for LM programs.
Establishing the theoretical bounds of self-correcting logic chains using sparse rewards.
Proving LLMs can self-improve at test-time via recursive search, self-verification, and strategy accumulation without external ground-truth.
Inspired by Gödel machines, this framework allows agents to rewrite their own logic and optimization routines.
Crucial finding that Long CoT structure matters more than content for eliciting reasoning capabilities.
Distributed multi-LLM supporter layer and adaptive fine-tuning for autonomous SciML research.
Reframing RSI as a controlled release engineering pipeline with flip-centered gating.
Non-parametric RL on episodic memory for zero-fine-tuning runtime agent evolution.
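A minimal sketch of the zero-fine-tuning idea above, with hypothetical names: the agent's behavior evolves at runtime purely by writing episodes into memory and reading them back at decision time, with no parameter updates anywhere.

```python
from collections import defaultdict

class EpisodicMemoryAgent:
    # Runtime evolution without fine-tuning: the policy is a greedy
    # read over stored episodic returns, not a trained network.
    def __init__(self, actions):
        self.actions = actions
        self.memory = defaultdict(list)  # (state, action) -> observed returns

    def record(self, state, action, ret):
        # Writing an episode is the only "learning" step.
        self.memory[(state, action)].append(ret)

    def act(self, state):
        # Average stored returns per action and pick the best; fall back
        # to the first action when the state has never been seen.
        scored = [(sum(rs) / len(rs), a)
                  for a in self.actions
                  if (rs := self.memory[(state, a)])]
        return max(scored)[1] if scored else self.actions[0]

agent = EpisodicMemoryAgent(["retry", "replan"])
agent.record("s0", "retry", 0.2)
agent.record("s0", "replan", 0.9)
```

After two recorded episodes the agent already prefers `"replan"` in state `"s0"`, while unseen states fall back to the default action.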
Moving RSI from fine-tuning into the pre-training phase via synthetic logic referees.
Establishing RSI in the visual domain through recursive fine-tuning on self-generated image data.
Scaling agent search capabilities through multimodal tool integration and dynamic planning.
Scaling recommendation models via fidelity-controlled self-improving loops. A model-agnostic approach to data sparsity.