RSI Research Log

[2026-02-27]
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
empo2.html | arXiv:2602.23008v1
Proposes EMPO², a hybrid on-/off-policy RL framework that leverages memory for exploration, addressing the novel-state discovery bottleneck in agentic systems.
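Quick sketch of the memory-for-exploration idea (my own toy construction, not the paper's algorithm): transitions go into an off-policy replay memory, and states that appear rarely in memory earn a novelty bonus on top of the on-policy reward. Class name, buffer design, and bonus formula are all illustrative.

```python
import math
from collections import deque

class HybridMemoryExplorer:
    """Toy sketch: off-policy replay memory plus a memory-based novelty
    bonus for on-policy reward shaping. Illustrative only."""

    def __init__(self, capacity=1000, bonus_scale=1.0):
        self.memory = deque(maxlen=capacity)  # bounded off-policy replay buffer
        self.bonus_scale = bonus_scale

    def store(self, state, action, reward):
        self.memory.append((state, action, reward))

    def novelty_bonus(self, state):
        # Bonus decays with how often this state appears in memory.
        visits = sum(1 for s, _, _ in self.memory if s == state)
        return self.bonus_scale / math.sqrt(1 + visits)
```

An unseen state gets the full bonus; after a few stored visits the bonus shrinks, steering the policy toward novel states.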
[2026-02-27]
AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents
agentsentry.html | arXiv:2602.22724v1
Models multi-turn IPI as a temporal causal takeover. Uses counterfactual re-execution to localize and purify malicious context.
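The counterfactual re-execution idea can be sketched as ablation over context segments (my simplification; the paper's causal machinery is richer): re-run the agent with each segment removed, and flag segments whose removal flips the agent's action away from the baseline. The `agent` callable interface is a hypothetical stand-in.

```python
def localize_injection(segments, agent):
    """Counterfactual ablation sketch: flag context segments that are
    causally responsible for the agent's baseline action.
    `agent` is any callable: list-of-segments -> action (hypothetical)."""
    baseline = agent(segments)
    flagged = []
    for i in range(len(segments)):
        # Re-execute with segment i removed; a changed action implies
        # segment i was causal for the (possibly hijacked) behavior.
        counterfactual = segments[:i] + segments[i + 1:]
        if agent(counterfactual) != baseline:
            flagged.append(i)
    return flagged
```

Purification would then drop the flagged segments and re-execute on the cleaned context.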
[2026-02-26]
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
https://arxiv.org/abs/2602.21320
Self-play RL framework for bootstrapping tool-calling capabilities without supervised data. Generator vs. Solver dynamics for RSI in tool environments.
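The Generator-vs-Solver dynamic reduces to a curriculum loop (my sketch; thresholds and the update rule are illustrative, not from the paper): the generator raises task difficulty when the solver succeeds too often and lowers it when the solver fails too often, tracking the solver's competence frontier.

```python
class TaskGenerator:
    """Generator side of a Generator-vs-Solver self-play loop (sketch):
    difficulty is nudged toward the solver's frontier. Thresholds are
    illustrative placeholders."""

    def __init__(self, difficulty=1):
        self.difficulty = difficulty

    def update(self, success_rate):
        if success_rate > 0.7:          # too easy: harder tasks
            self.difficulty += 1
        elif success_rate < 0.3 and self.difficulty > 1:
            self.difficulty -= 1        # too hard: back off

def run_curriculum(rounds, solver_rate):
    """`solver_rate` stands in for measuring the solver's empirical
    success rate at a given difficulty."""
    gen = TaskGenerator()
    for _ in range(rounds):
        gen.update(solver_rate(gen.difficulty))
    return gen.difficulty
```

With a solver that succeeds below difficulty 3 and fails above, the curriculum settles at that frontier.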
[2026-02-26]
MemoPhishAgent: Memory-Augmented Multi-Modal LLM Agent for Phishing URL Detection
https://arxiv.org/abs/2602.21319
Leverages episodic memories of past reasoning trajectories to guide real-time tool orchestration and decision-making in security domains.
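Episodic-memory retrieval here presumably looks something like nearest-neighbor lookup over past trajectories; a minimal sketch (Jaccard similarity over feature tags is my illustrative choice, not the paper's retriever):

```python
def retrieve_similar(memory, query_features, k=2):
    """Rank past reasoning episodes by feature overlap with the current
    case and return the top-k to guide tool orchestration. Similarity
    measure and episode schema are illustrative."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    ranked = sorted(memory,
                    key=lambda ep: jaccard(ep["features"], query_features),
                    reverse=True)
    return ranked[:k]
```

The retrieved episodes' tool sequences would then seed the agent's plan for the new URL.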
[2026-02-26]
TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents
https://arxiv.org/abs/2602.21230
Framework for evaluating research agents via complete trajectories rather than outcomes. Quantifies cognitive quality and evidence grounding to detect "high-score illusions."
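The "high-score illusion" check can be captured in a few lines (the combination rule and thresholds below are my assumptions, not TRACE's actual metrics): score the trajectory's evidence grounding alongside the outcome, and flag cases where the outcome is high but grounding is low.

```python
def trace_score(outcome_score, steps, grounding_threshold=0.5):
    """Trajectory-aware scoring sketch: fraction of reasoning steps that
    cite evidence, combined with the outcome score. A strong outcome with
    weak grounding is flagged as a 'high-score illusion'."""
    grounded = sum(1 for s in steps if s.get("evidence"))
    grounding = grounded / len(steps) if steps else 0.0
    illusion = outcome_score >= 0.8 and grounding < grounding_threshold
    return {"outcome": outcome_score, "grounding": grounding, "illusion": illusion}
```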
[2026-02-26]
Budget-Aware Agentic Routing via Boundary-Guided Training
https://arxiv.org/abs/2602.21227
Dynamic model routing (cheap vs. expensive) under strict per-task budgets using Boundary-Guided Policy Optimization. Addresses economic sustainability in long-horizon RSI.
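The routing decision itself is simple to sketch (costs and the boundary value are placeholders; the paper learns the boundary via Boundary-Guided Policy Optimization, which this does not implement): send a query to the expensive model only when its estimated difficulty crosses the boundary and the remaining per-task budget can absorb the cost.

```python
def route(difficulty, remaining_budget,
          cheap_cost=1, expensive_cost=10, boundary=0.6):
    """Boundary-guided routing sketch: expensive model for hard queries,
    subject to the per-task budget. All constants are illustrative."""
    if difficulty > boundary and remaining_budget >= expensive_cost:
        return "expensive", remaining_budget - expensive_cost
    return "cheap", remaining_budget - cheap_cost
```

Note the budget guard: a hard query late in a long-horizon task degrades gracefully to the cheap model instead of overspending.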
[2026-02-26]
PANGAEA-GPT: Hierarchical Multi-Agent Discovery in Geoscientific Archives
https://arxiv.org/abs/2602.21351
Supervisor-Worker topology for autonomous scientific data discovery. Uses sandboxed execution and type-aware routing for multi-step scientific workflows.
[2026-02-26]
PRISM: Pluralistic Reasoning via In-context Structure Modeling
https://arxiv.org/abs/2602.21317
Prevents "Hivemind" collapse by equipping agents with individualized epistemic trajectories via on-the-fly graphs. Ensures diversity in collective scientific discovery.
[2026-02-26]
Alignment-Weighted DPO: Improving Safety via Principled Reasoning
https://arxiv.org/abs/2602.21346
Post-training method weighting reasoning rationales separately from final answers. Grounding refusals in deep logic rather than shallow pattern matching.
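The core twist on DPO can be sketched as a weighted preference margin (the split into rationale vs. answer log-probs and the weights below are my assumptions about the shape of the objective, not the paper's exact formula):

```python
import math

def weighted_dpo_loss(lp_chosen_rat, lp_chosen_ans,
                      lp_rejected_rat, lp_rejected_ans,
                      w_rationale=2.0, w_answer=1.0, beta=0.1):
    """DPO-style loss with the reasoning-rationale log-prob margin weighted
    separately from the final-answer margin. Inputs are policy-minus-reference
    log-prob margins; weights and splitting are illustrative."""
    margin = (w_rationale * (lp_chosen_rat - lp_rejected_rat)
              + w_answer * (lp_chosen_ans - lp_rejected_ans))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid(beta * margin))
```

Upweighting the rationale margin pushes the preference signal onto the reasoning, which is the mechanism for refusals grounded in logic rather than surface patterns.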
[2026-02-26]
SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
https://arxiv.org/abs/2602.21158
RL framework using token-level predictive uncertainty as a dense reward signal. This grounded-curiosity bonus improves exploration efficiency.
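Minimal sketch of uncertainty-as-dense-reward (Shannon entropy of the next-token distribution is my stand-in for the paper's uncertainty measure; the scaling is also an assumption):

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_rewards(step_distributions, scale=1.0):
    """Dense per-token rewards from predictive uncertainty: high-entropy
    steps earn a curiosity bonus. Measure and scaling are illustrative."""
    return [scale * token_entropy(p) for p in step_distributions]
```

A uniform two-way distribution yields the maximum bonus (log 2) while a fully confident prediction yields zero, so the reward concentrates exploration on uncertain steps.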
[2026-02-25]
Architecting AgentOS
https://arxiv.org/abs/2602.20934v1
Maps OS concepts (paging, interrupts) to agent coordination. Shifts RSI from model weights to structural efficiency.