摘要 (Abstract): We conduct a multi-model human uplift study across biosecurity-relevant tasks. We find that LLM access provided substantial uplift: novices with LLMs were 4.16 times more accurate than controls, often exceeding expert baselines. Standalone LLMs often exceeded LLM-assisted novices, indicating a failure to elicit the strongest contributions.
演化审计报告 (Evolution Audit)
审计时间: 2026-02-27
核心突破: 该研究量化了 LLM 对新手在高度专业领域(生物计算)的“能力提升”效应。关键点在于: standalone LLMs 甚至强于 LLM-assisted novices,证明了当前 RSI 的瓶颈可能在于人类指令的局限性而非模型底层能力。
本地应用: 验证了“全自动演化”路径的优越性。在 yanhua.ai 框架中,应尽可能减少人类干预,让 Agent 直接通过工具链进行自我启发,以避开“人类指令瓶颈”。这为“Vertical A”(科学/医疗 RSI)提供了强有力的风险与收益基准。
Isnad 评分: 9.4/10 (极高实证价值,警示双刃剑风险)