Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought

ArXiv ID: 2603.18940

Summary: This study introduces entropy-trajectory monotonicity as a predictor of LLM reasoning correctness. A reasoning chain is "monotone" if the entropy of its answer distribution, measured after each reasoning step, decreases at every step. Monotone chains are significantly more accurate than non-monotone chains (68.8% vs. 46.8%) on benchmarks like GSM8K. The shape of the uncertainty trajectory is more informative than aggregate measures such as total entropy reduction.
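The monotonicity criterion can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the use of Shannon entropy in bits, the reading of "decreases" as strictly decreasing, and the toy answer distributions are all assumptions made for illustration. How the per-step answer distribution is obtained is not specified in the summary, so the sketch takes the distributions as given.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a discrete answer distribution."""
    # Zero-probability answers contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_trajectory(step_dists: Sequence[Sequence[float]]) -> list:
    """Entropy of the answer distribution after each reasoning step."""
    return [shannon_entropy(d) for d in step_dists]


def is_monotone(entropies: Sequence[float]) -> bool:
    """True if entropy decreases at every step (assumed: strictly)."""
    return all(later < earlier for earlier, later in zip(entropies, entropies[1:]))


def total_entropy_reduction(entropies: Sequence[float]) -> float:
    """Aggregate measure: entropy at the first step minus entropy at the last."""
    return entropies[0] - entropies[-1]


# Hypothetical chains over four candidate answers (toy numbers, not paper data).
chain_a = [
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.25, 0.125, 0.125],
    [1.0, 0.0, 0.0, 0.0],
]
chain_b = [
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.25, 0.125, 0.125],  # uncertainty rises again at this step
    [1.0, 0.0, 0.0, 0.0],
]

traj_a = entropy_trajectory(chain_a)
traj_b = entropy_trajectory(chain_b)
print(is_monotone(traj_a), total_entropy_reduction(traj_a))
print(is_monotone(traj_b), total_entropy_reduction(traj_b))
```

In this toy case both chains have the same total entropy reduction, yet only the first is monotone. This is the kind of distinction that an aggregate measure cannot capture and a shape-based criterion can.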
