P(i) = exp(s_i / T)  /  ∑_j exp(s_j / T)
Boltzmann–Gibbs form  ·  logit s_i = −E_i  ·  T→0: argmax s_i  ·  T→∞: uniform
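
As a rough illustration of the formula and its two limits, here is a minimal Python sketch of a temperature-scaled softmax. The function name `softmax_t` and the example scores are assumptions made for illustration, not part of the original page. Subtracting the largest scaled score before exponentiating is a standard stability trick and does not change the result, because the common factor cancels in the ratio.

```python
import math

def softmax_t(scores, T=1.0):
    """Temperature-scaled softmax: P(i) = exp(s_i / T) / sum_j exp(s_j / T)."""
    if T <= 0:
        raise ValueError("T must be positive")
    scaled = [s / T for s in scores]
    m = max(scaled)  # shift by the max so exp() cannot overflow at small T
    weights = [math.exp(x - m) for x in scaled]
    z = sum(weights)
    return [w / z for w in weights]

scores = [2.0, 1.0, 0.0]          # hypothetical logits s_i
cold = softmax_t(scores, T=0.01)  # nearly all mass on the argmax
standard = softmax_t(scores, T=1.0)
hot = softmax_t(scores, T=100.0)  # close to uniform
```

With these example scores, `cold` puts essentially all probability on the first state, while the entries of `hot` differ from 1/3 by less than 0.01.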
Scores / logits s_i
Probability distribution P(i)
Temperature T (log-scale slider, 0.01 to 100)  ·  T = 0.01 peaked  ·  T = 1 standard softmax  ·  T = 100 uniform
Continuous Boltzmann landscape  ·  exp(−E / T) = exp(s / T)
The curve shows the Boltzmann weight exp(−E / T) = exp(s / T), normalized so the lowest-energy state shown sits at 1. Dots mark each discrete state at E_i = −s_i. Actual P(i) requires dividing by the partition function Z = ∑_j exp(−E_j / T).
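
The relationship between the normalized weights and the actual probabilities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `boltzmann_weights` and the example scores are hypothetical, and the normalization is taken to mean dividing every weight by the weight of the lowest-energy state.

```python
import math

def boltzmann_weights(scores, T):
    """Weights exp(-(E_i - E_min) / T), so the lowest-energy state has weight 1."""
    energies = [-s for s in scores]  # E_i = -s_i
    e_min = min(energies)
    return [math.exp(-(e - e_min) / T) for e in energies]

scores = [2.0, 1.0, 0.0]  # hypothetical logits s_i
T = 1.0

w = boltzmann_weights(scores, T)
p = [x / sum(w) for x in w]  # dividing by the sum of weights gives P(i)

# Same probabilities from the unshifted partition function Z = sum_j exp(-E_j / T)
z = sum(math.exp(s / T) for s in scores)
p_direct = [math.exp(s / T) / z for s in scores]
```

The rescaling multiplies every weight and the partition function by the same constant, so `p` and `p_direct` agree up to floating-point rounding.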
Normalized entropy
H(p) / log n  ·  0 peaked, 1 uniform
Max probability
P(argmax s_i)
Perplexity
exp(H)  ·  effective # of states
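
The three summary readouts above can be computed from any probability vector. The following is a minimal sketch, assuming H is the Shannon entropy in nats (natural logarithm); the function name `summarize` and the dictionary keys are hypothetical.

```python
import math

def summarize(p):
    """Normalized entropy, max probability, and perplexity of a distribution p."""
    # Shannon entropy in nats; terms with zero probability contribute nothing
    h = -sum(q * math.log(q) for q in p if q > 0)
    n = len(p)
    return {
        "normalized_entropy": h / math.log(n) if n > 1 else 0.0,
        "max_probability": max(p),
        "perplexity": math.exp(h),  # effective number of states
    }

uniform = summarize([0.25, 0.25, 0.25, 0.25])
peaked = summarize([1.0, 0.0, 0.0, 0.0])
```

For the uniform case the normalized entropy is 1 and the perplexity equals the number of states (4); for the fully peaked case the normalized entropy is 0 and the perplexity is 1. Because softmax preserves the ordering of the scores for any T > 0, the maximum probability is the probability of the highest-scoring state.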

© 2026 Theodore P. Pavlic — MIT License