$P(i) = \dfrac{\exp(s_i / T)}{\sum_j \exp(s_j / T)}$
Boltzmann–Gibbs form, with score $s_i \equiv -\varepsilon_i / k_B$. As $T \to 0$, the probability mass concentrates on $\arg\max_i s_i$. As $T \to \infty$, $P$ approaches the uniform distribution, $P(i) = 1/n$.
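The formula and its two limits can be checked numerically. The following is a minimal sketch, not the page's own implementation: the function name `softmax` and the example scores are assumptions chosen for illustration. It subtracts the largest score before exponentiating, which leaves the ratio unchanged but avoids overflow at small T.

```python
import math

def softmax(scores, T=1.0):
    """Temperature-scaled softmax: P(i) = exp(s_i / T) / sum_j exp(s_j / T)."""
    if T <= 0:
        raise ValueError("T must be positive")
    # Shift by the largest score; the shift cancels in the ratio
    # but keeps exp() from overflowing when T is small.
    m = max(scores)
    weights = [math.exp((s - m) / T) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scores for illustration (not taken from the page).
scores = [2.0, 1.0, 0.5, 0.0]

cold = softmax(scores, T=0.01)      # nearly all mass on the largest score
standard = softmax(scores, T=1.0)   # ordinary softmax
hot = softmax(scores, T=100.0)      # close to 1/n for every state

for label, p in [("T=0.01", cold), ("T=1", standard), ("T=100", hot)]:
    print(label, [round(x, 4) for x in p])
```

At T = 0.01 the first entry is close to 1, and at T = 100 every entry is close to 1/4, matching the two limits stated above.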
Scores / logits $s_i$
Score presets: One winner, Two rivals, Gradient, All equal, Reset
Probability distribution P(i)
Temperature T (log scale from 0.01 to 100), currently 1.00 (standard softmax)
T = 0.01: peaked; T = 1: standard; T = 100: uniform
Normalized entropy: $H(P) / \log n$ (0 = peaked, 1 = uniform)
Max probability: $P(\arg\max_i s_i)$
Perplexity: $\exp(H)$, the effective number of states
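The three summary quantities can be computed directly from a distribution. This is a minimal sketch under stated assumptions: the helper names `entropy` and `summarize` are hypothetical, and entropy is taken in nats (natural logarithm) so that exp(H) gives the perplexity. The normalized entropy does not depend on the choice of log base.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats; states with P(i) = 0 contribute 0."""
    return sum(-q * math.log(q) for q in p if q > 0)

def summarize(p):
    """Return the three summary statistics for a probability vector p."""
    n = len(p)
    h = entropy(p)
    return {
        # H(P) / log n: 0 for a point mass, 1 for the uniform distribution.
        "normalized_entropy": h / math.log(n) if n > 1 else 0.0,
        # Probability of the most likely state.
        "max_probability": max(p),
        # exp(H): the effective number of states.
        "perplexity": math.exp(h),
    }

uniform = summarize([0.25, 0.25, 0.25, 0.25])
peaked = summarize([1.0, 0.0, 0.0, 0.0])
print(uniform)
print(peaked)
```

For the uniform case the perplexity is approximately n (here 4) and the normalized entropy approximately 1. For the point mass the perplexity is 1 and the normalized entropy 0, which is why perplexity reads as an effective number of states.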
© 2026 Theodore P. Pavlic, MIT License