P(i) = exp(s_i / T) / ∑_j exp(s_j / T)
Boltzmann–Gibbs form · logit s_i = −E_i · T→0: argmax s_i · T→∞: uniform
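The formula and its two temperature limits can be sketched in a few lines of Python. This is a minimal illustration, not the page's own implementation: the function name `softmax` and the example scores are assumptions, and the max-subtraction step is a standard numerical-stability trick that leaves P(i) unchanged.

```python
import math

def softmax(scores, T=1.0):
    """Return P(i) = exp(s_i / T) / sum_j exp(s_j / T) for T > 0."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    scaled = [s / T for s in scores]
    m = max(scaled)  # subtracting the max avoids overflow in exp()
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative logits (hypothetical, not taken from the page).
scores = [2.0, 1.0, 0.0]
for T in (0.01, 1.0, 100.0):
    # Small T concentrates mass on argmax s_i; large T approaches uniform.
    print(T, [round(p, 3) for p in softmax(scores, T)])
```

With these scores, the T = 0.01 row puts essentially all mass on the first state, while the T = 100 row is close to 1/3 for every state.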
Logit view · Energy view
Scores / logits s_i
Presets: One winner · Two rivals · Gradient · All equal · Reset
Probability distribution P(i)
Temperature T (log scale): 1.00 · standard softmax
Scale: T = 0.01 (peaked) · 0.1 · T = 1 (standard) · 10 · T = 100 (uniform)
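A log-scale temperature control like the one above can be sketched as follows. This assumes a slider position x in [0, 1] that maps linearly onto log10(T) between the endpoints T = 0.01 and T = 100. The function name `slider_to_temperature` and the [0, 1] range are hypothetical; only the endpoints and tick values come from the scale shown.

```python
def slider_to_temperature(x, t_min=0.01, t_max=100.0):
    """Map a slider position x in [0, 1] to T on a log10 scale (assumed mapping)."""
    import math
    if not 0.0 <= x <= 1.0:
        raise ValueError("slider position must lie in [0, 1]")
    lo, hi = math.log10(t_min), math.log10(t_max)
    # Linear interpolation in log10(T), then back to T.
    return 10.0 ** (lo + x * (hi - lo))

# Evenly spaced positions land on the tick values 0.01, 0.1, 1, 10, 100.
ticks = [slider_to_temperature(k / 4) for k in range(5)]
print(ticks)
```

On a log scale the midpoint of the slider is T = 1, so the peaked and uniform regimes each get half of the control's travel.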
Continuous Boltzmann landscape · exp(−E / T) = exp(s / T)
The curve shows the Boltzmann weight exp(−E / T) = exp(s / T), normalized so the lowest-energy state shown sits at 1. Dots mark each discrete state at E_i = −s_i. Actual P(i) requires dividing by the partition function Z = ∑_j exp(−E_j / T).
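The difference between the plotted weights and the actual probabilities can be made concrete with a short sketch. The names `boltzmann_weights` and `partition_function` and the example scores are assumptions for illustration. The direct sum for Z is fine at moderate T but can overflow at very small T, where a max-shifted form would be safer.

```python
import math

def boltzmann_weights(energies, T=1.0):
    """exp(-E_i / T), rescaled so the lowest-energy state has weight 1."""
    e_min = min(energies)
    return [math.exp(-(e - e_min) / T) for e in energies]

def partition_function(energies, T=1.0):
    """Z = sum_j exp(-E_j / T), computed directly (no overflow guard)."""
    return sum(math.exp(-e / T) for e in energies)

# Illustrative scores (hypothetical); energies follow E_i = -s_i.
scores = [2.0, 1.0, 0.0]
energies = [-s for s in scores]
T = 1.0

rel = boltzmann_weights(energies, T)   # relative weights; the best state is 1.0
Z = partition_function(energies, T)
probs = [math.exp(-e / T) / Z for e in energies]  # actual P(i)

print([round(w, 4) for w in rel])
print([round(p, 4) for p in probs])
```

The relative weights and the probabilities differ only by a constant factor, so dividing the relative weights by their own sum gives the same P(i).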
Normalized entropy: H(p) / log n · 0 peaked, 1 uniform
Max probability: P(argmax s_i)
Perplexity: exp(H) · effective # of states
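The three summary statistics can be computed from any distribution p as sketched below. This assumes H is the Shannon entropy in nats (natural log), which is what the pairing of H(p) / log n with exp(H) implies. The names `entropy` and `summarize` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability states contribute 0."""
    return sum(-x * math.log(x) for x in p if x > 0)

def summarize(p):
    """Return normalized entropy, max probability, and perplexity for p."""
    n = len(p)
    H = entropy(p)
    return {
        # H / log n lies in [0, 1]; defined as 0 for a single state.
        "normalized_entropy": H / math.log(n) if n > 1 else 0.0,
        "max_probability": max(p),
        # exp(H) is the effective number of states.
        "perplexity": math.exp(H),
    }

uniform = summarize([0.25] * 4)            # all states equally likely
peaked = summarize([1.0, 0.0, 0.0, 0.0])   # one state takes all the mass
print(uniform)
print(peaked)
```

A uniform distribution over four states gives normalized entropy 1 and perplexity 4, while a fully peaked one gives normalized entropy 0 and perplexity 1.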
© 2026 Theodore P. Pavlic · MIT License