After the model computes a logit for every token in its vocabulary, a sampling strategy decides which token is actually selected. The raw logits are unnormalized log-probabilities; softmax turns them into a probability distribution, and three knobs shape that distribution.
Temperature divides the logits before softmax. As temperature approaches 0, the highest-logit token always wins (greedy/argmax decoding). At temperature 1, the original distribution is preserved. Above 1, the distribution flattens, giving unlikely tokens a real chance. Watch the bars reshape as you drag the slider.
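The effect is easy to see in a few lines. This is a minimal sketch, not any library's API; the logits are made-up values for a hypothetical 4-token vocabulary:

```python
import math

def softmax(logits):
    # subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # divide logits by temperature before softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it
    return softmax([x / temperature for x in logits])

logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical logits for a 4-token vocabulary
cold = apply_temperature(logits, 0.5)  # sharper: top token dominates
base = apply_temperature(logits, 1.0)  # original distribution
hot = apply_temperature(logits, 2.0)   # flatter: tail tokens gain mass
```

At 0.5 the top token takes most of the mass; at 2.0 the least likely token's share grows severalfold, which is exactly the flattening the slider visualizes.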
Top-K truncates to the K highest-probability tokens and renormalizes. Everything outside the top K is zeroed out. This prevents sampling from the extreme tail (hallucination territory) while preserving diversity among likely candidates.
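A sketch of the truncate-and-renormalize step (the helper name is mine; ties at the cutoff are all kept for simplicity):

```python
def top_k_filter(probs, k):
    # keep the k highest-probability tokens, zero the rest, renormalize
    cutoff = sorted(probs, reverse=True)[k - 1]
    kept = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical post-softmax distribution
filtered = top_k_filter(probs, k=2)
# tail tokens are zeroed; the top two share the mass (~0.625 and ~0.375)
```

Note that renormalizing boosts the surviving tokens proportionally, so their relative odds are unchanged; only the tail is cut off.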
Top-P (nucleus sampling) is adaptive. Instead of a fixed count, it takes the smallest set of tokens whose cumulative probability exceeds P. When the model is confident, this might be 3 tokens. When uncertain, it might be 20. The cumulative curve on the chart shows exactly where the threshold falls.
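The adaptivity shows up directly in code. A sketch (function name is mine), with two made-up distributions standing in for a confident and an uncertain model:

```python
def top_p_filter(probs, p):
    # keep the smallest set of tokens whose cumulative probability reaches p,
    # then renormalize over that nucleus
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = set(), 0.0
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in nucleus)
    return [probs[i] / total if i in nucleus else 0.0
            for i in range(len(probs))]

confident = [0.85, 0.10, 0.03, 0.02]  # peaked distribution: tiny nucleus
uncertain = [0.30, 0.25, 0.25, 0.20]  # flat distribution: large nucleus
# with p = 0.9, the confident case keeps 2 tokens, the uncertain case all 4
```

Same threshold, very different cutoffs: that is the fixed-count problem top-K has and top-P avoids.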
These filters stack: temperature first, then top-K, then top-P (most implementations apply them in this order, though it can vary). The combination determines the tradeoff between coherence and creativity. A common API default is temperature 1.0 with top-P around 0.95, leaving top-K disabled.
Holtzman et al., "The Curious Case of Neural Text Degeneration," 2019 (the paper that introduced nucleus sampling).