After the model computes a logit for every token in its vocabulary, a sampling strategy decides which token is actually selected. The raw logits are unnormalized log-probabilities; softmax turns them into a probability distribution, and three knobs shape that distribution.
Temperature divides the logits before softmax. As temperature approaches 0, the highest-logit token always wins (greedy/argmax decoding). At temperature 1, the original distribution is preserved. Above 1, the distribution flattens, giving unlikely tokens a real chance. Watch the bars reshape as you drag the slider.
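The effect is easy to see in a few lines. This is a minimal sketch, not any library's API; the logits are made-up values for a hypothetical 4-token vocabulary:

```python
import math

def softmax(logits):
    # subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # divide logits by temperature before softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it
    return softmax([x / temperature for x in logits])

logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical logits for a 4-token vocabulary
cold = apply_temperature(logits, 0.5)  # sharper: top token dominates
base = apply_temperature(logits, 1.0)  # original distribution
hot = apply_temperature(logits, 2.0)   # flatter: tail tokens gain mass
```

At 0.5 the top token takes most of the mass; at 2.0 the least likely token's share grows severalfold, which is exactly the flattening the slider visualizes.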
Top-K truncates to the K highest-probability tokens and renormalizes. Everything outside the top K is zeroed out. This prevents sampling from the extreme tail (hallucination territory) while preserving diversity among likely candidates.
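A sketch of the truncate-and-renormalize step (the helper name is mine; ties at the cutoff are all kept for simplicity):

```python
def top_k_filter(probs, k):
    # keep the k highest-probability tokens, zero the rest, renormalize
    cutoff = sorted(probs, reverse=True)[k - 1]
    kept = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical post-softmax distribution
filtered = top_k_filter(probs, k=2)
# tail tokens are zeroed; the top two share the mass (~0.625 and ~0.375)
```

Note that renormalizing boosts the surviving tokens proportionally, so their relative odds are unchanged; only the tail is cut off.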
Top-P (nucleus sampling) is adaptive. Instead of a fixed count, it takes the smallest set of tokens whose cumulative probability exceeds P. When the model is confident, this might be 3 tokens. When uncertain, it might be 20. The cumulative curve on the chart shows exactly where the threshold falls.
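The adaptivity shows up directly in code. A sketch (function name is mine), with two made-up distributions standing in for a confident and an uncertain model:

```python
def top_p_filter(probs, p):
    # keep the smallest set of tokens whose cumulative probability reaches p,
    # then renormalize over that nucleus
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = set(), 0.0
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in nucleus)
    return [probs[i] / total if i in nucleus else 0.0
            for i in range(len(probs))]

confident = [0.85, 0.10, 0.03, 0.02]  # peaked distribution: tiny nucleus
uncertain = [0.30, 0.25, 0.25, 0.20]  # flat distribution: large nucleus
# with p = 0.9, the confident case keeps 2 tokens, the uncertain case all 4
```

Same threshold, very different cutoffs: that is the fixed-count problem top-K has and top-P avoids.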
These filters stack: temperature first, then top-K, then top-P (most implementations apply them in this order, though it can vary). The combination determines the tradeoff between coherence and creativity. A common API default is temperature 1.0 with top-P around 0.95, leaving top-K disabled.
Holtzman et al., "The Curious Case of Neural Text Degeneration," 2019 (the paper that introduced nucleus sampling).