Temperature and top-p are the two parameters people most often adjust when using language models, yet they are widely misused because few users understand what these numbers actually control. They are not creativity dials; they are probability distribution shapers.
Foundation: How Models Choose Words
At each step of generation, the model produces a probability distribution over its entire vocabulary — typically 32,000 to 128,000 tokens. Each token gets a probability. The model then samples from this distribution to choose the next token.
Without any modification, the distribution might look like: "the" (15%), "a" (8%), "this" (6%), "our" (4%), and so on through thousands of tokens with tiny probabilities. Sampling from this raw distribution produces reasonable but slightly random text.
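The sampling step above can be sketched as a weighted random draw. The probabilities below are illustrative numbers from the paragraph, not output from a real model, and the thousands of tail tokens are lumped into one placeholder:

```python
import random

# Toy next-token distribution (illustrative values, not from a real model).
probs = {"the": 0.15, "a": 0.08, "this": 0.06, "our": 0.04, "its": 0.02}
# ...plus thousands of low-probability tokens, lumped together here.
probs["<tail>"] = 1.0 - sum(probs.values())

tokens = list(probs)
weights = [probs[t] for t in tokens]

# Sampling picks each token in proportion to its probability,
# so "the" is drawn roughly 15% of the time.
next_token = random.choices(tokens, weights=weights, k=1)[0]
```

Because the draw is random, repeated calls produce different tokens; that is the "reasonable but slightly random" behavior described above.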
Temperature: Sharpening or Flattening
Temperature rescales the raw logits (pre-softmax scores): each logit is divided by the temperature before the softmax converts scores into probabilities. Values below 1 sharpen the distribution toward the top tokens; values above 1 flatten it.
- Temperature 0.0: Always picks the highest-probability token. Deterministic. Good for factual Q&A, code generation, structured output.
- Temperature 0.3-0.5: Strongly favors high-probability tokens but allows occasional variety. Good for analytical writing, summaries.
- Temperature 0.7: The default sweet spot. Balanced between coherence and variety.
- Temperature 1.0: Uses the model's natural distribution. More creative but occasionally surprising.
- Temperature 1.5+: Flattens the distribution. Low-probability tokens become much more likely. Text becomes unpredictable and often incoherent.
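The scaling described above can be sketched in a few lines. The logits here are made-up example scores; the point is how dividing by the temperature changes the resulting probabilities:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply softmax (T > 0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical pre-softmax scores

cold = softmax_with_temperature(logits, 0.3)  # sharpened: top token dominates
base = softmax_with_temperature(logits, 1.0)  # the model's natural distribution
warm = softmax_with_temperature(logits, 1.5)  # flattened: tail tokens gain mass
```

Printing the three lists shows the top token's share shrinking as temperature rises, while the low-scoring tokens' shares grow, which is exactly why high temperatures make unlikely continuations more common.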
Top-P (Nucleus Sampling): Cutting the Tail
Top-p takes a different approach. Instead of reshaping the entire distribution, it sorts tokens by descending probability, keeps only the smallest set whose cumulative probability reaches the threshold p, renormalizes that set, and samples from it.
- Top-p 0.1: Keeps only the smallest set of tokens covering 10% of probability mass, often just one or two tokens. Very focused.
- Top-p 0.5: Considers tokens covering 50% of probability. Moderate diversity.
- Top-p 0.9: Default for most applications. Trims only the long tail of very unlikely tokens.
- Top-p 1.0: No filtering. All tokens are candidates.
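The cutoff described above can be sketched as a small filter over a toy distribution. The function name and probabilities are illustrative, not from any particular library:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. `probs` maps token -> probability."""
    kept = {}
    cumulative = 0.0
    # Walk tokens from most to least probable until the threshold is met.
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

dist = {"the": 0.5, "a": 0.2, "this": 0.15, "our": 0.1, "its": 0.05}

focused = top_p_filter(dist, 0.5)  # keeps only "the"
broad = top_p_filter(dist, 0.9)    # trims just the least likely token
```

Note that the number of surviving tokens adapts to the shape of the distribution: when the model is confident, even a high p keeps only a few candidates, and when it is uncertain, the same p keeps many.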
Practical Application
For most applications, set one and leave the other at default:
- Factual/code: Temperature 0.0-0.2, top-p 1.0
- General writing: Temperature 0.7, top-p 0.9
- Creative writing: Temperature 0.9, top-p 0.95
- Brainstorming: Temperature 1.0, top-p 1.0
Common Misconceptions
"Higher temperature means more creative"
Higher temperature means more random. Creativity requires coherent novelty. Temperature above 1.0 typically produces incoherent novelty — which is not creative, just noisy.
Frequently Asked Questions
Should I adjust both temperature and top-p?
Generally no. They interact in complex ways. Adjust one and leave the other at default. If you must adjust both, use lower values for each than you would individually.
Why does temperature 0 sometimes give different outputs?
Floating-point precision and batching can cause ties between top tokens. True temperature-0 should be deterministic, but implementation details sometimes introduce minor variation.