The most influential parameters/settings for the quality of LLM output are top-p, top-k, temperature, repetition_penalty, and turn templates.
You can think of top-p and top-k as controlling the “vocabulary size” of a large language model at inference time.
Since these models predict the next token (word) by assigning a probability to every candidate token, we can control how the model picks the next token when several tokens are plausible.
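As a minimal sketch of that step, assuming a tiny five-token vocabulary with made-up logits, this is how raw model scores become next-token probabilities via a softmax (Python with NumPy; the numbers are purely illustrative):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw model scores (logits) into a probability distribution."""
    shifted = logits - logits.max()   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# A made-up five-token vocabulary with made-up next-token logits.
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])
probs = softmax(logits)
print(probs.round(2))  # -> [0.55 0.2  0.12 0.09 0.03]
```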
The top-p parameter keeps the smallest set of most probable tokens whose cumulative probability exceeds the threshold p.
The top-k parameter keeps only the k most probable tokens.
With a high top-p value (like 0.8), you allow more rarely used tokens with lower probability to appear, but with a low top-p value (like 0.15) you essentially remove them from the generation vocabulary.
With a small top-k like 1, you always sample the single most probable token (greedy decoding); with a larger top-k, you get more varied results.
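As a concrete illustration, here is a minimal sketch of both filters, reusing the toy distribution from above (the helper names `top_k_filter` and `top_p_filter` are hypothetical, not taken from any particular library):

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most probable tokens; zero out and renormalize the rest."""
    cutoff = np.sort(probs)[-k]                   # probability of the k-th best token
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of top tokens whose cumulative probability exceeds p."""
    order = np.argsort(probs)[::-1]               # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.55, 0.20, 0.12, 0.09, 0.03])   # toy distribution from above
print(top_k_filter(probs, k=2).round(2))   # -> [0.73 0.27 0.   0.   0.  ]
print(top_p_filter(probs, p=0.8).round(2)) # -> [0.63 0.23 0.14 0.   0.  ]
```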
Temperature comes into play after the pool of probable tokens has been selected by top-p or top-k: it controls how randomly the model samples from that pool.
What temperature actually does is reshape the token probabilities: the higher the temperature, the closer the tokens in the pool get to being equally likely, and thus the more random the result.
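A short sketch of that effect, again on the toy distribution (the helper `apply_temperature` is hypothetical; raising the probabilities to the power 1/T and renormalizing is mathematically the same as dividing the logits by T before the softmax):

```python
import numpy as np

def apply_temperature(probs: np.ndarray, temperature: float) -> np.ndarray:
    """Rescale a probability distribution by temperature.

    T < 1 sharpens the distribution toward the top token;
    T > 1 flattens it toward uniform (more random sampling).
    """
    scaled = probs ** (1.0 / temperature)
    return scaled / scaled.sum()

probs = np.array([0.55, 0.20, 0.12, 0.09, 0.03])
print(apply_temperature(probs, 0.5).round(2))  # sharper: [0.83 0.11 0.04 0.02 0.  ]
print(apply_temperature(probs, 2.0).round(2))  # flatter: [0.37 0.22 0.17 0.15 0.09]
```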
The repetition penalty is a parameter that tells the model how strongly to penalize tokens it has already used when generating text.
If the repetition penalty is high, the model is less likely to repeat what it has already said or get stuck in a loop generating the same sentence.
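One common scheme, used for example by the CTRL paper and by the `repetition_penalty` option in Hugging Face transformers, divides the logit of every already-generated token by the penalty when it is positive and multiplies it when negative. A minimal sketch of that scheme (the function name and toy values are mine):

```python
import numpy as np

def apply_repetition_penalty(logits: np.ndarray, generated_ids: list[int],
                             penalty: float) -> np.ndarray:
    """Make tokens that already appear in the generated text less likely.

    The logit of each previously generated token is divided by `penalty`
    if positive, or multiplied by it if negative, lowering its score.
    """
    adjusted = logits.copy()
    for token_id in set(generated_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted

logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])
print(apply_repetition_penalty(logits, generated_ids=[0, 4], penalty=1.3))
# token 0 drops from 2.0 to ~1.54, token 4 from -1.0 to -1.3
```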