
Commit 7c976e1

Experiment with prompt and top_k/p values
1 parent bf06327 commit 7c976e1

File tree

2 files changed, +21 -6 lines changed


fact.py

+21 -6
@@ -5,15 +5,14 @@
 TASKS_PATH = 'fact.json'
 
 PROMPT_TMPL = """\
-Decide which of the following summary is more consistent with the article sentence.
+Decide which of the following Summary is more consistent with the Article Sentence.
 
-Note that consistency means all information in the summary is supported by the article.
+Note that consistency means all information in the Summary is supported by the Article Sentence.
 
 Article Sentence: {article}
-Summary A: {option_a}
-Summary B: {option_b}
-
-The more consistent is Summary"""
+Summary Y: {option_a}
+Summary X: {option_b}
+Answer: The more consistent is Summary"""
 
 
 def iter_tasks(filename):
@@ -38,6 +37,20 @@ def check_ctx_len(llm, max_ctx=512):
 
 
 def main():
+    """
+    The parameters that most influence the quality of LLM output are top-p, top-k, temperature, repetition_penalty, and the turn templates.
+    Top-p and top-k can be thought of as controlling the "vocabulary size" of the model at inference time.
+    Since the model predicts the next token by assigning a probability to every candidate token, these parameters control how the next token is picked when several tokens are probable.
+    The top-p parameter keeps the smallest set of tokens whose cumulative probability exceeds the threshold p.
+    The top-k parameter keeps only the k tokens with the highest probability.
+    A high top-p value (like 0.9) still admits rarely used, lower-probability tokens, while a low top-p value (like 0.15) essentially removes them from the generation vocabulary.
+    With a small top-k such as 1, only the most probable token is sampled; a larger top-k gives more varied results.
+    Temperature comes into play after the candidate tokens have been selected by top-p or top-k.
+    Once that pool of potential tokens is chosen, temperature controls the randomness of the result.
+    It rescales the probabilities of the tokens in the pool: the higher the temperature, the more equal those probabilities become, and thus the more random the output.
+    The repetition penalty tells the model how strongly to avoid reusing tokens it has already generated.
+    With a high repetition penalty, the model is less likely to repeat what it has already said or to get stuck in a loop repeating the same sentence.
+    """
     llm = Llama(model_path=MODEL_PATH, n_gqa=8, verbose=False) # , n_ctx=n_ctx)
     check_ctx_len(llm)
     for i, task in enumerate(iter_tasks(TASKS_PATH)):
@@ -47,6 +60,8 @@ def main():
         output = llm.create_completion(
             prompt,
             max_tokens=20,
+            top_k=10,
+            top_p=0.9,
             temperature=1e-6,
         )
         answer = output['choices'][0]['text'].strip().split()[0]
File renamed without changes.
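
For reference, the sampling behaviour the new docstring describes (top-k and top-p narrowing the candidate pool, temperature then controlling randomness within that pool, and a repetition penalty discouraging reuse) can be sketched roughly as below. This is only an illustration under assumptions: the sample_next_token helper, the NumPy implementation, and the default values are invented for the example and are not llama.cpp's actual sampler.

import numpy as np

def sample_next_token(logits, top_k=10, top_p=0.9, temperature=0.8,
                      repetition_penalty=1.1, previous_tokens=()):
    """Hypothetical sketch of top-k / top-p / temperature / repetition-penalty sampling."""
    logits = np.array(logits, dtype=np.float64)

    # Repetition penalty: make tokens that were already generated less attractive.
    for t in set(previous_tokens):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty

    # Softmax over the whole vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-k: keep only the k most probable tokens.
    order = np.argsort(probs)[::-1][:top_k]

    # Top-p: within those, keep the smallest prefix whose cumulative probability reaches p.
    cumulative = np.cumsum(probs[order])
    pool = order[:int(np.searchsorted(cumulative, top_p)) + 1]

    # Temperature: applied to the surviving pool. Higher values flatten the
    # distribution (more random); values near 0 approach greedy decoding.
    pool_logits = np.log(probs[pool]) / max(temperature, 1e-8)
    pool_probs = np.exp(pool_logits - pool_logits.max())
    pool_probs /= pool_probs.sum()

    return int(np.random.choice(pool, p=pool_probs))

With the values used in the commit itself (top_k=10, top_p=0.9, temperature=1e-6), the pool contains at most ten tokens and the near-zero temperature makes the draw effectively greedy, so the completion almost always begins with the single most probable answer token.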
