Replies: 2 comments 1 reply
-
When using LMEvaluator, the temperature is set to 0.6. When using MATHEvaluator as the judge, should the temperature instead be set to 0 (or 0.001) to reduce randomness in the generated verdicts?
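A minimal sketch of the idea behind the question: keep sampling (temperature 0.6) for the answer-generating model, but pin the judge to near-greedy decoding so verification is deterministic. The exact parameter names accepted by your evaluator wrapper are an assumption; they follow the generation-config style of OpenAI-compatible APIs.

```python
# Hedged sketch, not OpenCompass's official config: pin the judge model to
# (near-)greedy decoding so MATHEvaluator-style verification is repeatable.
judge_generation_config = dict(
    temperature=0.0,  # 0 (or a tiny value such as 0.001) removes sampling noise
    top_p=1.0,        # leave the distribution untouched; temperature already pins the argmax
    max_tokens=1024,  # judging a single answer rarely needs a long output
)

# The answer-generating model keeps the recommended sampling settings:
gen_generation_config = dict(temperature=0.6, top_p=0.95)
```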
0 replies
-
Please consider aime2024_llmverify_repeat8_gen_e8fcee as the reference. aime2024_gen_6e39a4 truncates the max output length to 2048 and will be deprecated in the future. Also, you may need to repeat the run 64 times for stable performance.
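The two overrides suggested above can be sketched as a small config fragment. The key names (`max_out_len`, `n`) follow common OpenCompass config conventions but are assumptions here; check them against the aime2024_llmverify_repeat8_gen_e8fcee config you copy from.

```python
# Hedged sketch of the suggested overrides for a stable AIME2024 run.
aime2024_override = dict(
    max_out_len=32768,  # R1-style long reasoning gets truncated at the default 2048 otherwise
    n=64,               # only 30 AIME problems, so many repeats are needed to tame pass@1 variance
)
```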
1 reply
-
A question: I am benchmarking AIME2024 accuracy with the Ollama framework, using a quantized DeepSeek-R1-Distill-Qwen-7B model. With aime2024_llmverify_repeat8_gen_e8fcee the accuracy is 62.08, but the officially reported pass@1 is only 55.5. With aime2024_gen_6e39a4 and max_out_len set to 32768, the accuracy is only 3.33.
Ollama-based test code:
eval_deepseek_r1_int4.txt
aime2024_gen_6e39a4 config:
aime2024_gen_6e39a4.txt
Could you please help check whether my Ollama configuration (mainly the models section) is correct?
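For reference, here is a hedged sketch of what a models entry for an OpenAI-compatible backend such as Ollama's `/v1` endpoint typically looks like. The `abbr` and `path` values are hypothetical placeholders, and the field names follow OpenCompass's OpenAI-style model wrapper; verify them against `opencompass.models` in your installed version rather than copying this verbatim.

```python
# Hedged sketch, assuming an OpenAI-style model wrapper pointed at Ollama.
models = [
    dict(
        abbr='deepseek-r1-distill-qwen-7b-int4',     # hypothetical label for results tables
        # type=OpenAISDK,                            # assumed wrapper class; check your version
        path='deepseek-r1-distill-qwen-7b',          # hypothetical Ollama model tag
        openai_api_base='http://localhost:11434/v1', # Ollama's OpenAI-compatible endpoint
        max_out_len=32768,  # must be large enough for R1-style long chains of thought
        temperature=0.6,    # DeepSeek's recommended sampling temperature
        batch_size=1,
    ),
]
```

A too-small `max_out_len` is the usual cause of near-zero accuracy on this benchmark: the reasoning trace gets truncated before the final boxed answer is emitted.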