Description:
When evaluating the `Qwen2.5-Coder-14B-Instruct` model on the HumanEval benchmark, I observed that the pass@1 score was relatively low. By analyzing the `generations.json` file, I found that many generated code snippets contained errors, often ending with tokens like `<|endoftext|>`. These trailing fragments cause syntax errors during evaluation, so solutions that would otherwise be correct (without such partial suffixes) fail.
To address this issue, I added `<|` to the list of `stop_words` during generation to prevent the model from appending incomplete or malformed code. Experimental results show that this simple modification significantly improves the pass@1 score.
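To illustrate the effect, here is a minimal, standalone sketch of truncating a generation at a stop word. It is not the harness's actual implementation: `truncate_at_stop_words`, `is_valid_python`, and the sample generation are hypothetical names and data made up for this example.

```python
from typing import Sequence


def truncate_at_stop_words(generation: str, stop_words: Sequence[str]) -> str:
    """Cut the generation at the earliest occurrence of any stop word.

    Hypothetical helper for illustration; not the harness's own code.
    """
    cut = len(generation)
    for stop in stop_words:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


def is_valid_python(source: str) -> bool:
    """Return True if the source compiles without a SyntaxError."""
    try:
        compile(source, "<generation>", "exec")
        return True
    except SyntaxError:
        return False


# Made-up generation that ends with a stray special token.
raw = "def add(a, b):\n    return a + b\n<|endoftext|>"
cleaned = truncate_at_stop_words(raw, ["<|"])

print(is_valid_python(raw))      # the stray token breaks parsing
print(is_valid_python(cleaned))  # the truncated code parses
```

One caveat with this approach: a stop word as short as `<|` would also cut off a generation that legitimately contains that character sequence (for example inside a string literal), so it trades a small risk of over-truncation for robustness against special tokens.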
Suggestion:
Including special tokens like `<|` in the default stop token list could help improve the completeness and correctness of generated code.
If this makes sense, I'm happy to create a PR for it.