Improve pass@1 Score on HumanEval #311

### Description:
When evaluating the `Qwen2.5-Coder-14B-Instruct` model on the HumanEval benchmark, I observed that the pass@1 score was relatively low. Inspecting the `generations.json` file, I found that many generated code snippets ended with a trailing special token such as `<|endoftext|>`. These stray suffixes cause syntax errors during evaluation, so solutions that would be correct without the suffix are counted as failures.
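To illustrate the failure mode, here is a minimal sketch using only Python's standard `ast` module. The helper name `is_valid_python` and the sample snippet are hypothetical and are not taken from the evaluation harness; the point is only that a trailing `<|endoftext|>` makes otherwise valid code unparseable.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# A correct solution, and the same solution with a stray special token appended.
clean = "def add(a, b):\n    return a + b\n"
polluted = clean + "<|endoftext|>"

print(is_valid_python(clean))     # True
print(is_valid_python(polluted))  # False
```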

To address this, I added `<|` to the list of `stop_words` used during generation, so that the output is cut off before any special token is appended. In my experiments, this simple change significantly improved the pass@1 score.
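The effect of such a stop word can be sketched as a post-processing step. This is an assumption-laden illustration, not the harness's actual implementation: `truncate_at_stop_words` is a hypothetical helper that cuts a generation at the earliest occurrence of any stop string.

```python
from typing import Iterable


def truncate_at_stop_words(text: str, stop_words: Iterable[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop word.

    Hypothetical helper for illustration; empty stop words are ignored
    because they would match at position 0 and erase the whole text.
    """
    cut = len(text)
    for stop in stop_words:
        if not stop:
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


generation = "def add(a, b):\n    return a + b\n<|endoftext|>"
print(repr(truncate_at_stop_words(generation, ["<|"])))
```

One caveat with this design: `<|` can also appear inside legitimate code, for example in a string literal, so truncating on it may occasionally cut a valid solution short.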

### Suggestion:
Including a special-token prefix such as `<|` in the default `stop_words` list could help improve the completeness and correctness of generated code.

If this makes sense, I'm happy to open a PR for it.

