Improve pass@1 Score on Humaneval #311

@showlibia

Description:

When evaluating the Qwen2.5-Coder-14B-Instruct model on the HumanEval benchmark, I observed that the pass@1 score was relatively low. Analyzing the generations.json file showed that many generated code snippets contained errors, often because they ended with special-token text such as <|endoftext|>. This trailing text causes syntax errors during evaluation, so solutions that would be correct without the suffix fail.
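To illustrate the failure mode, here is a minimal sketch, not taken from the evaluation harness, that checks whether a completion parses as Python. The helper name `is_valid_python` and the sample strings are hypothetical; the only point is that a trailing `<|endoftext|>` is enough to turn a correct function into a syntax error.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Hypothetical completion that is correct on its own.
clean = "def add(a, b):\n    return a + b\n"

# The same completion with special-token text left at the end.
polluted = clean + "<|endoftext|>"

print(is_valid_python(clean))
print(is_valid_python(polluted))
```

The second check fails because `<|endoftext|>` is not valid Python syntax, which is why the whole sample is counted as a failure even though the function body is fine.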

To address this, I added <| to the list of stop_words used during generation, so that the output is cut off before any special-token text is appended. In my experiments, this simple change significantly improved the pass@1 score.
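The effect of the extra stop word can be sketched as a post-processing step. This is an illustration under assumptions, not the harness's actual code: the function `truncate_at_stop_words` and the contents of `STOP_WORDS` other than `<|` are hypothetical, and a real setup may apply stop words inside the generation loop instead of afterwards.

```python
from typing import Sequence


def truncate_at_stop_words(text: str, stop_words: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop word."""
    cut = len(text)
    for stop in stop_words:
        if not stop:
            continue  # an empty stop word would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Hypothetical stop-word list; only "<|" comes from the issue text.
STOP_WORDS = ["\nclass ", "\nif __name__", "<|"]

raw = "    return a + b\n<|endoftext|>"
print(repr(truncate_at_stop_words(raw, STOP_WORDS)))
```

One trade-off of using the short prefix `<|` rather than full token strings: it catches any special token of the `<|...|>` form, but it would also cut a completion that legitimately contains `<|`, for example inside a string literal.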

Suggestion:

Including special tokens like <| in the default stop token list could help improve the completeness and correctness of generated code.

If this makes sense, I'm happy to open a PR for it.
