Skip to content

healthbench #2099

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

healthbench #2099

wants to merge 1 commit into from

Conversation

bio-mlhui
Copy link
Contributor

添加HealthBench评测 (官方evaluate方式)

Modification

包含3个文件:

  1. HealthBench/healthbench_model_gen_4175e2.py
    评测方式按照官方论文中(https://github.com/openai/simple-evals)方式 (vanilla + physician-mode)

  2. opencompass/datasets/healthbench/healthbench.py

  3. opencompass/openicl/icl_prompt_template.py
    添加了HealthBenchTemplate, 由于每个item中的prompt字段都是一个多轮对话,平常的PromptTemplate无法使用

Checklist

Before PR:

  • [✔ ] Pre-commit or other linting tools are used to fix the potential lint issues.
  • [ ✔] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • [✔ ] The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.
  1. Qwen2.5-1.5B, GPT-4o作为llm_judge的debug结果:
    image

After PR:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused file plz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants