Evaluation parameter bug #3770

1212wuhu · 2025-04-05T04:37:55Z

Describe the bug
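During evaluation, the model's output length parameter is forcibly set to 2048. Part of the console output is shown below; it looks like the evalscope backend is used but the parameter override is not applied.
Console output (partial):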

2025-04-05 12:30:05,346 - evalscope - INFO - Args: Task config is provided with TaskConfig type.
2025-04-05 12:30:05,351 - evalscope - INFO - Check the OpenCompass environment: OK
2025-04-05 12:30:05,362 - evalscope - INFO - Dump task config to /home/dataset-assist-0/zgy/swift/eval_output/opencompass/20250405_123005/configs/task_config_0da48a.yaml
2025-04-05 12:30:05,372 - evalscope - INFO - {
    "model": null,
    "model_id": null,
    "model_args": {
        "revision": "master",
        "precision": "torch.float16"
    },
    "template_type": null,
    "chat_template": null,
    "datasets": [],
    "dataset_args": {},
    "dataset_dir": "/root/.cache/modelscope/hub/datasets",
    "dataset_hub": "modelscope",
    "generation_config": {
        "max_length": 2048,
        "max_new_tokens": 512,
        "do_sample": false,
        "top_k": 50,
        "top_p": 1.0,
        "temperature": 1.0
    },
    "eval_type": "checkpoint",
    "eval_backend": "OpenCompass",
    "eval_config": {
        "datasets": [
            "math"
        ],
        "batch_size": 16,
        "work_dir": "/home/dataset-assist-0/zgy/swift/eval_output/opencompass",
        "models": [
            {
                "path": "checkpoint-44301-merged",
                "openai_api_base": "http://127.0.0.1:8000/v1/chat/completions",
                "key": "EMPTY",
                "is_chat": true
            }
        ],
        "limit": 100,
        "time_str": "20250405_123005"
    },
    "stage": "all",
    "limit": null,
    "eval_batch_size": 1,
    "mem_cache": false,
    "use_cache": null,
    "work_dir": "/home/dataset-assist-0/zgy/swift/eval_output/opencompass/20250405_123005",
    "outputs": null,
    "debug": false,
    "dry_run": false,
    "seed": 42,
    "api_url": null,
    "api_key": "EMPTY",
    "timeout": null,
    "stream": false,
    "judge_strategy": "auto",
    "judge_worker_num": 8,
    "judge_model_args": {}
}
2025-04-05 12:30:06,039 - evalscope - INFO - *** Run task with config: /tmp/tmpxd7_zkjj.py 

04/05 12:30:06 - OpenCompass - INFO - Current exp folder: /home/dataset-assist-0/zgy/swift/eval_output/opencompass/20250405_123005
04/05 12:30:07 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
04/05 12:30:07 - OpenCompass - INFO - Partitioned into 1 tasks.

Run script:

#!/bin/bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
swift eval \
    --model /home/dataset-assist-0/zgy/swift/output_sft/v15-20250402-132218/checkpoint-44301-merged \
    --eval_backend OpenCompass \
    --infer_backend vllm \
    --eval_limit 100 \
    --eval_dataset math \
    --max_model_len 27000 \
    --stream true \
    --tensor_parallel_size 4

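As you can see, even though max_model_len is specified, the generation config is still forced to "max_length": 2048, "max_new_tokens": 512.
The evaluation output results show the same thing.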
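
The mismatch can be checked mechanically. This is a minimal sketch, not part of ms-swift or evalscope: it parses a trimmed copy of the config logged above and lists the generation limits that are smaller than the requested length. `find_unapplied_limits` is a hypothetical helper, and treating `--max_model_len 27000` as the intended generation limit is an assumption taken from this report, since that flag may only control the vLLM context window.

```python
import json

# Trimmed copy of the task config that evalscope logged in this issue.
LOGGED_CONFIG = """
{
    "generation_config": {
        "max_length": 2048,
        "max_new_tokens": 512,
        "do_sample": false
    },
    "eval_backend": "OpenCompass"
}
"""


def find_unapplied_limits(config: dict, requested_len: int) -> dict:
    """Return the generation limits in `config` that are below `requested_len`.

    Hypothetical helper for illustration only; it does not call any
    ms-swift or evalscope API.
    """
    gen_cfg = config.get("generation_config", {})
    keys = ("max_length", "max_new_tokens")
    return {k: gen_cfg[k] for k in keys if k in gen_cfg and gen_cfg[k] < requested_len}


config = json.loads(LOGGED_CONFIG)
# 27000 is the value passed as --max_model_len in the run script (assumed intent).
mismatches = find_unapplied_limits(config, requested_len=27000)
print(mismatches)
```

An empty dictionary would mean both limits are at least as large as the requested length. With the logged config, both keys are reported.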

wnark · 2025-04-07T08:48:08Z

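In addition, backends such as OpenCompass and VLMEvalKit cannot be selected; choosing one raises an error.
Update:
The required libraries need to be installed as the error messages indicate. The basic order is: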

pip install ms-swift -U
pip install evalscope
pip install 'evalscope[opencompass]'
pip install vllm==0.8.0 # ms-swift requires an older version of transformers
