Description
Image: intelanalytics/ipex-llm-serving-xpu:0.8.3-b19 or intelanalytics/ipex-llm-serving-xpu:0.8.3-b21
Model: DeepSeek-R1-Distill-Qwen-32B, quantized to SYM_INT4
Tool: Lighteval
Dataset: MMLU
The benchmarked accuracy is only 27.67%, barely above the 25% random-guess baseline for four-choice MMLU. For comparison, the same DeepSeek-R1-Distill-Qwen-32B INT4 model scores 78.82% on an NVIDIA A100.
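As a first triage step it helps to confirm that the served SYM_INT4 model produces sensible answers at all. Below is a minimal single-question sanity-check sketch against the OpenAI-compatible endpoint exposed by the serving container; the host, port, and served model name are assumptions for illustration, not values from the actual deployment:

```python
# Minimal single-question probe against the serving container's
# OpenAI-compatible /v1/completions endpoint (vLLM-style).
import requests

BASE_URL = "http://localhost:8000/v1"   # assumed endpoint of the container
MODEL = "DeepSeek-R1-Distill-Qwen-32B"  # assumed served model name

prompt = (
    "The following is a multiple choice question.\n\n"
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\nAnswer:"
)

resp = requests.post(
    f"{BASE_URL}/completions",
    json={
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 2,
        "temperature": 0.0,  # greedy decoding for a deterministic check
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])  # a healthy model should answer " B"
```

Warnings observed in the serving logs during the run: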
```
(WrapperWithLoadBit pid=10769) 2025:06:13-12:30:17:(10769) |CCL_WARN| device_family is unknown, topology discovery could be incorrect, it might result in suboptimal performance [repeated 2x across cluster]
(WrapperWithLoadBit pid=10769) 2025:06:13-12:30:17:(10769) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 24x across cluster]
(WrapperWithLoadBit pid=10769) -----> current rank: 3, world size: 4, byte_count: 15360000,is_p2p:1 [repeated 2x across cluster]
(WrapperWithLoadBit pid=10769) WARNING 06-13 12:30:19 [_logger.py:68] Pin memory is not supported on XPU. [repeated 2x across cluster]
```
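The CCL warning above names its own knob: setting CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0 disables the PCIe-based topology recognition and assumes XeLink connections between devices. This is a performance setting and is unlikely to explain the accuracy gap, but a minimal sketch of applying it is shown below (placing it in the serving entrypoint is an assumption; exporting it when launching the container works equally well):

```python
import os

# oneCCL reads CCL_* settings from the environment at initialization, so this
# must be set before the distributed backend starts, e.g. at the top of the
# serving entrypoint or exported when launching the container.
os.environ["CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK"] = "0"  # assume XeLink
```

The Lighteval run then completes and reports the following MMLU results: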
```
[2025-06-13 15:38:57,787] [ INFO]: --- COMPUTING METRICS --- (pipeline.py:498)
[2025-06-13 15:38:58,608] [ INFO]: --- DISPLAYING RESULTS --- (pipeline.py:540)
```
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
all | | acc | 0.2767 | ± | 0.0332 |
original:mmlu:_average:0 | | acc | 0.2767 | ± | 0.0332 |
original:mmlu:abstract_algebra:0 | 0 | acc | 0.2200 | ± | 0.0416 |
original:mmlu:anatomy:0 | 0 | acc | 0.2370 | ± | 0.0367 |
original:mmlu:astronomy:0 | 0 | acc | 0.2500 | ± | 0.0352 |
original:mmlu:business_ethics:0 | 0 | acc | 0.3800 | ± | 0.0488 |
original:mmlu:clinical_knowledge:0 | 0 | acc | 0.2340 | ± | 0.0261 |
original:mmlu:college_biology:0 | 0 | acc | 0.3125 | ± | 0.0388 |
original:mmlu:college_chemistry:0 | 0 | acc | 0.2000 | ± | 0.0402 |
original:mmlu:college_computer_science:0 | 0 | acc | 0.2700 | ± | 0.0446 |
original:mmlu:college_mathematics:0 | 0 | acc | 0.2100 | ± | 0.0409 |
original:mmlu:college_medicine:0 | 0 | acc | 0.2254 | ± | 0.0319 |
original:mmlu:college_physics:0 | 0 | acc | 0.2157 | ± | 0.0409 |
original:mmlu:computer_security:0 | 0 | acc | 0.3300 | ± | 0.0473 |
original:mmlu:conceptual_physics:0 | 0 | acc | 0.3064 | ± | 0.0301 |
original:mmlu:econometrics:0 | 0 | acc | 0.2368 | ± | 0.0400 |
original:mmlu:electrical_engineering:0 | 0 | acc | 0.2759 | ± | 0.0372 |
original:mmlu:elementary_mathematics:0 | 0 | acc | 0.2249 | ± | 0.0215 |
original:mmlu:formal_logic:0 | 0 | acc | 0.2778 | ± | 0.0401 |
original:mmlu:global_facts:0 | 0 | acc | 0.2100 | ± | 0.0409 |
original:mmlu:high_school_biology:0 | 0 | acc | 0.2226 | ± | 0.0237 |
original:mmlu:high_school_chemistry:0 | 0 | acc | 0.1823 | ± | 0.0272 |
original:mmlu:high_school_computer_science:0 | 0 | acc | 0.2900 | ± | 0.0456 |
original:mmlu:high_school_european_history:0 | 0 | acc | 0.3212 | ± | 0.0365 |
original:mmlu:high_school_geography:0 | 0 | acc | 0.3030 | ± | 0.0327 |
original:mmlu:high_school_government_and_politics:0 | 0 | acc | 0.2176 | ± | 0.0298 |
original:mmlu:high_school_macroeconomics:0 | 0 | acc | 0.2538 | ± | 0.0221 |
original:mmlu:high_school_mathematics:0 | 0 | acc | 0.2111 | ± | 0.0249 |
original:mmlu:high_school_microeconomics:0 | 0 | acc | 0.2563 | ± | 0.0284 |
original:mmlu:high_school_physics:0 | 0 | acc | 0.1987 | ± | 0.0326 |
original:mmlu:high_school_psychology:0 | 0 | acc | 0.3523 | ± | 0.0205 |
original:mmlu:high_school_statistics:0 | 0 | acc | 0.1620 | ± | 0.0251 |
original:mmlu:high_school_us_history:0 | 0 | acc | 0.2990 | ± | 0.0321 |
original:mmlu:high_school_world_history:0 | 0 | acc | 0.3882 | ± | 0.0317 |
original:mmlu:human_aging:0 | 0 | acc | 0.3453 | ± | 0.0319 |
original:mmlu:human_sexuality:0 | 0 | acc | 0.3359 | ± | 0.0414 |
original:mmlu:international_law:0 | 0 | acc | 0.2893 | ± | 0.0414 |
original:mmlu:jurisprudence:0 | 0 | acc | 0.2963 | ± | 0.0441 |
original:mmlu:logical_fallacies:0 | 0 | acc | 0.3313 | ± | 0.0370 |
original:mmlu:machine_learning:0 | 0 | acc | 0.3214 | ± | 0.0443 |
original:mmlu:management:0 | 0 | acc | 0.2718 | ± | 0.0441 |
original:mmlu:marketing:0 | 0 | acc | 0.4316 | ± | 0.0324 |
original:mmlu:medical_genetics:0 | 0 | acc | 0.3000 | ± | 0.0461 |
original:mmlu:miscellaneous:0 | 0 | acc | 0.3614 | ± | 0.0172 |
original:mmlu:moral_disputes:0 | 0 | acc | 0.2919 | ± | 0.0245 |
original:mmlu:moral_scenarios:0 | 0 | acc | 0.2402 | ± | 0.0143 |
original:mmlu:nutrition:0 | 0 | acc | 0.2516 | ± | 0.0248 |
original:mmlu:philosophy:0 | 0 | acc | 0.2379 | ± | 0.0242 |
original:mmlu:prehistory:0 | 0 | acc | 0.2809 | ± | 0.0250 |
original:mmlu:professional_accounting:0 | 0 | acc | 0.2411 | ± | 0.0255 |
original:mmlu:professional_law:0 | 0 | acc | 0.2477 | ± | 0.0110 |
original:mmlu:professional_medicine:0 | 0 | acc | 0.1875 | ± | 0.0237 |
original:mmlu:professional_psychology:0 | 0 | acc | 0.3105 | ± | 0.0187 |
original:mmlu:public_relations:0 | 0 | acc | 0.2818 | ± | 0.0431 |
original:mmlu:security_studies:0 | 0 | acc | 0.2939 | ± | 0.0292 |
original:mmlu:sociology:0 | 0 | acc | 0.2985 | ± | 0.0324 |
original:mmlu:us_foreign_policy:0 | 0 | acc | 0.3200 | ± | 0.0469 |
original:mmlu:virology:0 | 0 | acc | 0.2892 | ± | 0.0353 |
original:mmlu:world_religions:0 | 0 | acc | 0.4386 | ± | 0.0381 |
```
[2025-06-13 15:38:58,686] [ INFO]: --- SAVING AND PUSHING RESULTS --- (pipeline.py:530)
[2025-06-13 15:38:58,686] [ INFO]: Saving experiment tracker (evaluation_tracker.py:196)
[2025-06-13 15:39:07,447] [ INFO]: Saving results to /llm/intelmc8/shawn/project/lighteval/results/results/_llm_intelmc8_models_DeepSeek-R1-Distill-Qwen-32B/results_2025-06-13T15-38-58.686645.json (evaluation_tracker.py:265)
```
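Every per-subject score sits in the 0.16 to 0.44 band, i.e. at or near chance, which points to the quantized weights or the harness's scoring path being broken across the board rather than a few hard subjects. A quick way to separate the two is a small generative probe over real MMLU items; the endpoint, model name, and sample size below are assumptions for illustration:

```python
# Rough generative accuracy probe on a handful of real MMLU items, to check
# whether the served SYM_INT4 model itself is near random chance or whether
# the gap comes from the evaluation harness's scoring path.
import requests
from datasets import load_dataset

BASE_URL = "http://localhost:8000/v1"   # assumed serving endpoint
MODEL = "DeepSeek-R1-Distill-Qwen-32B"  # assumed served model name
LETTERS = "ABCD"
N = 20  # small probe, not a full benchmark

ds = load_dataset("cais/mmlu", "abstract_algebra", split="test")
correct = 0

for item in ds.select(range(N)):
    choices = "\n".join(
        f"{LETTERS[i]}. {c}" for i, c in enumerate(item["choices"])
    )
    prompt = f"{item['question']}\n{choices}\nAnswer:"
    resp = requests.post(
        f"{BASE_URL}/completions",
        json={"model": MODEL, "prompt": prompt,
              "max_tokens": 2, "temperature": 0.0},
        timeout=120,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["text"].strip()
    if text[:1] == LETTERS[item["answer"]]:
        correct += 1

print(f"generative probe accuracy: {correct}/{N}")
```

If this probe also lands near 25%, the SYM_INT4 weights (or the XPU kernels serving them) are the likely culprit; if it scores well, attention should shift to how Lighteval scores the answer choices against this endpoint.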