
Is vllm==0.8.3 causing some incompatibility problems? #602


Open
roaminwind opened this issue Apr 15, 2025 · 12 comments

@roaminwind

First I had vllm==0.8.3 and lighteval==0.8.1dev,
but ran into an AttributeError.

Then I followed some suggestions, such as checking out the git repository at a certain version, and that error did disappear.
However, a new error appeared: `VLLMModelConfig.__init__() got an unexpected keyword argument 'max_num_batched_tokens'`

Then I remembered that when I first cloned the project, the vllm version was 0.7.2.

I pip installed vllm==0.7.2, but more problems arose.

:<

@qianfantianyuzhouzhou

vllm==0.7.1 works for me.
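
If anyone wants to try that, a minimal downgrade sketch (this assumes the rest of your environment already works with the older lighteval):

```bash
# Sketch: pin vllm to 0.7.1, which reportedly works with the older lighteval.
pip install vllm==0.7.1
```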

@roaminwind
Author

roaminwind commented Apr 15, 2025 via email

@StarLooo

Same question here: `VLLMModelConfig.__init__() got an unexpected keyword argument 'max_num_batched_tokens'`

@lewtun
Member

lewtun commented Apr 16, 2025

Hi everyone, yes you need to use the pinned version of lighteval to work with vllm=0.8.3 because this PR has breaking changes.

There's a separate issue with DP>1 that is being tracked here: huggingface/lighteval#670
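
For anyone unsure what "the pinned version" means in practice, a minimal sketch of installing lighteval from source; the exact commit/branch to pin is an assumption here, so check the open-r1 dependency pins for the authoritative one:

```bash
# Sketch only: install lighteval from source so it matches vllm==0.8.3.
# The ref to pin is an assumption; consult open-r1's dependency pins for the real one.
pip install vllm==0.8.3
pip install "lighteval @ git+https://github.com/huggingface/lighteval.git"
```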

@StarLooo

> Hi everyone, yes you need to use the pinned version of lighteval to work with vllm=0.8.3 because this PR has breaking changes.
>
> There's a separate issue with DP>1 that is being tracked here: huggingface/lighteval#670

Thanks!
So, which version of lighteval should we use to solve this problem?

@StarLooo

StarLooo commented Apr 17, 2025

> Hi everyone, yes you need to use the pinned version of lighteval to work with vllm=0.8.3 because this PR has breaking changes.
> There's a separate issue with DP>1 that is being tracked here: huggingface/lighteval#670
>
> Thanks! So, which version of lighteval should we use to solve this problem?

Well, I tried cloning the latest version of the lighteval repository and installing it from source.
After modifying the MODEL_ARGS from:
`MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"`
to:
`MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:16384,temperature:0.6,top_p:0.95}"`
I could roughly reproduce the results of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and deepseek-ai/DeepSeek-R1-Distill-Qwen-7B using lighteval.

| Model | MATH-500 (extractive_match) | AIME24 (extractive_match) | AIME24 (math_pass@1:32_samples) |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | 0.848 | 0.300 | 0.300 |
| DeepSeek-R1-Distill-Qwen-7B | 0.934 | 0.633 | 0.496 |

DeepSeek-R1-Distill-Qwen-7B's extractive_match score on AIME24 seems too high? If anyone gets similar or different results, we can share them and discuss further.
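
For context, here is roughly how such MODEL_ARGS get passed to lighteval's vllm backend (a sketch: the model, GPU count, task spec, and output directory below are placeholders, not the exact command I used):

```bash
# Sketch of a lighteval vllm invocation with the updated MODEL_ARGS.
# MODEL, NUM_GPUS, the task spec, and the output dir are placeholders/assumptions.
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
NUM_GPUS=8
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:16384,temperature:0.6,top_p:0.95}"
lighteval vllm "$MODEL_ARGS" "lighteval|aime24|0|0" \
    --use-chat-template \
    --output-dir ./evals
```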

@JoeyXuquant11

@lewtun please tell me which version of lighteval is compatible, I am also struggling with that.

@StarLooo

> @lewtun please tell me which version of lighteval is compatible, I am also struggling with that.

Hi, you can follow what I mentioned before (a sketch follows this list):

  1. git clone the latest version of the lighteval repository and install it from source.
  2. Modify the MODEL_ARGS:
     2.1 use 'model_name' instead of 'pretrained'
     2.2 reduce max_new_tokens, since the original 32k is too large
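
A minimal sketch of those two steps (the paths are assumptions, and the MODEL_ARGS values just mirror my earlier snippet):

```bash
# Step 1: install lighteval from source.
git clone https://github.com/huggingface/lighteval.git
cd lighteval
pip install -e .

# Step 2: use model_name instead of pretrained, and a smaller max_new_tokens.
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:16384,temperature:0.6,top_p:0.95}"
```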

@StarLooo

StarLooo commented Apr 21, 2025

> Hi everyone, yes you need to use the pinned version of lighteval to work with vllm=0.8.3 because this PR has breaking changes.
> There's a separate issue with DP>1 that is being tracked here: huggingface/lighteval#670
>
> Thanks! So, which version of lighteval should we use to solve this problem?
>
> Well, I tried cloning the latest version of the lighteval repository and installing it from source. [...]

> DeepSeek-R1-Distill-Qwen-7B's extractive_match score on AIME24 seems too high? If anyone gets similar or different results, we can share them and discuss further.

After increasing max_new_tokens from 16k to 28k, I get higher performance on the 7B model but slightly lower performance on the 1.5B model.
Besides, I think extractive_match on AIME24 is very sensitive, since AIME24 only contains 30 samples. math_pass@1:32_samples may be a better indicator of model performance on this dataset.

@StarLooo

Do you encounter the same problem mentioned in #463 when using max_new_tokens:32768?

@Nativu5

Nativu5 commented May 19, 2025

> Do you encounter the same problem mentioned in #463 when using max_new_tokens:32768?

Hi there, we are encountering the truncation problem with max_new_tokens:32768. Could you please share any ideas on setting this param? Or can we just ignore the truncation warning?

@lewtun
Member

lewtun commented May 20, 2025

Hi @Nativu5, I think you can mostly ignore the truncation warning, or alternatively set max_new_tokens to a larger value if the model's context supports it.
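
A sketch of the second option, using the MODEL_ARGS format from earlier in this thread; whether 32768 fits depends on the model's context window, so treat the value as an assumption:

```bash
# Sketch: raise max_new_tokens if the model's context window supports it.
# Values mirror the MODEL_ARGS used earlier in this thread; adjust for your model.
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,max_num_batched_tokens=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
```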
