Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196

Open: matteoserva wants to merge 6 commits into master


matteoserva
Contributor

This PR adds support for passing additional parameters to the jinja chat template.
This is used, for example, to set enable_thinking in Qwen3 models.

The official template is still only partially compatible, so I modified it to use only supported features.
The modified template is here: https://pastebin.com/16ZpCLHk
Load it with llama-server --jinja --chat-template-file {template_file}

Fixes #13160 and #13189.
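
To make the intended effect of the extra parameter concrete, here is a minimal Python sketch of the branch a Qwen3-style template takes on enable_thinking. It is a hypothetical stand-in written for illustration, not the real template and not the code in this PR: the function name `render_chatml` is invented, the exact whitespace of the real template may differ (the expected output in the test below starts with an extra newline), and treating an undefined value like `true` is an assumption based on how the Qwen3 template is understood to behave.

```python
from typing import Dict, List, Optional


def render_chatml(messages: List[Dict[str, str]],
                  enable_thinking: Optional[bool] = None) -> str:
    """Hypothetical stand-in for the template logic (not the real template)."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Generation prompt: the assistant turn is opened but not closed.
    parts.append("<|im_start|>assistant\n")
    # Only an explicit False pre-fills an empty think block, which tells the
    # model to skip its reasoning. None (undefined) and True leave it open.
    if enable_thinking is False:
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Hi"}]
print(repr(render_chatml(messages, enable_thinking=False)))
print(repr(render_chatml(messages, enable_thinking=True)))
```

The point of the sketch is that the parameter has to reach the template as a variable: without a way to pass it, only the default branch can ever be rendered.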

Test it with:

  • enable_thinking=false. Expected: {"prompt":"\n<|im_start|>user\nGive me a short introduction to large language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"}
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'
  • enable_thinking=true
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": true}
}'
  • enable_thinking undefined
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5
}'
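
The three curl checks above can also be scripted. The following is a rough sketch using only the Python standard library; it assumes the server is listening on http://localhost:8080 and that /apply-template returns a JSON object with a "prompt" field, as in the expected output shown above. The helper names (`build_body`, `apply_template`, `compare_modes`) are made up for this example, and the sampling fields from the curl commands are left out on the assumption that they do not affect template rendering.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

QUESTION = "Give me a short introduction to large language models."


def build_body(enable_thinking: Optional[bool]) -> Dict[str, Any]:
    """Build the request body; None means 'leave enable_thinking undefined'."""
    body: Dict[str, Any] = {
        "messages": [{"role": "user", "content": QUESTION}],
    }
    if enable_thinking is not None:
        body["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return body


def apply_template(body: Dict[str, Any],
                   base_url: str = "http://localhost:8080") -> str:
    """POST the body to /apply-template and return the rendered prompt."""
    request = urllib.request.Request(
        f"{base_url}/apply-template",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["prompt"]


def compare_modes(base_url: str = "http://localhost:8080") -> Dict[str, str]:
    """Render the prompt for false, true and undefined enable_thinking."""
    modes = (("false", False), ("true", True), ("undefined", None))
    return {
        label: apply_template(build_body(value), base_url)
        for label, value in modes
    }
```

With the server running, `compare_modes()` returns the three prompts keyed by mode, so the `false` entry can be compared against the expected string from the first test case.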

@matteoserva matteoserva requested a review from ngxson as a code owner April 29, 2025 18:58
@matteoserva matteoserva marked this pull request as draft April 29, 2025 18:58
@rhjdvsgsgks
Contributor

Can you add chat_template_kwargs as a CLI argument as well?

@matteoserva
Contributor Author

matteoserva commented Apr 30, 2025

> Can you add chat_template_kwargs as a CLI argument as well?

I added it. I tested it with the following command (you may need to adjust the escaping of the double quotes for your shell):
--chat_template_kwargs "{\"enable_thinking\":false}" --jinja --chat-template-file qwen/qwen3_template.txt
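
Since the escaping of the JSON argument depends on the shell, one way to sidestep it is to generate the argument programmatically. The sketch below is only an illustration: it spells the flag as in the command above (the final spelling may differ, so check `llama-server --help`), assumes a POSIX shell for the quoted form, and reuses the template path from that command.

```python
import json
import shlex

# Serialize the kwargs compactly so the JSON matches the command above.
kwargs = {"enable_thinking": False}
kwargs_json = json.dumps(kwargs, separators=(",", ":"))

# As an argument list (e.g. for subprocess.run) no shell quoting is needed.
argv = [
    "llama-server",
    "--chat_template_kwargs", kwargs_json,
    "--jinja",
    "--chat-template-file", "qwen/qwen3_template.txt",
]

# For pasting into a POSIX shell, shlex.join quotes each argument safely.
command_line = shlex.join(argv)
print(command_line)
```

This prints the JSON wrapped in single quotes, which avoids the backslash-escaped double quotes entirely on POSIX shells. Other shells, such as cmd.exe or PowerShell, have different quoting rules.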

@matteoserva matteoserva changed the title [RFC] handling jinja extra template kwargs (Qwen3 enable_thinking feature) Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client Apr 30, 2025
@matteoserva matteoserva marked this pull request as ready for review April 30, 2025 15:59
@neolee

neolee commented May 1, 2025

Very useful for Qwen3 series. +1 for this feature!

Development

Successfully merging this pull request may close these issues.

Misc. bug: Qwen 3.0 "enable_thinking" parameter not working
3 participants