Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196

matteoserva · 2025-04-29T18:58:24Z

This PR adds support for passing additional parameters to the jinja chat template.
One use is setting enable_thinking for Qwen3 models.

The official template is still only partially compatible, so I modified it to use only supported features.
The modified template is here: https://pastebin.com/16ZpCLHk
Load it with llama-server --jinja --chat-template-file {template_file}

It fixes #13160 and #13189

Test it with:

enable_thinking=false. Expected: {"prompt":"\n<|im_start|>user\nGive me a short introduction to large language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"}

curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

enable_thinking=true

curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": true}
}'

enable_thinking undefined (chat_template_kwargs omitted from the request)

curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5
}'
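The three requests above differ only in whether chat_template_kwargs is present and what enable_thinking is set to. A minimal Python sketch of the same checks, using only the standard library, could look like the following. The helper names build_payload and apply_template are hypothetical, the base URL assumes a local llama-server on port 8080 as in the curl commands, and the sampling parameters are left out on the assumption that they do not change the rendered prompt.

```python
import json
import urllib.request
from typing import Optional

PROMPT = "Give me a short introduction to large language models."


def build_payload(enable_thinking: Optional[bool]) -> dict:
    """Build an /apply-template request body; None leaves enable_thinking undefined."""
    payload = {
        "model": "Qwen/Qwen3-8B",
        "messages": [{"role": "user", "content": PROMPT}],
    }
    if enable_thinking is not None:
        # Extra template parameters travel under chat_template_kwargs.
        payload["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return payload


def apply_template(payload: dict, base_url: str = "http://localhost:8080") -> str:
    """POST the payload to /apply-template and return the rendered prompt string."""
    request = urllib.request.Request(
        f"{base_url}/apply-template",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["prompt"]
```

With a server running (started with --jinja and the modified template), apply_template(build_payload(False)) should return the prompt given as Expected above, and build_payload(None) reproduces the request with enable_thinking undefined.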

rhjdvsgsgks · 2025-04-29T20:14:30Z

Can you add chat_template_kwargs as a CLI argument as well?

matteoserva · 2025-04-30T07:36:22Z

> Can you add chat_template_kwargs as a CLI argument as well?

I added it. I tested it with this updated command (you might want to check the escaping of the double quotes for your shell):
--chat_template_kwargs "{\"enable_thinking\":false}" --jinja --chat-template-file qwen/qwen3_template.txt
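On the escaping: the value passed to --chat_template_kwargs is a JSON object, so its double quotes have to survive the shell. One way to avoid escaping by hand is to generate the argument programmatically. The sketch below is only an illustration; kwargs_argument is a hypothetical helper, and it assumes a POSIX shell (shlex.quote does not target cmd.exe or PowerShell).

```python
import json
import shlex


def kwargs_argument(kwargs: dict) -> str:
    """Return a POSIX-shell-safe '--chat_template_kwargs ...' fragment."""
    # Compact separators reproduce the {"enable_thinking":false} form used above.
    value = json.dumps(kwargs, separators=(",", ":"))
    # shlex.quote wraps the JSON in single quotes so the inner double quotes stay literal.
    return "--chat_template_kwargs " + shlex.quote(value)


fragment = kwargs_argument({"enable_thinking": False})
print(fragment)  # --chat_template_kwargs '{"enable_thinking":false}'
```

The single-quoted form is equivalent to the backslash-escaped one in the command above when run from a POSIX shell.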

neolee · 2025-05-01T03:01:10Z

Very useful for Qwen3 series. +1 for this feature!

matteoserva requested a review from ngxson as a code owner April 29, 2025 18:58

matteoserva marked this pull request as draft April 29, 2025 18:58

github-actions bot added examples server labels Apr 29, 2025

matteoserva force-pushed the enable_thinking branch from 76549e1 to 379c7a8 Compare April 29, 2025 19:01

zpitroda mentioned this pull request Apr 30, 2025

Feature/qwen3 mindverse/Second-Me#311

Draft

matteoserva added 4 commits April 30, 2025 17:56

initial commit for handling extra template kwargs

64531c6

enable_thinking and assistant prefill cannot be enabled at the same time

ed82633

fixed whitespace

c4e7718

can set chat_template_kwargs in command line

92ec57e

matteoserva force-pushed the enable_thinking branch from 7c04deb to d1861c4 Compare April 30, 2025 15:57

added doc

01b58b5

matteoserva force-pushed the enable_thinking branch from d1861c4 to 01b58b5 Compare April 30, 2025 15:58

matteoserva changed the title ~~[RFC] handling jinja extra template kwargs (Qwen3 enable_thinking feature)~~ Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client Apr 30, 2025

matteoserva marked this pull request as ready for review April 30, 2025 15:59

fixed formatting

2d1d595
