Prefilling assistant message in openai compatible API #13174

Merged on Apr 29, 2025 (7 commits)

matteoserva
Contributor

@matteoserva matteoserva commented Apr 29, 2025

This adds support for prefilling the assistant's response (or its thought process) through the OpenAI-compatible API.

The feature is used, for example, by Claude.

It can be tested using open-webui or with the following curl command:

curl http://localhost:8080/apply-template \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "SYSTEM"
      },
      {
        "role": "user",
        "content": "USERMESSAGE"
      },
      {
        "role": "assistant",
        "content": "ASSISTANT"
      }
    ]
  }'

Example advanced scenario: time limit for the thinking process

  • launch a reasoning model and stop its thinking early
  • append </think> to its partial response
  • send the result as a prefilled assistant message and let the model continue generating tokens
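
The three steps above could be sketched in Python roughly as follows. This is only an illustration, not code from the PR: the helper names `build_prefill_request` and `post_chat` are hypothetical, the `/v1/chat/completions` path and the `localhost:8080` address are assumptions about the server setup, and the partial thought is assumed to have been captured already by whatever mechanism stopped the first generation.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is listening here; adjust to your setup.
BASE_URL = "http://localhost:8080"


def build_prefill_request(messages, partial_thought, close_tag="</think>"):
    """Build a chat-completion body whose last message is an assistant prefill.

    The captured partial thought is closed with `close_tag`, so the model is
    nudged to move on to its answer instead of continuing to reason.
    """
    prefill = partial_thought.rstrip() + "\n" + close_tag + "\n"
    return {
        "messages": list(messages) + [{"role": "assistant", "content": prefill}]
    }


def post_chat(body, base_url=BASE_URL):
    """POST `body` to the chat endpoint and return the parsed JSON response.

    Hypothetical helper; it needs a running server and is not called here.
    """
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example: the thought below stands in for text captured before the time limit.
conversation = [{"role": "user", "content": "USERMESSAGE"}]
body = build_prefill_request(conversation, "<think>\nFirst idea so far")
```

Whether the opening `<think>` tag belongs in the prefill depends on the model and its chat template, so the example simply passes through whatever text was captured.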

@ngxson ngxson merged commit e2e1ddb into ggml-org:master Apr 29, 2025
47 of 48 checks passed
@isaac-mcfadyen
Contributor

isaac-mcfadyen commented Apr 30, 2025

Just a heads-up that this is potentially a significant breaking change, especially since the API is meant to be OpenAI-compatible and this is not how OpenAI's API behaves.

The main situation I can think of is someone wanting to generate a new assistant message after the last one, i.e., for ChatML they want <|im_end|><|im_start|>assistant added between the last message and the new one, rather than having the last message simply continued.
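
To make the two behaviors concrete, here is a small hand-written ChatML renderer. It is a sketch for illustration only, not llama.cpp's template code: the function `render_chatml` and its `continue_last_assistant` flag are hypothetical, and the newline placement follows the usual ChatML layout.

```python
def render_chatml(messages, continue_last_assistant):
    """Render messages as a ChatML prompt string.

    With continue_last_assistant=True, a trailing assistant message is left
    open (prefill). Otherwise every message is closed and a fresh assistant
    turn is opened at the end.
    """
    parts = []
    for index, message in enumerate(messages):
        is_last = index == len(messages) - 1
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}")
        if is_last and message["role"] == "assistant" and continue_last_assistant:
            # Prefill: no <|im_end|>, so generation continues this message.
            return "".join(parts)
        parts.append("<|im_end|>\n")
    # New turn: the model starts a separate assistant message.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


history = [
    {"role": "user", "content": "USERMESSAGE"},
    {"role": "assistant", "content": "ASSISTANT"},
]
continued = render_chatml(history, continue_last_assistant=True)
new_turn = render_chatml(history, continue_last_assistant=False)
```

Here `continued` ends with the open text `ASSISTANT`, while `new_turn` ends with `<|im_end|>` followed by a fresh `<|im_start|>assistant` header, which is the case the comment is concerned about.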

I'd suggest we add this to #9291 at a minimum.
