[Tiny Agents] Expose an OpenAI-compatible Web server #1473
base: main
Conversation
 	input: string | ChatCompletionInputMessage[],
 	opts: { abortSignal?: AbortSignal } = {}
 ): AsyncGenerator<ChatCompletionStreamOutput | ChatCompletionInputMessageTool> {
-	this.messages.push({
-		role: "user",
-		content: input,
-	});
+	let messages: ChatCompletionInputMessage[];
+	if (typeof input === "string") {
+		/// Use internal array of messages
+		this.messages.push({
+			role: "user",
+			content: input,
+		});
+		messages = this.messages;
+	} else {
+		/// Use the passed messages directly
+		messages = input;
+	}
You are maybe not going to be a fan of this part of the diff, @Wauplin @hanouticelina...
Basically, an OpenAI-compatible chat completion endpoint is stateless, so we need to feed the full array of messages from the downstream application here.
Let me know what you think.
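For illustration, a minimal sketch of what a downstream client of such a stateless endpoint would do: re-send the full message history on every request. The localhost URL, port and model name below are assumptions, not something defined in this PR.

// Stateless client sketch: the whole conversation is sent each time, since
// the server keeps no per-conversation state.
const history = [
	{ role: "user", content: "What is the weather in Paris?" },
	{ role: "assistant", content: "It is sunny, 22°C." },
	{ role: "user", content: "And in London?" },
];

const response = await fetch("http://localhost:9999/v1/chat/completions", {
	method: "POST",
	headers: { "Content-Type": "application/json" },
	body: JSON.stringify({
		model: "tiny-agent", // hypothetical placeholder model name
		stream: true,
		messages: history, // the whole conversation, not just the latest user turn
	}),
});
console.log(response.status); // the body is then consumed as an SSE stream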
/// Tool call info
/// /!\ We format it as a regular chunk!
const chunkToolcallInfo = {
	choices: [
		{
			index: 0,
			delta: {
				role: "assistant",
				content:
					"<tool_call_info>" +
					`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +
					chunk.content +
					"</tool_call_info>",
			},
		},
	],
	created: Math.floor(Date.now() / 1000),
	id: chunk.tool_call_id,
	model: "",
	system_fingerprint: "",
} satisfies ChatCompletionStreamOutput;

res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
This is the interesting part of the PR.
I format the tool call info as a "regular" chunk, as if it were content generated by the model itself. 🔥
And I send it as an SSE chunk.
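Concretely, a client consuming the stream would see the tool output arrive as an ordinary SSE event whose delta.content carries the tagged text, roughly like this (tool name, id, timestamp and payload are made up for the example):

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"<tool_call_info>Tool[get_weather] call_abc123\n{\"temperature\": 22}</tool_call_info>"}}],"created":1717171717,"id":"call_abc123","model":"","system_fingerprint":""}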
if (err instanceof z.ZodError) {
	return res.error(404, "Invalid ChatCompletionInput body \n" + JSON.stringify(err));
}
If you use a recent version of zod, you can import { z } from "zod/v4" and then use z.prettifyError(err).
I tried, but it requires ESM or something similar ("node16" moduleResolution or so).. (but feel free to give it a try)
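For reference, a sketch of the suggested variant, assuming the project's module resolution can pick up the zod/v4 subpath export (not verified against this repo's tsconfig); `res` stands in for the same response helper used in the PR's web server:

import { z } from "zod/v4";

// Same control flow as the original catch block, but with a human-readable
// error message instead of the raw serialized ZodError.
function handleValidationError(res: { error: (status: number, message: string) => void }, err: unknown) {
	if (err instanceof z.ZodError) {
		// z.prettifyError() renders a readable, multi-line summary of the issues
		return res.error(404, "Invalid ChatCompletionInput body\n" + z.prettifyError(err));
	}
	throw err;
}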
for await (const chunk of agent.run(messages)) {
	if ("choices" in chunk) {
		res.write(`data: ${JSON.stringify(chunk)}\n\n`);
	} else {
		/// Tool call info
		/// /!\ We format it as a regular chunk!
		const chunkToolcallInfo = {
			choices: [
				{
					index: 0,
					delta: {
						role: "assistant",
						content:
							"<tool_call_info>" +
							`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +
							chunk.content +
							"</tool_call_info>",
					},
				},
			],
			created: Math.floor(Date.now() / 1000),
			id: chunk.tool_call_id,
			model: "",
			system_fingerprint: "",
		} satisfies ChatCompletionStreamOutput;

		res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
	}
Suggested change (replacing the block above):

// Track tool call indices for proper formatting
let toolCallIndex = 0;
for await (const chunk of agent.run(messages)) {
	if ("choices" in chunk) {
		res.write(`data: ${JSON.stringify(chunk)}\n\n`);
	} else {
		// Tool call - format in OpenAI-compatible structure
		const chunkToolcallInfo = {
			choices: [
				{
					index: 0,
					delta: {
						role: "assistant",
						tool_calls: [
							{
								index: toolCallIndex,
								id: chunk.tool_call_id,
								type: "function",
								function: {
									name: chunk.name,
									arguments: chunk.content,
								},
							},
						],
					},
				},
			],
			created: Math.floor(Date.now() / 1000),
			id: crypto.randomUUID(),
			model: agent.modelName || "agent",
			system_fingerprint: "",
		} satisfies ChatCompletionStreamOutput;
		res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
		// Increment tool call index for the next tool call
		toolCallIndex++;
	}
Shouldn't we use delta.tool_calls (example here) rather than custom <tool_call_info>...</tool_call_info> tags?
Hmm, it's not the same:
- delta.tool_calls come from the LLM asking for a tool call (the LLM is saying "give me the output of this function call so I can incorporate it into my thinking"), i.e. they carry the inputs to the tool call.
- whereas here, in <tool_call_info>...</tool_call_info>, I send the tool outputs so they can be displayed in the UI.
Do you see what I mean?
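To make the distinction concrete, here is what the two kinds of deltas would carry (shapes simplified, values made up):

// 1) A delta.tool_calls chunk: the LLM *requesting* a tool invocation,
//    i.e. the tool name plus the input arguments it wants to call it with.
const toolCallRequestDelta = {
	role: "assistant",
	tool_calls: [
		{
			index: 0,
			id: "call_abc123", // made-up id
			type: "function",
			function: { name: "get_weather", arguments: '{"location": "Paris"}' },
		},
	],
};

// 2) What this PR streams: the tool's *output*, after the agent has executed
//    the call, wrapped in tags so a UI can render it like reasoning tokens.
const toolResultDelta = {
	role: "assistant",
	content: '<tool_call_info>Tool[get_weather] call_abc123\n{"temperature": 22}</tool_call_info>',
};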
The distinction between the LLM's tool call intent and the actual tool output returned for display definitely makes sense.
One thought: it could be cleaner (and more future-proof) to provide both the tool call delta and the tool call result as structured fields within the delta stream, rather than relying on custom tags in the output. That way, you avoid potential collisions with future model formats, and maximize compatibility with clients already following the OpenAI API spec. Both approaches have valid use cases, but leaning on structured responses might help with long-term maintainability.
> it could be cleaner (and more future-proof) to provide both the tool call delta and the tool call result as structured fields within the delta stream, rather than relying on custom tags in the output

Indeed, strong agree with the comment.
It should return 2 messages:
- message 1: an assistant message with tool_calls
- message 2: a user message providing <tool_response>...</tool_response>

For example:
{
	role: 'assistant',
	content: "<think>\nThe user is asking about the weather in New York. I should use the weather tool to get this information.\n</think>\nI'll check the current weather in New York for you.",
	tool_calls: [
		{
			function: {
				name: 'get_weather',
				arguments: {
					location: 'New York',
					unit: 'celsius',
				},
			},
		},
	],
},
{
	role: 'user',
	content: '<tool_response>\n{"temperature": 22, "condition": "Sunny", "humidity": 45, "wind_speed": 10}\n</tool_response>',
},
Hmm, I don't think it should be a user message @mishig25 – it's still an assistant message as well.
@tjbck sounds interesting, but it would be a new property that's outside of the OpenAI spec, no?
Or do you have an example?
I'm with @mishig25 on this one – it's the convention Anthropic uses: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview#client-tools.
OpenAI uses role: tool for tool responses.
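For comparison, in the OpenAI Chat Completions format a tool result is sent back as its own message with role: "tool" (values made up):

const toolResultMessage = {
	role: "tool",
	tool_call_id: "call_abc123", // must match the id from the preceding assistant tool_calls entry
	content: '{"temperature": 22, "condition": "Sunny"}',
};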
Co-authored-by: Mishig <[email protected]>
If you think about it, an Agent can easily be wrapped into an OpenAI-compatible Chat Completion endpoint as if it were a "plain" model. 💡
One would simply need to display the tool call info in a specific UI, similar to what we do for reasoning tokens. Hence, I chose to wrap the tool call infos into a set of <tool_call_info>...</tool_call_info> tags.

How to run an example

Then run an example to see how it works, calling our standard chatCompletionStream method from @huggingface/inference.
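A minimal client sketch, assuming the agent's web server is listening locally; the port, the endpointUrl value and the model name are assumptions, so adapt them to the actual example script in the PR:

import { chatCompletionStream } from "@huggingface/inference";

// Stream a chat completion from the local OpenAI-compatible server exposed by
// the agent. Depending on the client version, endpointUrl may or may not need
// the /v1/chat/completions suffix; adjust as needed.
for await (const chunk of chatCompletionStream({
	endpointUrl: "http://localhost:9999/v1/chat/completions", // assumption
	model: "tiny-agent", // assumption: placeholder model name
	messages: [{ role: "user", content: "What is the weather in Paris?" }],
})) {
	// Regular content and <tool_call_info>...</tool_call_info> blocks both
	// arrive through delta.content, so we can print them as they come.
	process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}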