[Tiny Agents] Expose an OpenAI-compatible Web server #1473
base: main
Conversation
 	input: string | ChatCompletionInputMessage[],
 	opts: { abortSignal?: AbortSignal } = {}
 ): AsyncGenerator<ChatCompletionStreamOutput | ChatCompletionInputMessageTool> {
-	this.messages.push({
-		role: "user",
-		content: input,
-	});
+	let messages: ChatCompletionInputMessage[];
+	if (typeof input === "string") {
+		/// Use internal array of messages
+		this.messages.push({
+			role: "user",
+			content: input,
+		});
+		messages = this.messages;
+	} else {
+		/// Use the passed messages directly
+		messages = input;
+	}
You are maybe not going to be a fan of this part of the diff, @Wauplin @hanouticelina...
Basically, an OpenAI-compatible chat completion endpoint is stateless, so we need to feed the full array of messages from the downstream application here.
Let me know what you think.
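For illustration, a minimal sketch of what a downstream client of such a stateless endpoint would do: re-send the full message history on every request. The localhost URL, port and model name below are assumptions, not something defined in this PR.

// Stateless client sketch: the whole conversation is sent each time, since
// the server keeps no per-conversation state.
const history = [
	{ role: "user", content: "What is the weather in Paris?" },
	{ role: "assistant", content: "It is sunny, 22°C." },
	{ role: "user", content: "And in London?" },
];

const response = await fetch("http://localhost:9999/v1/chat/completions", {
	method: "POST",
	headers: { "Content-Type": "application/json" },
	body: JSON.stringify({
		model: "tiny-agent", // hypothetical placeholder model name
		stream: true,
		messages: history, // the whole conversation, not just the latest user turn
	}),
});
console.log(response.status); // the body is then consumed as an SSE stream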
/// Tool call info
/// /!\ We format it as a regular chunk!
const chunkToolcallInfo = {
	choices: [
		{
			index: 0,
			delta: {
				role: "assistant",
				content:
					"<tool_call_info>" +
					`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +
					chunk.content +
					"</tool_call_info>",
			},
		},
	],
	created: Math.floor(Date.now() / 1000),
	id: chunk.tool_call_id,
	model: "",
	system_fingerprint: "",
} satisfies ChatCompletionStreamOutput;

res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
This is the interesting part of the PR.
I format the tool call info as a "regular" chunk, as if it were content generated by the model itself. 🔥
And I send it as an SSE chunk.
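Concretely, a client consuming the stream would see the tool output arrive as an ordinary SSE event whose delta.content carries the tagged text, roughly like this (tool name, id, timestamp and payload are made up for the example):

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"<tool_call_info>Tool[get_weather] call_abc123\n{\"temperature\": 22}</tool_call_info>"}}],"created":1717171717,"id":"call_abc123","model":"","system_fingerprint":""}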
if (err instanceof z.ZodError) {
	return res.error(404, "Invalid ChatCompletionInput body \n" + JSON.stringify(err));
}
If you use a recent version of zod, you can import { z } from "zod/v4" and then use z.prettifyError(err).
I tried, but it requires ESM or something similar ("node16" moduleResolution or so).. (but feel free to give it a try)
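For reference, a sketch of the suggested variant, assuming the project's module resolution can pick up the zod/v4 subpath export (not verified against this repo's tsconfig); `res` stands in for the same response helper used in the PR's web server:

import { z } from "zod/v4";

// Same control flow as the original catch block, but with a human-readable
// error message instead of the raw serialized ZodError.
function handleValidationError(res: { error: (status: number, message: string) => void }, err: unknown) {
	if (err instanceof z.ZodError) {
		// z.prettifyError() renders a readable, multi-line summary of the issues
		return res.error(404, "Invalid ChatCompletionInput body\n" + z.prettifyError(err));
	}
	throw err;
}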
for await (const chunk of agent.run(messages)) {
	if ("choices" in chunk) {
		res.write(`data: ${JSON.stringify(chunk)}\n\n`);
	} else {
		/// Tool call info
		/// /!\ We format it as a regular chunk!
		const chunkToolcallInfo = {
			choices: [
				{
					index: 0,
					delta: {
						role: "assistant",
						content:
							"<tool_call_info>" +
							`Tool[${chunk.name}] ${chunk.tool_call_id}\n` +
							chunk.content +
							"</tool_call_info>",
					},
				},
			],
			created: Math.floor(Date.now() / 1000),
			id: chunk.tool_call_id,
			model: "",
			system_fingerprint: "",
		} satisfies ChatCompletionStreamOutput;

		res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
	}
Suggested change (replacing the block above):

// Track tool call indices for proper formatting
let toolCallIndex = 0;
for await (const chunk of agent.run(messages)) {
	if ("choices" in chunk) {
		res.write(`data: ${JSON.stringify(chunk)}\n\n`);
	} else {
		// Tool call - format in OpenAI-compatible structure
		const chunkToolcallInfo = {
			choices: [
				{
					index: 0,
					delta: {
						role: "assistant",
						tool_calls: [
							{
								index: toolCallIndex,
								id: chunk.tool_call_id,
								type: "function",
								function: {
									name: chunk.name,
									arguments: chunk.content,
								},
							},
						],
					},
				},
			],
			created: Math.floor(Date.now() / 1000),
			id: crypto.randomUUID(),
			model: agent.modelName || "agent",
			system_fingerprint: "",
		} satisfies ChatCompletionStreamOutput;
		res.write(`data: ${JSON.stringify(chunkToolcallInfo)}\n\n`);
		// Increment tool call index for the next tool call
		toolCallIndex++;
	}
Shouldn't we use delta.tool_calls (example here) rather than custom <tool_call_info>...</tool_call_info> tags?
Hmm, it's not the same:
- delta.tool_calls come from the LLM asking for a tool call (the LLM is saying "give me the output of this function call so I can incorporate it into my thinking"), i.e. they carry the inputs to the tool call.
- whereas here, in <tool_call_info>...</tool_call_info>, I send the tool outputs so they can be displayed in the UI.
Do you see what I mean?
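To make the distinction concrete, here is what the two kinds of deltas would carry (shapes simplified, values made up):

// 1) A delta.tool_calls chunk: the LLM *requesting* a tool invocation,
//    i.e. the tool name plus the input arguments it wants to call it with.
const toolCallRequestDelta = {
	role: "assistant",
	tool_calls: [
		{
			index: 0,
			id: "call_abc123", // made-up id
			type: "function",
			function: { name: "get_weather", arguments: '{"location": "Paris"}' },
		},
	],
};

// 2) What this PR streams: the tool's *output*, after the agent has executed
//    the call, wrapped in tags so a UI can render it like reasoning tokens.
const toolResultDelta = {
	role: "assistant",
	content: '<tool_call_info>Tool[get_weather] call_abc123\n{"temperature": 22}</tool_call_info>',
};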
The distinction between the LLM's tool call intent and the actual tool output returned for display definitely makes sense.
One thought: it could be cleaner (and more future-proof) to provide both the tool call delta and the tool call result as structured fields within the delta stream, rather than relying on custom tags in the output. That way, you avoid potential collisions with future model formats, and maximize compatibility with clients already following the OpenAI API spec. Both approaches have valid use cases, but leaning on structured responses might help with long-term maintainability.
> it could be cleaner (and more future-proof) to provide both the tool call delta and the tool call result as structured fields within the delta stream, rather than relying on custom tags in the output

Indeed, strong agree with the comment.
It should return 2 messages:
- message 1: an assistant message with tool_calls
- message 2: a user message providing <tool_response>...</tool_response>

For example:
{
	role: 'assistant',
	content: "<think>\nThe user is asking about the weather in New York. I should use the weather tool to get this information.\n</think>\nI'll check the current weather in New York for you.",
	tool_calls: [
		{
			function: {
				name: 'get_weather',
				arguments: {
					location: 'New York',
					unit: 'celsius',
				},
			},
		},
	],
},
{
	role: 'user',
	content: '<tool_response>\n{"temperature": 22, "condition": "Sunny", "humidity": 45, "wind_speed": 10}\n</tool_response>',
},
Hmm, I don't think it should be a user message @mishig25 – it's still an assistant message as well.
@tjbck sounds interesting, but it would be a new property that's outside of the OpenAI spec, no?
Or do you have an example?
I'm with @mishig25 on this one – it's the convention Anthropic uses: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview#client-tools.
OpenAI uses role: tool for tool responses.
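For comparison, in the OpenAI Chat Completions format a tool result is sent back as its own message with role: "tool" (values made up):

const toolResultMessage = {
	role: "tool",
	tool_call_id: "call_abc123", // must match the id from the preceding assistant tool_calls entry
	content: '{"temperature": 22, "condition": "Sunny"}',
};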
Co-authored-by: Mishig <[email protected]>
If you think about it, an Agent can easily be wrapped into an OpenAI-compatible Chat Completion endpoint as if it were a "plain" model. 💡
One would simply need to display the tool call info in a specific UI, similar to what we do for reasoning tokens. Hence, I chose to wrap the tool call infos into a set of <tool_call_info>...</tool_call_info> tags.

How to run an example

Then run an example to see how it works, calling our standard chatCompletionStream method from @huggingface/inference.
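A minimal client sketch, assuming the agent's web server is listening locally; the port, the endpointUrl value and the model name are assumptions, so adapt them to the actual example script in the PR:

import { chatCompletionStream } from "@huggingface/inference";

// Stream a chat completion from the local OpenAI-compatible server exposed by
// the agent. Depending on the client version, endpointUrl may or may not need
// the /v1/chat/completions suffix; adjust as needed.
for await (const chunk of chatCompletionStream({
	endpointUrl: "http://localhost:9999/v1/chat/completions", // assumption
	model: "tiny-agent", // assumption: placeholder model name
	messages: [{ role: "user", content: "What is the weather in Paris?" }],
})) {
	// Regular content and <tool_call_info>...</tool_call_info> blocks both
	// arrive through delta.content, so we can print them as they come.
	process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}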