feat(responses): add output_text delta events to responses #2265


Merged
merged 5 commits into from
May 27, 2025

Conversation

ashwinb
Contributor

@ashwinb ashwinb commented May 26, 2025

This adds initial streaming support to the Responses API.

This PR makes sure that the first inference call made to chat completions streams its output.

There's more to be done:

  • tool call output tokens need to stream out when possible
  • we need to loop through multiple rounds of inference, and each round needs to stream out.
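To illustrate the shape of the change, here is a minimal, hypothetical sketch (not the PR's actual code) of re-emitting streamed chat-completion chunks as Responses-style `output_text` delta events, with a placeholder message item id as mentioned in the review below. The event names and chunk shape are assumptions for illustration.

```python
# Hypothetical sketch: fold a streaming chat-completions call into
# Responses-style output_text delta events. All names are illustrative.
import asyncio
import uuid


async def fake_chat_completion_stream():
    # Stand-in for a real streaming chat-completions call.
    for piece in ["Hello", ", ", "world", "!"]:
        yield {"choices": [{"delta": {"content": piece}}]}


async def stream_output_text_deltas(inference_stream):
    """Yield (event_type, payload) tuples while accumulating the full text."""
    # Placeholder message item for delta events, as in the PR.
    message_item_id = f"msg_{uuid.uuid4()}"
    accumulated = []
    async for chunk in inference_stream:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            accumulated.append(delta)
            yield ("response.output_text.delta",
                   {"item_id": message_item_id, "delta": delta})
    # Final event carries the fully accumulated text.
    yield ("response.output_text.done",
           {"item_id": message_item_id, "text": "".join(accumulated)})


async def main():
    return [e async for e in stream_output_text_deltas(fake_chat_completion_stream())]


events = asyncio.run(main())
print(events[-1][1]["text"])  # → Hello, world!
```

The point of the accumulation is that callers downstream (tool execution, storage) still see one complete message even though clients receive incremental deltas.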

Test Plan

Added a test. Executed as:

FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct

Then, started a llama stack fireworks distro and tested against it like this:

OPENAI_API_KEY=blah \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 26, 2025
Contributor

@ehhuang ehhuang left a comment


A few questions. Also, if you haven't run the verification tests against OpenAI's implementation, it would be good to do so, just to verify that the tests are correctly checking for the official behavior.

Comment on lines 406 to 407
# Process response choices (tool execution and message creation)
output_messages = await self._process_response_choices(
Contributor


should we just name this function like _execute_tools or something more descriptive?

Contributor Author


@ehhuang I think we need to simplify further, because this function is muddled in how it thinks of itself :) The next set of PRs, which add multi-turn execution, will refactor it to be better. Thanks for the feedback.

# Create a placeholder message item for delta events
message_item_id = f"msg_{uuid.uuid4()}"

async for chunk in inference_result:
Contributor


async def stream_and_store_openai_completion(

Looks like we're doing these delta accumulations in more than one place (I recall seeing another instance somewhere, but can't recall the exact location); maybe some of the above can be reused. Could be a follow-up.
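The reuse suggested above could take the shape of a small shared accumulator. The sketch below is hypothetical (not code from this PR or from `stream_and_store_openai_completion`); the class name and API are assumptions for illustration.

```python
# Hypothetical shared helper for accumulating streamed content deltas,
# usable by both completion storage and Responses delta emission.
from dataclasses import dataclass, field


@dataclass
class DeltaAccumulator:
    parts: list = field(default_factory=list)

    def add(self, delta: "str | None") -> None:
        # Streaming chunks may carry no content (e.g. role-only deltas).
        if delta:
            self.parts.append(delta)

    @property
    def text(self) -> str:
        return "".join(self.parts)


acc = DeltaAccumulator()
for d in ["str", "eam", None, "ed"]:
    acc.add(d)
print(acc.text)  # → streamed
```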

@ashwinb
Contributor Author

ashwinb commented May 27, 2025

Tested against OpenAI client too. See updated test plan.

@ashwinb ashwinb merged commit 5cdb297 into meta-llama:main May 27, 2025
27 checks passed
@ashwinb ashwinb deleted the resp_stream branch May 27, 2025 20:07