feat(responses): add output_text delta events to responses #2265
Conversation
A few questions. Also, if you haven't run the verification tests against OpenAI's implementation, it would be good to do so, just to verify that the tests correctly check for the official behavior.
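For reference, the suite can presumably be pointed at OpenAI directly with something like this (the test path and the --provider flag are assumptions, not verified against the repo):

pytest -sv tests/verifications/openai_api/test_responses.py --provider=openai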
# Process response choices (tool execution and message creation)
output_messages = await self._process_response_choices(
Should we just name this function something more descriptive, like _execute_tools?
@ehhuang I think we need to simplify further, because this function is muddled in how it thinks of itself :) The next set of PRs, which add multi-turn execution, will refactor it to be better. Thanks for the feedback.
# Create a placeholder message item for delta events
message_item_id = f"msg_{uuid.uuid4()}"

async for chunk in inference_result:
async def stream_and_store_openai_completion(

Looks like we're doing these delta accumulations in multiple places (I recall seeing another instance somewhere, but can't recall the exact location), so maybe some of the above can be reused. Could be a follow-up.
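A shared helper could look roughly like this (a minimal sketch; the function name and the dict-shaped events are illustrative, not the repo's actual models):

from typing import Any, AsyncIterator

async def accumulate_output_text_deltas(
    chunks: AsyncIterator[Any],  # OpenAI-style chat completion chunks
    item_id: str,
) -> AsyncIterator[dict]:
    # Emit one delta event per text chunk; callers can join the deltas
    # afterwards to recover the full message text for storage.
    sequence_number = 0
    async for chunk in chunks:
        for choice in chunk.choices:
            if choice.delta and choice.delta.content:
                yield {
                    "type": "response.output_text.delta",
                    "item_id": item_id,
                    "delta": choice.delta.content,
                    "sequence_number": sequence_number,
                }
                sequence_number += 1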
Tested against the OpenAI client too. See the updated test plan.
This adds initial streaming support to the Responses API. This PR makes sure that the first inference call made to chat completions streams its output, emitting response.output_text.delta events as tokens arrive. There's more to be done in follow-ups.
Test Plan
Added a test. Executed as:
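A representative invocation (the test path and flags here are assumptions; adjust to the suite's actual layout):

pytest -sv tests/verifications/openai_api/test_responses.py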
Then, started a llama stack fireworks distro and tested against it like this:
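A sketch, assuming the server exposes an OpenAI-compatible endpoint (the start command, base URL, port, and model id below are placeholders, not verified against the distro):

# Start the distro first, e.g.: llama stack run fireworks --port 8321 (command assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

stream = client.responses.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # placeholder model id
    input="Tell me a short story.",
    stream=True,
)
for event in stream:
    # With this change, incremental text arrives as response.output_text.delta events.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)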